outside the loop and reducible.
This more completely hides them from LSR, which isn't usually able to
do anything meaningful with non-affine expressions anyway, and this
consequently hides them from SCEVExpander, which is acutely unprepared
for non-affine expressions.
Replace test/CodeGen/X86/lsr-nonaffine.ll with a new test that tests
the new behavior.
This works around the bug in PR10117 / rdar://problem/9633149, and is
generally an improvement besides.
llvm-svn: 134268
The DSP instructions in the Thumb2 instruction set are an optional extension
in the Cortex-M* archtitecture. When present, the implementation is considered
an "ARMv7E-M implementation," and when not, an "ARMv7-M implementation."
Add a subtarget feature hook for the v7e-m instructions and hook it up. The
cortex-m3 cpu is an example of a v7m implementation, while the cortex-m4 is
a v7e-m implementation.
rdar://9572992
llvm-svn: 134261
itineraries.
- Refactor TargetSubtarget to be based on MCSubtargetInfo.
- Change tablegen generated subtarget info to initialize MCSubtargetInfo
and hide more details from targets.
llvm-svn: 134257
t2MOVCC[ri] are just t2MOV[ri] instructions, so properly pseudo-ize them.
The Thumb1 versions, tMOVCC[ri] were only present for use by the size-
reduction pass, so they're no longer necessary at all and can be deleted.
llvm-svn: 134242
copy is a kill") to see if it fixes the i386 dragonegg buildbot, which is timing out
because gcc built with dragonegg is going into an infinite loop.
llvm-svn: 134237
The constraints are represented by the register class of the original
virtual register created for the inline asm. If the register class were
included in the operand descriptor, we might be able to do this.
For now, just give up on regclass inflation when inline asm is involved.
No test case, this bug hasn't happened yet.
llvm-svn: 134226
Merge the tMOVr, tMOVgpr2tgpr, tMOVtgpr2gpr, and tMOVgpr2gpr instructions
into tMOVr. There's no need to keep them separate. Giving the tMOVr
instruction the proper GPR register class for its operands is sufficient
to give the register allocator enough information to do the right thing
directly.
llvm-svn: 134204
Fix a FIXME and allow predication (in Thumb2) for the T1 register to
register MOV instructions. This allows some better codegen with
if-conversion (as seen in the test updates), plus it lays the groundwork
for pseudo-izing the tMOVCC instructions.
llvm-svn: 134197
It's just a t2LDMIA_UPD instruction with extra codegen properties, so it
doesn't need the encoding information. As a side-benefit, we now correctly
recognize for instruction printing as a 'pop' instruction.
llvm-svn: 134173
be the first encoded as the first feature. It then uses the CPU name to look up
features / scheduling itineray even though clients know full well the CPU name
being used to query these properties.
The fix is to just have the clients explictly pass the CPU name!
llvm-svn: 134127
This patch will sometimes choose live range split points next to
interference instead of always splitting next to a register point. That
means spill code can now appear almost anywhere, and it was necessary
to fix code that didn't expect that.
The difficult places were:
- Between a CALL returning a value on the x87 stack and the
corresponding FpPOP_RETVAL (was FpGET_ST0). Probably also near x87
inline assembly, but that didn't actually show up in testing.
- Between a CALL popping arguments off the stack and the corresponding
ADJCALLSTACKUP.
Both are fixed now. The only place spill code can't appear is after
terminators, see SplitAnalysis::getLastSplitPoint.
Original commit message:
Rewrite RAGreedy::splitAroundRegion, now with cool ASCII art.
This function has to deal with a lot of special cases, and the old
version got it wrong sometimes. In particular, it would sometimes leave
multiple uses in the stack interval in a single block. That causes bad
code with multiple reloads in the same basic block.
The new version handles block entry and exit in a single pass. It first
eliminates all the easy cases, and then goes on to create a local
interval for the blocks with difficult interference. Previously, we
would only create the local interval for completely isolated blocks.
It can happen that the stack interval becomes completely empty because
we could allocate a register in all edge bundles, and the new local
intervals deal with the interference. The empty stack interval is
harmless, but we need to remove a SplitKit assertion that checks for
empty intervals.
llvm-svn: 134125
Unlike Thumb1, Thumb2 does not have dedicated encodings for adjusting the
stack pointer. It can just use the normal add-register-immediate encoding
since it can use all registers as a source, not just R0-R7. The extra
instruction definitions are just duplicates of the normal instructions with
the (not well enforced) constraint that the source register was SP.
llvm-svn: 134114
Some x86-32 calls pop values off the stack, and we need to readjust the
stack pointer after the call. This happens when ADJCALLSTACKUP is
eliminated.
It could happen that spill code was inserted between the CALL and
ADJCALLSTACKUP instructions, and we would compute wrong stack pointer
offsets for those frame index references.
Fix this by inserting the stack pointer adjustment immediately after the
call instead of where the ADJCALLSTACKUP instruction was erased.
I don't have a test case since we don't currently insert code in that
position. We will soon, though. I am testing a regalloc patch that
didn't work on Linux because of this.
llvm-svn: 134113
already makes the assumption, which is correct on ARM, that a type's alignment is
less than its alloc size. This improves codegen with Clang (which inserts a lot of
extraneous alignment specifiers) and fixes <rdar://problem/9695089>.
llvm-svn: 134106
The tSpill and tRestore instructions are just copies of the tSTRspi and
tLDRspi instructions, respectively. Just use those directly instead.
llvm-svn: 134092
For example, ".byte 256" would previously assert() when emitting an object
file. Now it generates a diagnostic that the literal value is out of range.
rdar://9686950
llvm-svn: 134069
This function has to deal with a lot of special cases, and the old
version got it wrong sometimes. In particular, it would sometimes leave
multiple uses in the stack interval in a single block. That causes bad
code with multiple reloads in the same basic block.
The new version handles block entry and exit in a single pass. It first
eliminates all the easy cases, and then goes on to create a local
interval for the blocks with difficult interference. Previously, we
would only create the local interval for completely isolated blocks.
It can happen that the stack interval becomes completely empty because
we could allocate a register in all edge bundles, and the new local
intervals deal with the interference. The empty stack interval is
harmless, but we need to remove a SplitKit assertion that checks for
empty intervals.
llvm-svn: 134047
sink them into MC layer.
- Added MCInstrInfo, which captures the tablegen generated static data. Chang
TargetInstrInfo so it's based off MCInstrInfo.
llvm-svn: 134021
Drop the FpMov instructions, use plain COPY instead.
Drop the FpSET/GET instruction for accessing fixed stack positions.
Instead use normal COPY to/from ST registers around inline assembly, and
provide a single new FpPOP_RETVAL instruction that can access the return
value(s) from a call. This is still necessary since you cannot tell from
the CALL instruction alone if it returns anything on the FP stack. Teach
fast isel to use this.
This provides a much more robust way of handling fixed stack registers -
we can tolerate arbitrary FP stack instructions inserted around calls
and inline assembly. Live range splitting could sometimes break x87 code
by inserting spill code in unfortunate places.
As a bonus we handle floating point inline assembly correctly now.
llvm-svn: 134018
When the destination operand is the same as the first source register
operand for arithmetic instructions, the destination operand may be omitted.
For example, the following two instructions are equivalent:
and r1, #ff
and r1, r1, #ff
rdar://9672867
llvm-svn: 133973
Correctly parse the forms of the Thumb mov-immediate instruction:
1. 8-bit immediate 0-255.
2. 12-bit shifted-immediate.
The 16-bit immediate "movw" form is also legal with just a "mov" mnemonic,
but is not yet supported. More parser logic necessary there due to fixups.
llvm-svn: 133966
Thumb2 MOV mnemonic can accept both cc_out and predication. We don't (yet)
encode the instruction properly, but this gets the parsing part.
llvm-svn: 133945
Add aliases for the vpush/vpop mnemonics to the VFP load/store multiple
writeback instructions w/ SP as the base pointer.
rdar://9683231
llvm-svn: 133932
When the destination operand is the same as the first source register
operand for arithmetic instructions, the destination operand may be omitted.
For example, the following two instructions are equivalent:
sub r2, r2, #6
sub r2, #6
rdar://9682597
llvm-svn: 133925
Removed the check that peeks past EXTRA_SUBREG, which I don't think
makes sense any more. Intead treat it as a normal register def. No
significant affect on x86 or ARM benchmarks.
llvm-svn: 133917
alloca that only holds a copy of a global and we're going to replace the users
of the alloca with that global, just nuke the lifetime intrinsics. Part of
PR10121.
llvm-svn: 133905
Both become <earlyclobber> defs on the INLINEASM MachineInstr, but we
now use two different asm operand kinds.
The new Kind_Clobber is treated identically to the old
Kind_RegDefEarlyClobber for now, but x87 floating point stack inline
assembly does care about the difference.
This will pop a register off the stack:
asm("fstp %st" : : "t"(x) : "st");
While this will pop the input and push an output:
asm("fst %st" : "=&t"(r) : "t"(x));
We need to know if ST0 was a clobber or an output operand, and we can't
depend on <dead> flags for that.
llvm-svn: 133902
The INLINEASM MachineInstrs have an immediate operand describing each
original inline asm operand. Decode the bits in MachineInstr::print() so
it is easier to read:
INLINEASM <es:rorq $1,$0>, $0:[regdef], %vreg0<def>, %vreg1<def>, $1:[imm], 1, $2:[reguse] [tiedto:$0], %vreg2, %vreg3, $3:[regdef-ec], %EFLAGS<earlyclobber,imp-def>
llvm-svn: 133901
The .b8 operations in PTX are far more limiting than I first thought. The mov operation isn't even supported, so there's no way of converting a .pred value into a .b8 without going via .b16, which is
not sensible. An improved implementation needs to use the fact that loads and stores automatically extend and truncate to implement support for EXTLOAD and TRUNCSTORE in order to correctly support
boolean values.
llvm-svn: 133873
Move the target-specific RecordRelocation logic out of the generic MC
MachObjectWriter and into the target-specific object writers. This allows
nuking quite a bit of target knowledge from the supposedly target-independent
bits in lib/MC.
llvm-svn: 133844
The fixup value comes in as the whole 32-bit value, so for the lo16 fixup,
the upper bits need to be masked off. Previously we assumed the masking had
already been done and asserted.
rdar://9635991
llvm-svn: 133818
The i8 type is required for boolean values, but can only use ld, st and mov instructions. The i1 type continues to be used for predicates.
llvm-svn: 133814
instructions can be used to match combinations of multiply/divide and VCVT
(between floating-point and integer, Advanced SIMD). Basically the VCVT
immediate operand that specifies the number of fraction bits corresponds to a
floating-point multiply or divide by the corresponding power of 2.
For example, VCVT (floating-point to fixed-point, Advanced SIMD) can replace a
combination of VMUL and VCVT (floating-point to integer) as follows:
Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
vmul.f32 d16, d17, d16
vcvt.s32.f32 d16, d16
becomes:
vcvt.s32.f32 d16, d16, #3
Similarly, VCVT (fixed-point to floating-point, Advanced SIMD) can replace a
combinations of VCVT (integer to floating-point) and VDIV as follows:
Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
vcvt.f32.s32 d16, d16
vdiv.f32 d16, d17, d16
becomes:
vcvt.f32.s32 d16, d16, #3
llvm-svn: 133813
.file and .loc directives.
Ideally, we would utilize the existing support in AsmPrinter for this, but
I cannot find a way to get .file and .loc directives to print without the
rest of the associated DWARF sections, which ptxas cannot handle.
llvm-svn: 133812
enables SelectionDAG::getLoad at MipsISelLowering.cpp:1914 to return a
pre-existing node instead of redundantly create a new node every time it is
called.
llvm-svn: 133811
target machine from those that are only needed by codegen. The goal is to
sink the essential target description into MC layer so we can start building
MC based tools without needing to link in the entire codegen.
First step is to refactor TargetRegisterInfo. This patch added a base class
MCRegisterInfo which TargetRegisterInfo is derived from. Changed TableGen to
separate register description from the rest of the stuff.
llvm-svn: 133782
parameters if SM >= 2.0
- Update test cases to be more robust against register allocation changes
- Bump up the number of registers to 128 per type
- Include Python script to re-generate register file with any number of
registers
llvm-svn: 133736
It has only one user. This eliminates the last include of
config.h from the public headers -- ideally, config.h
shouldn't even be installed by `make install` anymore.
llvm-svn: 133713
"Reinstate r133435 and r133449 (reverted in r133499) now that the clang
self-hosted build failure has been fixed (r133512)."
Due to some additional warnings.
llvm-svn: 133700
register allocation if it has a indirectbr or if we can duplicate it to
every predecessor.
This fixes the SingleSource/Benchmarks/Shootout-C++/matrix.cpp regression but
keeps the previous improvements to sunspider.
llvm-svn: 133682