The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.
Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics.
MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces. Optimizations may occur on these forms and the
result casted back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.
The point of all this is prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115243 91177308-0d34-0410-b5e6-96231b3b80d8
edited during emission.
If the basic block ends in a switch that gets lowered to a jump table, any
phis at the default edge were getting updated wrong. The jump table data
structure keeps a pointer to the header blocks that wasn't getting updated
after the MBB is split.
This bug was exposed on 32-bit Linux when disabling critical edge splitting in
codegen prepare.
The fix is to uipdate stale MBB pointers whenever a block is split during
emission.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115191 91177308-0d34-0410-b5e6-96231b3b80d8
LDM/STM instructions can run one cycle faster on some ARM processors if the
memory address is 64-bit aligned. Radar 8489376.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@115047 91177308-0d34-0410-b5e6-96231b3b80d8
cost modeling for if-conversion. Now if only we had a way to estimate the misprediction probability.
Adjsut CodeGen/ARM/ifcvt10.ll. The pipeline on Cortex-A8 is long enough that it is still profitable
to predicate an ldm, but the shorter pipeline on Cortex-A9 makes it unprofitable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114995 91177308-0d34-0410-b5e6-96231b3b80d8
Rather than having arbitrary cutoffs, actually try to cost model the conversion.
For now, the constants are tuned to more or less match our existing behavior, but these will be
changed to reflect realistic values as this work proceeds.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114973 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts revision 114633. It was breaking llvm-gcc-i386-linux-selfhost.
It seems there is a downstream bug that is exposed by
-cgp-critical-edge-splitting=0. When that bug is fixed, this patch can go back
in.
Note that the changes to tailcallfp2.ll are not reverted. They were good are
required.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114859 91177308-0d34-0410-b5e6-96231b3b80d8
support aligned comm. Detect when compiling for 10.4 and don't
emit an alignment for comm. THis will hopefully fix PR8198.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114817 91177308-0d34-0410-b5e6-96231b3b80d8
x86-32: 32-bit calls were named "call" not "calll". 64-bit calls were correctly
named "callq", so this only impacted x86-32.
This fixes rdar://8456370 - llvm-mc rejects 'calll'
This also exposes that mingw/64 is generating a 32-bit call instead of a 64-bit call,
I will file a bugzilla.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114534 91177308-0d34-0410-b5e6-96231b3b80d8
(sbbl x, x) sets the registers to 0 or ~0. Combined with two's complement arithmetic, we can fold
the intermediate AND and the ADD into a single SUB.
This fixes <rdar://problem/8449754>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114460 91177308-0d34-0410-b5e6-96231b3b80d8
CombinerAA cannot assume that different FrameIndex's never alias, but can instead use
MachineFrameInfo to get the actual offsets of these slots and check for actual aliasing.
This fixes CodeGen/X86/2010-02-19-TailCallRetAddrBug.ll and CodeGen/X86/tailcallstack64.ll
when CombinerAA is enabled, modulo a different register allocation sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114348 91177308-0d34-0410-b5e6-96231b3b80d8
between the high and low registers for prologue/epilogue code. This was
a Darwin-only thing that wasn't providing a realistic benefit anymore.
Combining the save areas simplifies the compiler code and results in better
ARM/Thumb2 codegen.
For example, previously we would generate code like:
push {r4, r5, r6, r7, lr}
add r7, sp, #12
stmdb sp!, {r8, r10, r11}
With this change, we combine the register saves and generate:
push {r4, r5, r6, r7, r8, r10, r11, lr}
add r7, sp, #12
rdar://8445635
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114340 91177308-0d34-0410-b5e6-96231b3b80d8
NO path to the destination containing side effects, not that SOME path contains no side effects.
In practice, this only manifests with CombinerAA enabled, because otherwise the chain has little
to no branching, so "any" is effectively equivalent to "all".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114268 91177308-0d34-0410-b5e6-96231b3b80d8
value should be in GPRs when it's going to be used as a scalar, and we use
VMOVRRD to make that happen, but if the value is converted back to a vector
we need to fold to a simple bit_convert. Radar 8407927.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114233 91177308-0d34-0410-b5e6-96231b3b80d8
1) Do forward copy propagation. This makes it easier to estimate the cost of the
instruction being sunk.
2) Break critical edges on demand, including cases where the value is used by
PHI nodes.
Critical edge splitting is not yet enabled by default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114227 91177308-0d34-0410-b5e6-96231b3b80d8
legacy asm printer uses instructions of the form, "mov r0, r0, lsl #3", while
the MC-instruction printer uses the form "lsl r0, r0, #3". The latter mnemonic
is correct and preferred according the ARM documentation (A8.6.98). The former
are pseudo-instructions for the latter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114221 91177308-0d34-0410-b5e6-96231b3b80d8
walking the asm arguments once and stashing their Values. This is
wrong because the same memory location can be in the list twice, and
if the first one has a sunkaddr substituted, the stashed value for the
second one will be wrong (use-after-free). PR 8154.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114104 91177308-0d34-0410-b5e6-96231b3b80d8
This cleans up after the mess r108567 left in the CellSPU backend.
ORCvt-instruction were used to reinterpret registers, and the ORs were then
removed by isMoveInstr(). This patch now removes 350 instrucions of format:
or $3, $3, $3
(from the 52 testcases in CodeGen/CellSPU). One case of a nonexistant or is
checked for.
Some moves of the form 'ori $., $., 0' and 'ai $., $., 0' still remain.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@114074 91177308-0d34-0410-b5e6-96231b3b80d8
encountered while building llvm-gcc for arm. This is probably the same issue
that the ppc buildbot hit. llvm::prior works on a MachineBasicBlock::iterator,
not a plain MachineInstr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113983 91177308-0d34-0410-b5e6-96231b3b80d8
backing out following to get it back to green,
so I can investigate in peace:
svn merge -c -113840 llvm/test/CodeGen/ARM/arm-and-tst-peephole.ll
svn merge -c -113876 -c -113839 llvm/lib/Target/ARM/ARMBaseInstrInfo.cpp
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113980 91177308-0d34-0410-b5e6-96231b3b80d8
to expose greater opportunities for store narrowing in codegen. This patch fixes a potential
infinite loop in instcombine caused by one of the introduced transforms being overly aggressive.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113763 91177308-0d34-0410-b5e6-96231b3b80d8
to use AddrMode4, there was a count of the registers stored in one of the
operands. I changed that to just count the operands but forgot to adjust for
the size of D registers. This was noticed by Evan as a performance problem
but it is a potential correctness bug as well, since it is possible that this
could merge a base update with a non-matching immediate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113576 91177308-0d34-0410-b5e6-96231b3b80d8
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113570 91177308-0d34-0410-b5e6-96231b3b80d8
the VST pseudos. The VLD/VST scheduling still needs work (see pr6722), but
at least we shouldn't confuse the loads with the stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113473 91177308-0d34-0410-b5e6-96231b3b80d8
Since mem2reg isn't run at -O0, we get a ton of reloads from the stack,
for example, before, this code:
int foo(int x, int y, int z) {
return x+y+z;
}
used to compile into:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
movl 4(%rsp), %esi
addl %edx, %esi
movl (%rsp), %edx
addl %esi, %edx
movl %edx, %eax
addq $12, %rsp
ret
Now we produce:
_foo: ## @foo
subq $12, %rsp
movl %edi, 8(%rsp)
movl %esi, 4(%rsp)
movl %edx, (%rsp)
movl 8(%rsp), %edx
addl 4(%rsp), %edx ## Folded load
addl (%rsp), %edx ## Folded load
movl %edx, %eax
addq $12, %rsp
ret
Fewer instructions and less register use = faster compiles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@113102 91177308-0d34-0410-b5e6-96231b3b80d8