adjustment. If so, we merge the adjustment into the existing one. This
allows us to generate:
caller2:
sub %ESP, 12
mov DWORD PTR [%ESP], 0
mov %EAX, 1234567890
mov %EDX, 0
call func2
add %ESP, 8
ret 4
instead of:
caller2:
sub %ESP, 12
mov DWORD PTR [%ESP], 0
mov %EAX, 1234567890
mov %EDX, 0
call func2
sub %ESP, 4
add %ESP, 12
ret 4
for X86/fast-cc-merge-stack-adj.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22038 91177308-0d34-0410-b5e6-96231b3b80d8
to do ugly hackery to avoid emitting code like this:
call foo
mov vreg, EAX
adjcallstackup ...
If foo is a fastcc call and if vreg gets spilled, we might end up with this:
call foo
mov [ESP+offset], EAX ;; Offset doesn't consider the 12!
sub ESP, 12
Which is bad. The previous hacky code to deal with this was A) gross and B) not
good enough. In particular, it could miss cases and emit the bad code above.
Now we always emit this:
call foo
adjcallstackup ...
mov vreg, EAX
directly.
This makes fastcc with callees popping the stack work much better. Next
stop (finally!) really is tail calls.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22021 91177308-0d34-0410-b5e6-96231b3b80d8
This should fix some missing symbols problems on BSD and improve performance
of programs that use that operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22012 91177308-0d34-0410-b5e6-96231b3b80d8
each live in, and copy the regs from the vregs. As the very first thing we
do in the function, insert copies from the pregs to the vregs. This fixes
problems where the token chain of CopyFromReg was not enough to allow reordering
of the CopyFromReg nodes and other unchained nodes (e.g. div, which clobbers
EAX on Intel).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21932 91177308-0d34-0410-b5e6-96231b3b80d8
that were failing with the pattern selector. Note that the support that
existed in the simple selector was clearly broken in several ways (those
problems have also been fixed).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21831 91177308-0d34-0410-b5e6-96231b3b80d8
population (ctpop). Generic lowering is implemented, however only promotion
is implemented for SelectionDAG at the moment.
More coming soon.
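For reference, a minimal C model of what ctpop computes (illustrative only,
not the lowering the legalizer emits):
/* Count the set bits in a 32-bit value. */
unsigned ctpop32(unsigned x) {
  unsigned n = 0;
  while (x) {
    x &= x - 1;   /* clear the lowest set bit */
    ++n;
  }
  return n;
}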
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21676 91177308-0d34-0410-b5e6-96231b3b80d8
(TRUNC)Stores and (EXT|ZEXT|SEXT)Loads have an extra SDOperand which is a
SrcValueSDNode containing the Value*. Note that if the operation is
introduced by the backend, it will still have the operand, but the Value*
will be null.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21599 91177308-0d34-0410-b5e6-96231b3b80d8
printf format strings and other stuff. Instead of generating this:
movl $l1__2E_str_1, %eax
movl %eax, (%esp)
we now emit:
movl $l1__2E_str_1, (%esp)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21406 91177308-0d34-0410-b5e6-96231b3b80d8
Add new ppc beta option related to using condition registers
Make pattern isel control flag (-enable-pattern-isel) global and tristate
0 == off
1 == on
2 == target default
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@21309 91177308-0d34-0410-b5e6-96231b3b80d8
using Function::arg_{iterator|begin|end}. Likewise Module::g* -> Module::global_*.
This patch is contributed by Gabor Greif, thanks!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@20597 91177308-0d34-0410-b5e6-96231b3b80d8
all. This should speed up the X86 backend fairly significantly on integer
codes. Now if only we didn't have to compute livevar still... ;-)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19796 91177308-0d34-0410-b5e6-96231b3b80d8
doesn't support certain directives, and symbols on Cygwin are prefixed with
an underscore. This patch makes the necessary adjustments to the output.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19775 91177308-0d34-0410-b5e6-96231b3b80d8
pressure, not decreases register pressure. Fix problem where we accidentally
swapped the operands of SHLD, which caused fourinarow to fail. This fixes
fourinarow.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19697 91177308-0d34-0410-b5e6-96231b3b80d8
typically cost 1 cycle) instead of shld/shrd instructions (which typically
cost 6 or more cycles). This also saves code space.
For example, instead of emitting:
rotr:
mov %EAX, DWORD PTR [%ESP + 4]
mov %CL, BYTE PTR [%ESP + 8]
shrd %EAX, %EAX, %CL
ret
rotli:
mov %EAX, DWORD PTR [%ESP + 4]
shrd %EAX, %EAX, 27
ret
Emit:
rotr32:
mov %CL, BYTE PTR [%ESP + 8]
mov %EAX, DWORD PTR [%ESP + 4]
ror %EAX, %CL
ret
rotli32:
mov %EAX, DWORD PTR [%ESP + 4]
ror %EAX, 27
ret
We also emit byte rotate instructions which do not have a sh[lr]d counterpart
at all.
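The C idiom that maps onto these rotate instructions looks roughly like this
(function names are illustrative, not the actual testcase):
/* Variable-amount rotate right; the shift/or idiom becomes 'ror'. */
unsigned rotr32(unsigned x, unsigned n) {
  n &= 31;
  return (x >> n) | (x << ((32 - n) & 31));
}
/* Constant-amount rotate: left by 5 is the same as right by 27. */
unsigned rotl32_by5(unsigned x) {
  return (x << 5) | (x >> 27);
}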
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19692 91177308-0d34-0410-b5e6-96231b3b80d8
* Insert some really pedantic assertions that will notice when we emit the
same load more than once, exposing bugs. This turns a miscompilation in
bzip2 into a compile-fail. yaay.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19658 91177308-0d34-0410-b5e6-96231b3b80d8
match (X+Y)+(Z << 1), because we match the X+Y first, consuming the index
register, and then there is no place to put the Z.
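Illustrative C for the address shape in question (a hypothetical example,
not the failing testcase):
char load_byte(char *Base, int X, int Y, int Z) {
  return Base[X + Y + 2 * Z];   /* address is (X+Y) + (Z << 1) */
}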
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19652 91177308-0d34-0410-b5e6-96231b3b80d8
1. Fold [mem] += (1|-1) into inc [mem]/dec [mem] to save some icache space.
2. Do not let token factor nodes prevent forming '[mem] op= val' folds.
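A C fragment of the sort case 1 targets (names are illustrative):
extern int counter;
void bump(void) {
  counter += 1;   /* foldable to 'inc DWORD PTR [counter]' */
}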
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19643 91177308-0d34-0410-b5e6-96231b3b80d8
If we emit a load because we followed a token chain to get to it, try to
fold it into its single user if possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19620 91177308-0d34-0410-b5e6-96231b3b80d8
* Remove custom promotion for bool and byte select ops. Legalize now
promotes them for us.
* Allow folding ConstantPoolIndexes into EXTLOAD's, useful for float immediates.
* Do a better job of declaring which operations are not supported.
* Add some hacky code for TRUNCSTORE to pretend that we have truncstore
for i16 types. This is useful for testing promotion code because I can
just remove 16-bit registers altogether and verify that programs work.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19614 91177308-0d34-0410-b5e6-96231b3b80d8
256.bzip2 from 7142 to 7103 lines of .s file.
Second, add initial support for folding loads into compares, though this code
is dynamically dead for now. :(
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19493 91177308-0d34-0410-b5e6-96231b3b80d8
Select [mem] += Val operations. For constants, we used to get:
mov %ECX, -32768
add %ECX, DWORD PTR [l4_match_start]
mov DWORD PTR [l4_match_start], %ECX
Now we get:
add DWORD PTR [l4_match_start], -32768
For other values we used to get:
mov %EBP, %EDI ;; because the add destroys the value
add %EBP, DWORD PTR [l4_input_len]
mov DWORD PTR [l4_input_len], %EBP
now we get:
add DWORD PTR [l4_input_len], %EDI
Both of these use fewer registers than the alternative, and are faster and
smaller.
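Roughly the C shape behind the assembly above (a sketch; the symbol names
come from the asm, everything else is illustrative):
extern int l4_match_start, l4_input_len;
void adjust(int delta) {
  l4_match_start += -32768;   /* constant RHS: add [mem], imm */
  l4_input_len   += delta;    /* register RHS: add [mem], reg */
}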
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19488 91177308-0d34-0410-b5e6-96231b3b80d8
mov %ECX, %EAX
add %ECX, 32768
mov %SI, WORD PTR [2*%ECX + l13_prev]
Generate this:
mov %SI, WORD PTR [2*%ECX + l13_prev + 65536]
This occurs when you have a GEP instruction where an index is
"something + imm".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19472 91177308-0d34-0410-b5e6-96231b3b80d8
int -> FP casting code. Note that we don't have to set it for FP operations
that take FP values as operands: whatever produces the FP value will set the
flag.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19451 91177308-0d34-0410-b5e6-96231b3b80d8
evaluate the LHS or the RHS of an operation first. This causes good things
to happen. For example, instead of compiling a loop to this:
.LBBstrength_result7_1: # loopentry
movl 16(%esp), %edi
movl (%edi), %edi ;;; LOAD
movl (%ecx), %ebx
movl $2, (%eax,%ebx,4)
movl (%edx), %ebx
movl %esi, %ebp
addl $21, %ebp
addl $42, %esi
cmpl $0, %edi ;;; USE
cmovne %esi, %ebp
cmpl %ebp, %ebx
movl %ebp, %esi
jg .LBBstrength_result7_1
We now compile it to this:
.LBBstrength_result7_1: # loopentry
movl %edi, %ebx
addl $42, %ebx
addl $21, %edi
movl (%ecx), %ebp ;; LOAD
cmpl $0, %ebp ;; USE
cmovne %ebx, %edi
movl (%edx), %ebx
movl $2, (%eax,%ebx,4)
movl (%esi), %ebx
cmpl %edi, %ebx
jg .LBBstrength_result7_1
Which reduces register pressure enough (in this case) to avoid spilling in the
loop.
As another example, consider the CodeGen/X86/regpressure.ll testcase. We
used to generate this code for both cases:
regpressure1:
subl $32, %esp
movl %esi, 12(%esp)
movl %edi, 8(%esp)
movl %ebx, 4(%esp)
movl %ebp, (%esp)
movl 36(%esp), %ecx
movl (%ecx), %eax
movl 4(%ecx), %edx
movl %edx, 24(%esp)
movl 8(%ecx), %edx
movl %edx, 16(%esp)
movl 12(%ecx), %edx
movl 16(%ecx), %esi
movl 20(%ecx), %edi
movl 24(%ecx), %ebx
movl %ebx, 28(%esp)
movl 28(%ecx), %ebx
movl 32(%ecx), %ebp
movl %ebp, 20(%esp)
movl 36(%ecx), %ecx
imull 24(%esp), %eax
imull 16(%esp), %eax
imull %edx, %eax
imull %esi, %eax
imull %edi, %eax
imull 28(%esp), %eax
imull %ebx, %eax
imull 20(%esp), %eax
imull %ecx, %eax
movl (%esp), %ebp
movl 4(%esp), %ebx
movl 8(%esp), %edi
movl 12(%esp), %esi
addl $32, %esp
ret
This code is basically trying to do all of the loads first, then execute all
of the multiplies. Because we run out of registers, lots of spill code happens.
We now generate this code for both cases:
regpressure1:
movl 4(%esp), %ecx
movl (%ecx), %eax
movl 4(%ecx), %edx
imull %edx, %eax
movl 8(%ecx), %edx
imull %edx, %eax
movl 12(%ecx), %edx
imull %edx, %eax
movl 16(%ecx), %edx
imull %edx, %eax
movl 20(%ecx), %edx
imull %edx, %eax
movl 24(%ecx), %edx
imull %edx, %eax
movl 28(%ecx), %edx
imull %edx, %eax
movl 32(%ecx), %edx
imull %edx, %eax
movl 36(%ecx), %ecx
imull %ecx, %eax
ret
which is much nicer (when we fold loads into the muls it will be even better).
The old instruction selector used to produce the good code for regpressure1
but not for regpressure2, as it depended on the order of operations in the
LLVM code.
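For reference, the testcase has roughly this shape: a long chain of
multiplies whose operands are all loads (a sketch; the real test is
CodeGen/X86/regpressure.ll):
int regpressure1(int *p) {
  return p[0] * p[1] * p[2] * p[3] * p[4] *
         p[5] * p[6] * p[7] * p[8] * p[9];
}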
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19449 91177308-0d34-0410-b5e6-96231b3b80d8
of an ADDri (due to current restrictions on MachineOperand :( ). This allows
us to generate:
leal Data+16000, %edx
instead of:
movl $Data, %edx
addl $16000, %edx
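A C fragment of the kind that produces this address (the Data symbol comes
from the asm above; the rest is illustrative):
extern int Data[];
int *element(void) {
  return &Data[4000];   /* Data + 16000 bytes: now a single lea */
}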
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19420 91177308-0d34-0410-b5e6-96231b3b80d8
store float 123.45, float* %P
as an integer store. This adds handling of float immediate stores as integers
for arguments passed to function calls.
This is now tested by CodeGen/X86/store-fp-constant.ll
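In C terms the transformation amounts to storing the constant's bit pattern
with an integer store (a sketch, assuming a 32-bit float):
#include <string.h>
void store_fp_const(float *P) {
  float c = 123.45f;
  unsigned bits;
  memcpy(&bits, &c, sizeof bits);   /* take the IEEE-754 bit pattern...    */
  memcpy(P, &bits, sizeof bits);    /* ...and store it as a 32-bit integer */
}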
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19364 91177308-0d34-0410-b5e6-96231b3b80d8
For now, this is the default, as the current selector is missing some big pieces.
To enable the new selector, pass -disable-pattern-isel=false to llc or lli.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19335 91177308-0d34-0410-b5e6-96231b3b80d8
precisely represented as a float, put it into the constant pool as a
float.
2. Use the cbw/cwd/cdq instructions instead of an explicit SAR for signed
division.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19291 91177308-0d34-0410-b5e6-96231b3b80d8
to get Visual Studio to link in X86.lib to the executables that need it. There
is another way of doing it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19252 91177308-0d34-0410-b5e6-96231b3b80d8
1. Add new instructions for checking parity flags: JP, JNP, SETP, SETNP.
2. Set the isCommutable and isPromotableTo3Address bits on several
instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19246 91177308-0d34-0410-b5e6-96231b3b80d8
save small amounts of time for functions that don't call llvm.returnaddress
or llvm.frameaddress (which is almost all functions).
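For reference, the affected functions are the ones that use the
corresponding builtins, e.g. (illustrative):
void *caller_pc(void)     { return __builtin_return_address(0); }
void *current_frame(void) { return __builtin_frame_address(0); }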
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@19006 91177308-0d34-0410-b5e6-96231b3b80d8
don't support long double anyway, and this gives us FP results closer to
other targets.
This also speeds up 179.art from 41.4s to 18.32s, by eliminating a problem
with extra precision that causes an FP == comparison to fail (leading to
extra loop iterations).
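The failure mode is the classic extra-precision trap, sketched below
(hypothetical code, not the actual 179.art source): a product kept in an
80-bit register can compare unequal to the same product after it has been
rounded to 64 bits.
int converged(double a, double b, double stored) {
  double current = a * b;
  return current == stored;   /* may stay false under 80-bit temporaries */
}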
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@18895 91177308-0d34-0410-b5e6-96231b3b80d8
instead of 80 bits of precision. This fixes PR467.
This change speeds up fldry on X86 with LLC from 7.32s on apoc to 4.68s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@18433 91177308-0d34-0410-b5e6-96231b3b80d8
to Brian and the Sun compiler for pointing out that the obvious works :)
This also enables folding all long comparisons into setcc and branch
instructions; before, we could only do == and !=.
For example, for:
void test(unsigned long long A, unsigned long long B) {
if (A < B) foo();
}
We now generate:
test:
subl $4, %esp
movl %esi, (%esp)
movl 8(%esp), %eax
movl 12(%esp), %ecx
movl 16(%esp), %edx
movl 20(%esp), %esi
subl %edx, %eax
sbbl %esi, %ecx
jae .LBBtest_2 # UnifiedReturnBlock
.LBBtest_1: # then
call foo
movl (%esp), %esi
addl $4, %esp
ret
.LBBtest_2: # UnifiedReturnBlock
movl (%esp), %esi
addl $4, %esp
ret
Instead of:
test:
subl $12, %esp
movl %esi, 8(%esp)
movl %ebx, 4(%esp)
movl 16(%esp), %eax
movl 20(%esp), %ecx
movl 24(%esp), %edx
movl 28(%esp), %esi
cmpl %edx, %eax
setb %al
cmpl %esi, %ecx
setb %bl
cmove %ax, %bx
testb %bl, %bl
je .LBBtest_2 # UnifiedReturnBlock
.LBBtest_1: # then
call foo
movl 4(%esp), %ebx
movl 8(%esp), %esi
addl $12, %esp
ret
.LBBtest_2: # UnifiedReturnBlock
movl 4(%esp), %ebx
movl 8(%esp), %esi
addl $12, %esp
ret
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@18330 91177308-0d34-0410-b5e6-96231b3b80d8
* Get rid of "emitMaybePCRelativeValue"; either we want to emit a PC-relative
value or not: drop the maybe BS. As it turns out, in the only places where
the bool came in as a variable, it was actually a dynamic constant.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17867 91177308-0d34-0410-b5e6-96231b3b80d8
two or three, open-code the equivalent operation, which is faster on Athlon
and P4 (by a substantial margin).
For example, instead of compiling this:
long long X2(long long Y) { return Y << 2; }
to:
X3_2:
movl 4(%esp), %eax
movl 8(%esp), %edx
shldl $2, %eax, %edx
shll $2, %eax
ret
Compile it to:
X2:
movl 4(%esp), %eax
movl 8(%esp), %ecx
movl %eax, %edx
shrl $30, %edx
leal (%edx,%ecx,4), %edx
shll $2, %eax
ret
Likewise, for << 3, compile to:
X3:
movl 4(%esp), %eax
movl 8(%esp), %ecx
movl %eax, %edx
shrl $29, %edx
leal (%edx,%ecx,8), %edx
shll $3, %eax
ret
This matches icc, except that icc open codes the shifts as adds on the P4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17707 91177308-0d34-0410-b5e6-96231b3b80d8
double %test(uint %X) {
%tmp.1 = cast uint %X to double ; <double> [#uses=1]
ret double %tmp.1
}
into:
test:
sub %ESP, 8
mov %EAX, DWORD PTR [%ESP + 12]
mov %ECX, 0
mov DWORD PTR [%ESP], %EAX
mov DWORD PTR [%ESP + 4], %ECX
fild QWORD PTR [%ESP]
add %ESP, 8
ret
... which basically zero extends to 8 bytes, then does an fild for an
8-byte signed int.
Now we generate this:
test:
sub %ESP, 4
mov %EAX, DWORD PTR [%ESP + 8]
mov DWORD PTR [%ESP], %EAX
fild DWORD PTR [%ESP]
shr %EAX, 31
fadd DWORD PTR [.CPItest_0 + 4*%EAX]
add %ESP, 4
ret
.section .rodata
.align 4
.CPItest_0:
.quad 5728578726015270912
This does a 32-bit signed integer load, then adds in an offset if the sign
bit of the integer was set.
It turns out that this is substantially faster than the preceding sequence.
Consider this testcase:
unsigned a[2]={1,2};
volatile double G;
void main() {
int i;
for (i=0; i<100000000; ++i )
G += a[i&1];
}
On zion (a P4 Xeon, 3GHz), this patch speeds up the testcase from 2.140s
to 0.94s.
On apoc, an Athlon MP 2100+, this patch speeds up the testcase from 1.72s
to 1.34s.
Note that the program takes 2.5s/1.97s on zion/apoc with GCC 3.3 -O3
-fomit-frame-pointer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17083 91177308-0d34-0410-b5e6-96231b3b80d8
%X = and Y, constantint
%Z = setcc %X, 0
instead of emitting:
and %EAX, 3
test %EAX, %EAX
je .LBBfoo2_2 # UnifiedReturnBlock
We now emit:
test %EAX, 3
je .LBBfoo2_2 # UnifiedReturnBlock
This triggers 581 times on 176.gcc for example.
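The C shape that triggers this looks like the following (illustrative; the
foo2 name matches the label in the asm):
int foo2(unsigned x) {
  if ((x & 3) == 0)   /* an 'and' whose only use is a compare with zero */
    return 1;
  return 0;
}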
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17080 91177308-0d34-0410-b5e6-96231b3b80d8
case:
int C[100];
int foo() {
return C[4];
}
We now codegen:
foo:
mov %EAX, DWORD PTR [C + 16]
ret
instead of:
foo:
mov %EAX, OFFSET C
mov %EAX, DWORD PTR [%EAX + 16]
ret
Other impressive features may be coming later.
This patch is contributed by Jeff Cohen!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@17011 91177308-0d34-0410-b5e6-96231b3b80d8
the -sse* options (to avoid misleading people).
Also, the stack alignment of the target doesn't depend on whether SSE is
eventually implemented, so remove a comment.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16860 91177308-0d34-0410-b5e6-96231b3b80d8
which prevented setcc's from being folded into branches. It appears that
a conditional BranchInst's CC operand is actually operand(2), not operand(0)
as we might expect. :(
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16859 91177308-0d34-0410-b5e6-96231b3b80d8
t:
mov %EDX, DWORD PTR [%ESP + 4]
mov %ECX, 2
mov %EAX, %EDX
sar %EDX, 31
idiv %ECX
mov %EAX, %EDX
ret
Generate:
t:
mov %ECX, DWORD PTR [%ESP + 4]
*** mov %EAX, %ECX
cdq
and %ECX, 1
xor %ECX, %EDX
sub %ECX, %EDX
*** mov %EAX, %ECX
ret
Note that the two marked moves are redundant, and should be eliminated by the
register allocator, but aren't.
Compare this to GCC, which generates:
t:
mov %eax, DWORD PTR [%esp+4]
mov %edx, %eax
shr %edx, 31
lea %ecx, [%edx+%eax]
and %ecx, -2
sub %eax, %ecx
ret
or ICC 8.0, which generates:
t:
movl 4(%esp), %ecx #3.5
movl $-2147483647, %eax #3.25
imull %ecx #3.25
movl %ecx, %eax #3.25
sarl $31, %eax #3.25
addl %ecx, %edx #3.25
subl %edx, %eax #3.25
addl %eax, %eax #3.25
negl %eax #3.25
subl %eax, %ecx #3.25
movl %ecx, %eax #3.25
ret #3.25
We would be in great shape if not for the moves.
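The probable C shape of the testcase is a signed remainder by two (a guess
from the asm above, not stated in the commit):
int t(int X) {
  return X % 2;
}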
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16763 91177308-0d34-0410-b5e6-96231b3b80d8
* Update comments
* Rearrange code a bit
* Finally ELIMINATE the GAS workaround emitter for Intel mode. woot!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16647 91177308-0d34-0410-b5e6-96231b3b80d8
old and broken AT&T syntax assemblers. The problem with this hack is that
*SOME* forms of the fdiv and fsub instructions have the 'r' bit inverted.
This was a real pain to figure out, but is trivially easy to support: thus
we are now bug compatible with gas and gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16644 91177308-0d34-0410-b5e6-96231b3b80d8
Intel and AT&T style assembly language. The ultimate goal of this is to
eliminate the GasBugWorkaroundEmitter class, but for now AT&T style emission
is not fully operational.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16639 91177308-0d34-0410-b5e6-96231b3b80d8
hopefully lead to the death of the 'GasBugWorkaroundEmitter'. This also
includes changes to wrap the whole file to 80 columns! Woot! :)
Note that the AT&T style output has not been tested at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16638 91177308-0d34-0410-b5e6-96231b3b80d8
are only used by the stackifier when transforming FPn register
allocations to the real x87 stack registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@16472 91177308-0d34-0410-b5e6-96231b3b80d8