Commit Graph

404 Commits

Author SHA1 Message Date
Evan Cheng
af1c76846d When the register allocator runs out of registers, spill a physical register around the def's and use's of the interval being allocated to make it possible for the interval to target a register and spill it right away and restore a register for uses. This likely generates terrible code but is before than aborting.
llvm-svn: 48218
2008-03-11 07:19:34 +00:00
Chris Lattner
f0684bfd16 Don't emit FP_REG_KILL into a block that just returns. Nothing
can be live out of the block anyway, so it isn't needed.

llvm-svn: 48192
2008-03-10 23:34:12 +00:00
Dan Gohman
47137eba06 Fix mul expansion to check the correct number of bits for
zero extension when checking if an unsigned multiply is
safe.

llvm-svn: 48171
2008-03-10 20:42:19 +00:00
Dale Johannesen
62a0b6a79b These tests don't work unless SSE2 is active.
Judging from the checking comments this is intentional,
so add the flag (makes them pass on non-x86 host).

llvm-svn: 48157
2008-03-10 17:33:57 +00:00
Dale Johannesen
c9ecee85c4 There is no "-mattr=+sse1" flag; fix test for non-x86 hosts.
llvm-svn: 48156
2008-03-10 17:13:37 +00:00
Evan Cheng
02b66c3a32 - Fix a subtle bug in RemoveCopyByCommutingDef. ALR is the live range where the source is defined; BLR is the live range which is defined by the copy.
If ALR and BLR overlaps and end of BLR extends beyond end of ALR, e.g.                                                                                                 
 A = or A, B                                                                                                                                                            
 ...                                                                                                                                                                    
 B = A                                                                                                                                                                  
 ...                                                                                                                                                                    
 C = A<kill>                                                                                                                                                            
 ...                                                                                                                                                                    
   = B                                                                                                                                                                  
                                                                                                                                                                        
then do not add kills of A to the newly created B interval.
- Also fix some kill info update bug.

llvm-svn: 48141
2008-03-10 08:11:32 +00:00
Evan Cheng
3c0ddc999f Avoid creating BUILD_VECTOR of all zero elements of "non-normalized" type (e.g. v8i16 on x86) after legalizer. Instruction selection does not expect to see them. In all likelihood this can only be an issue in a bugpoint reduced test case.
llvm-svn: 48136
2008-03-10 07:19:13 +00:00
Chris Lattner
b6bfedbcfd teach X86InstrInfo::copyRegToReg how to copy into ST(0) from
an RFP register class.

Teach ScheduleDAG how to handle CopyToReg with different src/dst 
reg classes.

This allows us to compile trivial inline asms that expect stuff
on the top of x87-fp stack.

llvm-svn: 48107
2008-03-09 09:15:31 +00:00
Chris Lattner
8d0203478f Add ScheduleDAG support for copytoreg where the src/dst register are
in different register classes, e.g. copy of ST(0) to RFP*.  This gets
some really trivial inline asm working that plops things on the top of
stack (PR879)

llvm-svn: 48105
2008-03-09 08:49:15 +00:00
Chris Lattner
b9a4c86fbf reduce this testcase more
llvm-svn: 48092
2008-03-09 06:57:21 +00:00
Chris Lattner
b628208161 Finish implementing a readme entry: when inserting an i64 variable
into a vector of zeros or undef, and when the top part is obviously
zero, we can just use movd + shuffle.  This allows us to compile
vec_set-B.ll into:

_test3:
	movl	$1234567, %eax
	andl	4(%esp), %eax
	movd	%eax, %xmm0
	ret

instead of:

_test3:
	subl	$28, %esp
	movl	$1234567, %eax
	andl	32(%esp), %eax
	movl	%eax, (%esp)
	movl	$0, 4(%esp)
	movq	(%esp), %xmm0
	addl	$28, %esp
	ret

llvm-svn: 48090
2008-03-09 05:42:06 +00:00
Chris Lattner
17f68a3075 Implement a readme entry, compiling
#include <xmmintrin.h>
__m128i doload64(short x) {return _mm_set_epi16(0,0,0,0,0,0,0,1);}

into:
	movl	$1, %eax
	movd	%eax, %xmm0
	ret

instead of a constant pool load.

llvm-svn: 48063
2008-03-09 01:05:04 +00:00
Chris Lattner
24031c9426 make this test harder
llvm-svn: 48061
2008-03-09 00:30:06 +00:00
Chris Lattner
7173d3bd70 Teach SD some vector identities, allowing us to compile vec_set-9 into:
_test3:
	movd	%rdi, %xmm1
	#IMPLICIT_DEF %xmm0
	punpcklqdq	%xmm1, %xmm0
	ret

instead of:

_test3:
	#IMPLICIT_DEF %rax
	movd	%rax, %xmm0
	movd	%rdi, %xmm1
	punpcklqdq	%xmm1, %xmm0
	ret

This is still not ideal.  There is no reason to two xmm regs.

llvm-svn: 48058
2008-03-08 23:43:36 +00:00
Evan Cheng
dba1dfe962 Implement x86 support for @llvm.prefetch. It corresponds to prefetcht{0|1|2} and prefetchnta instructions.
llvm-svn: 48042
2008-03-08 00:58:38 +00:00
Chris Lattner
aa81dc7d21 mark frem as expand for all legal fp types on x86, regardless of whether
we're using SSE or not.  This fixes PR2122.

llvm-svn: 48006
2008-03-07 06:36:32 +00:00
Chris Lattner
a9fcb187af Generalize FP constant shrinking optimization to apply to any vt
except ppc long double.  This allows us to shrink constant pool
entries for x86 long double constants, which in turn allows us to
use flds/fldl instead of fldt.

llvm-svn: 47938
2008-03-05 06:48:13 +00:00
Evan Cheng
e0b3c221ab Add a target lowering hook to control whether it's worthwhile to compress fp constant.
For x86, if sse2 is available, it's not a good idea since cvtss2sd is slower than a movsd load and it prevents load folding. On x87, it's important to shrink fp constant since fldt is very expensive.

llvm-svn: 47931
2008-03-05 01:30:59 +00:00
Evan Cheng
14f556a6d7 Really fix the test.
llvm-svn: 47882
2008-03-04 08:01:56 +00:00
Evan Cheng
7a67175fcc Fix broken test.
llvm-svn: 47881
2008-03-04 07:59:13 +00:00
Evan Cheng
3123d6ced3 Add PR1501 test case.
llvm-svn: 47874
2008-03-04 00:47:45 +00:00
Chris Lattner
299977b5ca Evan implemented these.
llvm-svn: 47828
2008-03-02 18:05:14 +00:00
Evan Cheng
e1d3e0958b Set to default: x86 no longer fold and into test if it has more than one use.
llvm-svn: 47711
2008-02-28 07:46:38 +00:00
Evan Cheng
da92e34fe3 Fix a bug in dead spill slot elimination.
llvm-svn: 47687
2008-02-27 19:57:11 +00:00
Chris Lattner
e51c23341d actually run llc, thanks Dan :)
llvm-svn: 47677
2008-02-27 17:46:54 +00:00
Evan Cheng
295ae42ede Don't track max alignment during stack object allocations since they can be deleted later. Let PEI compute it.
llvm-svn: 47668
2008-02-27 10:04:56 +00:00
Chris Lattner
1f46cc2345 Make X86TargetLowering::LowerSINT_TO_FP return without creating a dead
stack slot and store if the  SINT_TO_FP is actually legal.  This allows
us to compile:

double a(double b) {return (unsigned)b;}

to:

_a:
	cvttsd2siq	%xmm0, %rax
	movl	%eax, %eax
	cvtsi2sdq	%rax, %xmm0
	ret

instead of:

_a:
	subq	$8, %rsp
	cvttsd2siq	%xmm0, %rax
	movl	%eax, %eax
	cvtsi2sdq	%rax, %xmm0
	addq	$8, %rsp
	ret

crazy.

llvm-svn: 47660
2008-02-27 05:57:41 +00:00
Chris Lattner
bc686e546a Compile x86-64-and-mask.ll into:
_test:
	movl	%edi, %eax
	ret

instead of:

_test:
        movl    $4294967295, %ecx
        movq    %rdi, %rax
        andq    %rcx, %rax
        ret

It would be great to write this as a Pat pattern that used subregs 
instead of a 'pseudo' instruction, but I don't know how to do that
in td files.

llvm-svn: 47658
2008-02-27 05:47:54 +00:00
Evan Cheng
7553230e3a Spiller now remove unused spill slots.
llvm-svn: 47657
2008-02-27 03:04:06 +00:00
Evan Cheng
701b6a1dc3 Enable -coalescer-commute-instrs by default.
llvm-svn: 47623
2008-02-26 20:40:22 +00:00
Dan Gohman
8a8f3fe7e0 Avoid aborting on invalid shift counts.
llvm-svn: 47612
2008-02-26 18:50:50 +00:00
Eli Friedman
1f2cabfbcf Fix for pr2093: direct operands aren't necessarily addresses, so don't
try to simplify them.

llvm-svn: 47610
2008-02-26 18:37:49 +00:00
Evan Cheng
8e99554e84 This is possible:
vr1 = extract_subreg vr2, 3
...
vr3 = extract_subreg vr1, 2
The end result is vr3 is equal to vr2 with subidx 2.

llvm-svn: 47592
2008-02-26 08:03:41 +00:00
Evan Cheng
6366bbf577 Fix PR2076. CodeGenPrepare now sinks address computation for inline asm memory
operands into inline asm block.

llvm-svn: 47589
2008-02-26 02:42:37 +00:00
Evan Cheng
d299f09bc5 Rematerialization logic was overly conservative when it comes to loads from fixed stack slots.
llvm-svn: 47529
2008-02-23 03:38:34 +00:00
Evan Cheng
6782480bd1 Update test.
llvm-svn: 47527
2008-02-23 02:57:25 +00:00
Evan Cheng
4e9d5f1ead Remat of pic loads are now on by default.
llvm-svn: 47525
2008-02-23 02:08:30 +00:00
Evan Cheng
7c3a8d0056 Really. Why doesn't every arch support MMX?
llvm-svn: 47513
2008-02-23 00:56:14 +00:00
Evan Cheng
3b35d2a86c Test case for PR2082.
llvm-svn: 47501
2008-02-22 20:38:49 +00:00
Evan Cheng
1b417c4d84 Allow re-materialization of pic load (controlled by -remat-pic-load for now).
llvm-svn: 47476
2008-02-22 09:25:47 +00:00
Chris Lattner
a64d4179d4 copy mmx values from/to memory with GPRs on x86-32
instead of with mmx registers.  This horribleness is apparently
done by gcc to avoid having to insert emms in places that really 
should have it.  This is the second half of rdar://5741668.

llvm-svn: 47474
2008-02-22 05:18:04 +00:00
Chris Lattner
e70bc39d74 Start using GPR's to copy around mmx value instead of mmx regs.
GCC apparently does this, and code depends on not having to do
emms when this happens.  This is x86-64 only so far, second half
should handle x86-32.

rdar://5741668

llvm-svn: 47470
2008-02-22 02:09:43 +00:00
Chris Lattner
4f87f1c087 Treat clobber operands like early clobbers: if we have
any, we force sdisel to do all regalloc for an asm.  This
leads to gross but correct codegen.

This fixes the rest of PR2078.

llvm-svn: 47454
2008-02-21 19:43:13 +00:00
Tanya Lattner
8116db05a6 Remove llvm-upgrade and update tests.
llvm-svn: 47432
2008-02-21 07:42:26 +00:00
Chris Lattner
99b5a37d39 Fix a (harmless) but where vregs were added to the used reg lists for
inline asms.

Fix PR2078 by marking aliases of registers used when a register is 
marked used.  This prevents EAX from being allocated when AX is listed
in the clobber set for the asm.

llvm-svn: 47426
2008-02-21 04:55:52 +00:00
Evan Cheng
33ee06fa48 XFAIL this for now.
llvm-svn: 47355
2008-02-20 02:38:58 +00:00
Chris Lattner
aaafe47a55 this test requires sse2
llvm-svn: 47331
2008-02-19 18:07:46 +00:00
Chris Lattner
3a4ac3a69e Don't fold and's into test instructions if they have multiple uses.
This compiles test-nofold.ll into:

_test:
	movl	$15, %ecx
	andl	4(%esp), %ecx
	testl	%ecx, %ecx
	movl	$42, %eax
	cmove	%ecx, %eax
	ret

instead of:
_test:
	movl	4(%esp), %eax
	movl	%eax, %ecx
	andl	$15, %ecx
	testl	$15, %eax
	movl	$42, %eax
	cmove	%ecx, %eax
	ret

llvm-svn: 47330
2008-02-19 17:37:35 +00:00
Chris Lattner
67f2a6c009 rename tests to avoid a test- prefix when they aren't related to the test instruction.
llvm-svn: 47329
2008-02-19 17:33:52 +00:00
Nick Lewycky
69457748ab Don't spew stats to stderr.
llvm-svn: 47308
2008-02-19 03:11:47 +00:00