1738 Commits

Author SHA1 Message Date
Dan Gohman
a5f382da8b Reapply r143206, with fixes. Disallow physical register lifetimes
across calls, and only check for nested dependences on the special
call-sequence-resource register.

llvm-svn: 143660
2011-11-03 21:49:52 +00:00
Eli Friedman
c60a0ad611 Teach the x86 backend a couple tricks for dealing with v16i8 sra by a constant splat value. Fixes PR11289.
llvm-svn: 143498
2011-11-01 21:18:39 +00:00
Benjamin Kramer
c0001c42c6 X86: Emit logical shift by constant splat of <16 x i8> as a <8 x i16> shift and zero out the bits where zeros should've been shifted in.
llvm-svn: 143315
2011-10-30 17:31:21 +00:00
Nadav Rotem
8282fc9e3b Fix pr11266.
On x86: (shl V, 1) -> add V,V

Hardware support for vector-shift is sparse and in many cases we scalarize the
result. Additionally, on sandybridge padd is faster than shl.

llvm-svn: 143311
2011-10-30 13:24:22 +00:00
Dan Gohman
826cec9a4b Revert r143206, as there are still some failing tests.
llvm-svn: 143262
2011-10-29 00:41:52 +00:00
Dan Gohman
dedcc22bcd Reapply r143177 and r143179 (reverting r143188), with scheduler
fixes: Use a separate register, instead of SP, as the
calling-convention resource, to avoid spurious conflicts with
actual uses of SP. Also, fix unscheduling of calling sequences,
which can be triggered by pseudo-two-address dependencies.

llvm-svn: 143206
2011-10-28 17:55:38 +00:00
Duncan Sands
a6507c4bcb Speculatively disable Dan's commits 143177 and 143179 to see if
it fixes the dragonegg self-host (it looks like gcc is miscompiled).
Original commit messages:
Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

Delete #if 0 code accidentally left in.

llvm-svn: 143188
2011-10-28 09:55:57 +00:00
Dan Gohman
484df993bd Eliminate LegalizeOps' LegalizedNodes map and have it just call RAUW
on every node as it legalizes them. This makes it easier to use
hasOneUse() heuristics, since unneeded nodes can be removed from the
DAG earlier.

Make LegalizeOps visit the DAG in an operands-last order. It previously
used operands-first, because LegalizeTypes has to go operands-first, and
LegalizeTypes used to be part of LegalizeOps, but they're now split.
The operands-last order is more natural for several legalization tasks.
For example, it allows lowering code for nodes with floating-point or
vector constants to see those constants directly instead of seeing the
lowered form (often constant-pool loads). This makes some things
somewhat more complicated today, though it ought to allow things to be
simpler in the future. It also fixes some bugs exposed by Legalizing
using RAUW aggressively.

Remove the part of LegalizeOps that attempted to patch up invalid chain
operands on libcalls generated by LegalizeTypes, since it doesn't work
with the new LegalizeOps traversal order. Instead, define what
LegalizeTypes is doing to be correct, and transfer the responsibility
of keeping calls from having overlapping calling sequences into the
scheduler.

Teach the scheduler to model callseq_begin/end pairs as having a
physical register definition/use to prevent calls from having
overlapping calling sequences. This is also somewhat complicated, though
there are ways it might be simplified in the future.

This addresses rdar://9816668, rdar://10043614, rdar://8434668, and others.
Please direct high-level questions about this patch to management.

llvm-svn: 143177
2011-10-28 01:29:32 +00:00
Lang Hames
e8bb71f80d Rename NonScalarIntSafe to something more appropriate.
llvm-svn: 143080
2011-10-26 23:50:43 +00:00
Rafael Espindola
1958dc7193 Fixes an issue reported by -verify-machineinstrs.
Patch by Sanjoy Das.

llvm-svn: 143064
2011-10-26 21:16:41 +00:00
Nadav Rotem
7a79f94aad Fix pr11193.
SHL inserts zeros from the right, thus even when the original
sign_extend_inreg value was of 1-bit, we need to sra.

llvm-svn: 142724
2011-10-22 12:39:25 +00:00
Craig Topper
fd96157f13 Remove intrinsics for X86 BLSI, BLSMSK, and BLSR intrinsics and replace with custom isel lowering code.
llvm-svn: 142642
2011-10-21 06:55:01 +00:00
Evan Cheng
057c12c2a0 Fix TLS lowering bug. The CopyFromReg must be glued to the TLSCALL. rdar://10291355
llvm-svn: 142550
2011-10-19 22:22:54 +00:00
Duncan Sands
2faab7dd2a Fix a bunch of unused variable warnings when doing a release
build with gcc-4.6.

llvm-svn: 142350
2011-10-18 12:44:00 +00:00
Benjamin Kramer
2b5e0cc68a SmallVector -> array
llvm-svn: 142073
2011-10-15 13:28:31 +00:00
Craig Topper
0a11eb1b21 Add X86 ANDN instruction. Including instruction selection.
llvm-svn: 141947
2011-10-14 07:06:56 +00:00
Craig Topper
6b2120a8e1 Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell.
llvm-svn: 141939
2011-10-14 03:21:46 +00:00
Bill Wendling
2a571af745 Revert r141854 because it was causing failures:
http://lab.llvm.org:8011/builders/llvm-x86_64-linux/builds/101

--- Reverse-merging r141854 into '.':
U    test/MC/Disassembler/X86/x86-32.txt
U    test/MC/Disassembler/X86/simple-tests.txt
D    test/CodeGen/X86/bmi.ll
U    lib/Target/X86/X86InstrInfo.td
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86.td
U    lib/Target/X86/X86Subtarget.h

llvm-svn: 141857
2011-10-13 07:48:07 +00:00
Craig Topper
eb29e18c9b Add X86 TZCNT instruction and patterns to select it. Also added core-avx2 processor which is gcc's name for Haswell.
llvm-svn: 141854
2011-10-13 07:09:14 +00:00
Craig Topper
c498c5c0e6 Add X86 LZCNT instruction. Including instruction selection support.
llvm-svn: 141651
2011-10-11 06:44:02 +00:00
Eli Friedman
7188ba35cb Make sure the X86 backend doesn't explode on 128-bit shuffles in AVX mode. Fixes PR11102.
llvm-svn: 141585
2011-10-10 22:28:47 +00:00
Nadav Rotem
38187aec17 Fix 10892 - When lowering SIGN_EXTEND_INREG do not lower v2i64 because the
instruction set has no 64-bit SRA support.

llvm-svn: 141570
2011-10-10 19:31:45 +00:00
Evan Cheng
99b25c827c High bits of movmskp{s|d} and pmovmskb are known zero. rdar://10247336
llvm-svn: 141371
2011-10-07 17:21:44 +00:00
Eli Friedman
81fc13efd2 PR11033: Make sure we don't generate PCMPGTQ and PCMPEQQ if the target CPU does not support them.
llvm-svn: 140723
2011-09-28 21:00:25 +00:00
Duncan Sands
6d3fe8d11a Implement Chris's suggestion of legalizing the various SSE and AVX
hadd/hsub intrinsics into the new fhadd/fhsub X86 node.

llvm-svn: 140383
2011-09-23 16:10:22 +00:00
Duncan Sands
1da590b589 Synthesize SSE3/AVX 128 bit horizontal add/sub instructions from
floating point add/sub of appropriate shuffle vectors.  Does not
synthesize the 256 bit AVX versions because they work differently.

llvm-svn: 140332
2011-09-22 20:15:48 +00:00
Benjamin Kramer
978ef840ac The SSE version differences for fmin/fmax are more involved than I thought.
- x87: no min or max.
- SSE1: min/max for single precision scalars and vectors.
- SSE2: min/max for single and double precision scalars and vectors.
- AVX: as SSE2, but also supports the wider ymm vectors. (this is covered by the isTypeLegal check)

llvm-svn: 140296
2011-09-22 03:27:22 +00:00
Benjamin Kramer
5844bacf0a X86: Don't form min/max nodes if the target is missing SSE.
llvm-svn: 140294
2011-09-22 03:01:42 +00:00
Nadav Rotem
71bd67ac2e fix comment
llvm-svn: 140258
2011-09-21 17:14:40 +00:00
Nadav Rotem
8fc9d777a3 Insert a sanity check on the combining of x86 truncing-store nodes. This comes to replace the problematic check that was removed in r139995.
llvm-svn: 140246
2011-09-21 08:45:10 +00:00
Richard Trieu
a675de9fac Change:
assert(!"error message");

To:

  assert(0 && "error message");

which is more consistant across the code base.

llvm-svn: 140234
2011-09-21 03:09:09 +00:00
Bruno Cardoso Lopes
035414367a Simplify max/minp[s|d] dagcombine matching
llvm-svn: 140199
2011-09-20 22:34:45 +00:00
Craig Topper
df17f1cc99 Extend changes from r139986 to produce 256-bit AVX minps/minpd/maxps/maxpd.
llvm-svn: 140140
2011-09-20 07:38:59 +00:00
Nadav Rotem
a6af03c6fb Fix typos in my prev commit, found by Tobi.
llvm-svn: 140003
2011-09-18 19:00:23 +00:00
Nadav Rotem
1cfdc59e94 setOperationAction should be done on the return value of the type, not the operands.
llvm-svn: 140001
2011-09-18 14:57:03 +00:00
Nadav Rotem
cfc77bc719 When promoting integer vectors we often create ext-loads. This patch adds a
dag-combine optimization to implement the ext-load efficiently (using shuffles).

For example the type <4 x i8> is stored in memory as i32, but it needs to
find its way into a <4 x i32> register. Previously we scalarized the memory
access, now we use shuffles.

llvm-svn: 139995
2011-09-18 10:39:32 +00:00
Craig Topper
c5a97d12bb Fix typo by changing Lower256IntVETCC to Lower256IntVSETCC.
llvm-svn: 139993
2011-09-18 08:03:58 +00:00
Duncan Sands
4149334f09 Synthesize x86 max/min instructions also for vectors (i.e. produce
maxps and maxpd).  This broke the sse41-blend.ll testcase by causing
maxpd to be produced rather than a cmp+blend pair, which is the reason
I tweaked it.  Gives a small speedup on doduc with dragonegg when the
GCC vectorizer is used.

llvm-svn: 139986
2011-09-17 16:49:39 +00:00
Bruno Cardoso Lopes
8e702bba63 Change all checks regarding the presence of any SSE level to always
take into consideration the presence of AVX. This change, together with
the SSEDomainFix enabled for AVX, makes AVX codegen to always (hopefully)
emit the same code as SSE for 128-bit vector ops. I don't
have a testcase for this, but AVX now beats SSE in performance for
128-bit ops in the majority of programas in the llvm testsuite

llvm-svn: 139817
2011-09-15 18:27:36 +00:00
Eli Friedman
7cb90dcbce Fix the code creating VZEXT_LOAD so that it creates the right memoperand. Issue spotted in -debug output. I can't think of any practical effects at the moment, but it might matter if we start doing more aggressive alias analysis in CodeGen.
llvm-svn: 139758
2011-09-14 23:42:45 +00:00
Bruno Cardoso Lopes
3e6b9661d1 Vector shuffle mask <i32 4, i32 5, i32 2, i32 3> should yield "movsd", not "movss".
llvm-svn: 139686
2011-09-14 02:36:14 +00:00
Bruno Cardoso Lopes
6f299a4937 Revert the remaining part of r139528. According to PR10907 the bug seems
to be in the VSELECT operands order, so I'll leave the fix for Nadav.

llvm-svn: 139624
2011-09-13 19:33:00 +00:00
Nadav Rotem
60df99b809 Add vselect target support for targets that do not support blend but do support
xor/and/or (For example SSE2).

llvm-svn: 139623
2011-09-13 19:17:42 +00:00
Bruno Cardoso Lopes
64e2e852f9 Revert the wrong part of r139528, and fix testcases.
llvm-svn: 139541
2011-09-12 21:24:07 +00:00
Bruno Cardoso Lopes
c67e996fc3 Not sure how CMPPS and CMPPD had already ever worked, I guess it didn't.
However with this fix it does now.

Basically the operand order for the x86 target specific node
is not the same as the instruction, but since the intrinsic need that
specific order at the instruction definition, just change the order
during legalization. Also, there were some wrong invertions of condition
codes, such as GE => LE, GT => LT, fix that too. Fix PR10907.

llvm-svn: 139528
2011-09-12 19:30:40 +00:00
Nadav Rotem
abb5bb41d4 CR fixes per Bruno's request.
Undo the changes from r139285 which added custom lowering to vselect.
Add tablegen lowering for vselect.

llvm-svn: 139479
2011-09-11 15:02:23 +00:00
Eli Friedman
c79e318f02 r139454 activates an assert in a case where we were doing the right thing anyway. Make that explicit, and un-XFAIL the testcase.
llvm-svn: 139458
2011-09-10 02:01:42 +00:00
Richard Trieu
0485e133f2 Fixed an assert from:
assert("not implemented for target shuffle node");

to:

  assert(0 && "not implemented for target shuffle node");

This causes a test failure in CodeGen/X86/palignr.ll which has
been marked as XFAIL for the time being.
Test failure filed at PR10901.

llvm-svn: 139454
2011-09-10 01:26:21 +00:00
Nadav Rotem
ccb46031e6 Implement vector-select support for avx256. Refactor the vblend implementation to have tablegen match the instruction by the node type
llvm-svn: 139400
2011-09-09 20:29:17 +00:00
Nadav Rotem
2f256b7f9f Dix the 80-columns and remove unsupported v8i16 type from the list of legal vselect types.
llvm-svn: 139324
2011-09-08 22:17:35 +00:00