Commit Graph

16272 Commits

Author SHA1 Message Date
Jim Grosbach
0e16cdc5e2 ARM: allow vanilla expressions for movw/movt.
Expressions for movw/movt don't always have an :upper16: or :lower16:
on them and that's ok. When they don't, it's just a plain [0-65536]
immediate result, effectively the same as a :lower16: variant kind.

rdar://10550147

llvm-svn: 155941
2012-05-01 20:43:21 +00:00
Jim Grosbach
a538b5240e MC: Unknown assembler directives are now hard errors.
Previously, an unsupported/unknown assembler directive issued a warning.
That's generally unsafe, and inconsistent with the behaviour of pretty
much every system assembler. Now that the MC assemblers are mature
enough to be the default on multiple targets, it's reasonable to
issue errors for these.

For target or platform directives that need to stay warnings, we
should add explicit handlers for them in, e.g., ELFAsmParser.cpp,
DarwinAsmParser.cpp, et. al., and issue the warning there.

rdar://9246275

llvm-svn: 155926
2012-05-01 18:38:27 +00:00
Manman Ren
2a032bd8f9 X86: optimization for max-like struct
This patch will optimize the following cases on X86
(a > b) ? (a-b) : 0
(a >= b) ? (a-b) : 0
(b < a) ? (a-b) : 0
(b <= a) ? (a-b) : 0

FROM
movl    %edi, %ecx
subl    %esi, %ecx
cmpl    %edi, %esi
movl    $0, %eax
cmovll  %ecx, %eax
TO
xorl    %eax, %eax
subl    %esi, %edi
cmovll  %eax, %edi
movl    %edi, %eax

rdar: 10734411
llvm-svn: 155919
2012-05-01 17:16:15 +00:00
Alexey Samsonov
246af5318a X86: Use StackRegister instead of FrameRegister in getFrameIndexReference (to generate debug info for local variables) if stack needs realignment
llvm-svn: 155917
2012-05-01 15:16:06 +00:00
Jay Foad
5af5a8548b Regression test for PR2960.
llvm-svn: 155912
2012-05-01 11:11:34 +00:00
Nick Lewycky
fd4342c2f1 An instruction in a loop is not guaranteed to be executed just because the loop
has no exit blocks. Fixes PR12706!

llvm-svn: 155884
2012-05-01 04:03:01 +00:00
Lang Hames
26d71c9d0a Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. Fixes
<rdar://problem/11291436>.

This is a second attempt at a fix for this, the first was r155468. Thanks
to Chandler, Bob and others for the feedback that helped me improve this.

llvm-svn: 155866
2012-05-01 00:20:38 +00:00
Manman Ren
0a8b8b491f X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax %eax

llvm-svn: 155853
2012-04-30 22:51:25 +00:00
Manman Ren
71aa37cdf5 test/CodeGen/X86/select.ll: remove spaces
llvm-svn: 155840
2012-04-30 18:54:27 +00:00
Derek Schuff
85abcc8498 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

(this time, actually commit what was reviewed!)

llvm-svn: 155825
2012-04-30 16:57:15 +00:00
Bob Wilson
d2f6ff588b Don't introduce illegal types when creating vmull operations. <rdar://11324364>
ARM BUILD_VECTORs created after type legalization cannot use i8 or i16
operands, since those types are not legal.  Instead use i32 operands, which
will be implicitly truncated by the BUILD_VECTOR to match the element type.

llvm-svn: 155824
2012-04-30 16:53:34 +00:00
Duncan Sands
8fa4166239 Just mark the sign bit as known zero, rather than any other irrelevant bits
known zero in the LHS.  Fixes PR12541.

llvm-svn: 155818
2012-04-30 11:56:58 +00:00
Bill Wendling
5a1a6421ca Second attempt at PR12573:
Allow the "SplitCriticalEdge" function to split the edge to a landing pad. If
the pass is *sure* that it thinks it knows what it's doing, then it may go ahead
and specify that the landing pad can have its critical edge split. The loop
unswitch pass is one of these passes. It will split the critical edges of all
edges coming from a loop to a landing pad not within the loop. Doing so will
retain important loop analysis information, such as loop simplify.

llvm-svn: 155817
2012-04-30 10:44:54 +00:00
Rafael Espindola
314a1a477a Make sure HoistInsertPosition finds a position that is dominated by all
inputs.

llvm-svn: 155809
2012-04-30 03:53:06 +00:00
Andrew Trick
55623eaf5a Reapply 155668: Fix the SD scheduler to avoid gluing the same node twice.
This time, also fix the caller of AddGlue to properly handle
incomplete chains. AddGlue had failure modes, but shamefully hid them
from its caller. It's luck ran out.

Fixes rdar://11314175: BuildSchedUnits assert.

llvm-svn: 155749
2012-04-28 01:03:23 +00:00
Jim Grosbach
92e628a9c2 ARM: Thumb add(sp plus register) asm constraints.
Make sure when parsing the Thumb1 sp+register ADD instruction that
the source and destination operands match. In thumb2, just use the
wide encoding if they don't. In Thumb1, issue a diagnostic.

rdar://11219154

llvm-svn: 155748
2012-04-27 23:51:36 +00:00
Derek Schuff
7fe1fbbe81 Revert r155745
llvm-svn: 155746
2012-04-27 23:37:41 +00:00
Derek Schuff
80bd01f406 Fix fastcc structure return with fast-isel on x86-32
On x86-32, structure return via sret lets the callee pop the hidden
pointer argument off the stack, which the caller then re-pushes.
However if the calling convention is fastcc, then a register is used
instead, and the caller should not adjust the stack. This is
implemented with a check of IsTailCallConvention
X86TargetLowering::LowerCall but is now checked properly in
X86FastISel::DoSelectCall.

llvm-svn: 155745
2012-04-27 23:27:17 +00:00
Andrew Trick
cbe7b03dbe Temporarily revert r155668: Fix the SD scheduler to avoid gluing.
This definitely caused regression with ARM -mno-thumb.

llvm-svn: 155743
2012-04-27 22:55:59 +00:00
Chad Rosier
d627fcbf2a Add x86-specific DAG combine to simplify:
x == -y --> x+y == 0
 x != -y --> x+y != 0

On x86, the generated code goes from
   negl    %esi
   cmpl    %esi, %edi
   je    .LBB0_2
to
   addl    %esi, %edi
   je    .L4

This case is correctly handled for ARM with "cmn".

Patch by Manman Ren.
rdar://11245199
PR12545

llvm-svn: 155739
2012-04-27 22:33:25 +00:00
Evan Cheng
d337b7cc8a Make test less fragile.
llvm-svn: 155732
2012-04-27 20:48:18 +00:00
Hal Finkel
a565d03d78 Don't vectorize target-specific types (ppc_fp128, x86_fp80, etc.).
Target specific types should not be vectorized. As a practical matter,
these types are already register matched (at least in the x86 case),
and codegen does not always work correctly (at least in the ppc case,
and this is not worth fixing because ppc_fp128 is currently broken and
will probably go away soon).

llvm-svn: 155729
2012-04-27 19:34:00 +00:00
Lang Hames
7d83af4ed0 Fix the order of the operands in the llvm.fma intrinsic patterns for ARM,
<rdar://problem/11325085>.

llvm-svn: 155724
2012-04-27 18:51:24 +00:00
Dan Gohman
25a863dcf7 Reapply r155682, making constant folding more consistent, with a fix to work
properly with how the code handles all-undef PHI nodes.

llvm-svn: 155721
2012-04-27 17:50:22 +00:00
Richard Barton
f9237b25e6 Fix ARM assembly parsing for upper case condition codes on IT instructions.
llvm-svn: 155720
2012-04-27 17:34:01 +00:00
Benjamin Kramer
15690164a3 Missed some register numbers.
llvm-svn: 155706
2012-04-27 12:21:46 +00:00
Benjamin Kramer
17378d3a7f Update edis test for r155704.
llvm-svn: 155705
2012-04-27 12:14:03 +00:00
Benjamin Kramer
1380494168 X86: Don't emit conditional floating point moves on when targeting pre-pentiumpro architectures.
* Model FPSW (the FPU status word) as a register.
* Add ISel patterns for the FUCOM*, FNSTSW and SAHF instructions.
* During Legalize/Lowering, build a node sequence to transfer the comparison
result from FPSW into EFLAGS. If you're wondering about the right-shift: That's
an implicit sub-register extraction (%ax -> %ah) which is handled later on by
the instruction selector.

Fixes PR6679. Patch by Christoph Erhardt!

llvm-svn: 155704
2012-04-27 12:07:43 +00:00
NAKAMURA Takumi
a28147f072 Revert r155682, "Use ConstantExpr::getExtractElement when constant-folding vectors"
It broke stage2 build. stage1/clang sometimes crashed.

llvm-svn: 155699
2012-04-27 07:59:20 +00:00
Kostya Serebryany
0cd695bd39 [tsan] Atomic support for ThreadSanitizer, patch by Dmitry Vyukov
llvm-svn: 155698
2012-04-27 07:31:53 +00:00
Craig Topper
b53325d70e Add mcpu to tests to prevent them from using AVX instructions on Sandy Bridge after r155618.
llvm-svn: 155696
2012-04-27 07:11:58 +00:00
Evan Cheng
f35523d08a Implement a bastardized ABI.
llvm-svn: 155686
2012-04-27 02:11:10 +00:00
Evan Cheng
594fb11f12 - thumbv6 shouldn't imply +thumb2. Cortex-M0 doesn't suppport 32-bit Thumb2
instructions.
- However, it does support dmb, dsb, isb, mrs, and msr.
rdar://11331541

llvm-svn: 155685
2012-04-27 01:27:19 +00:00
Dan Gohman
a72b2f97a6 Use ConstantExpr::getExtractElement when constant-folding vectors
instead of getAggregateElement. This has the advantage of being
more consistent and allowing higher-level constant folding to
procede even if an inner extract element cannot be folded.

Make ConstantFoldInstruction call ConstantFoldConstantExpression
on the instruction's operands, making it more consistent with 
ConstantFoldConstantExpression itself. This makes sure that
ConstantExprs get TargetData-aware folding before being handed
off as operands for further folding.

This causes more expressions to be folded, but due to a known
shortcoming in constant folding, this currently has the side effect
of stripping a few more nuw and inbounds flags in the non-targetdata
side of constant-fold-gep.ll. This is mostly harmless.

This fixes rdar://11324230.

llvm-svn: 155682
2012-04-27 00:54:36 +00:00
Chad Rosier
f3d4646377 Add instcombine patterns for the following transformations:
(x & y) | (x ^ y) -> x | y 
 (x & y) + (x ^ y) -> x | y 

Patch by Manman Ren.
rdar://10770603

llvm-svn: 155674
2012-04-26 23:29:14 +00:00
Andrew Trick
1aa00c0baa Fix the SD scheduler to avoid gluing the same node twice.
DAGCombine strangeness may result in multiple loads from the same
offset. They both may try to glue themselves to another load. We could
insist that the redundant loads glue themselves to each other, but the
beter fix is to bail out from bad gluing at the time we detect it.

Fixes rdar://11314175: BuildSchedUnits assert.

llvm-svn: 155668
2012-04-26 21:48:25 +00:00
Tim Northover
876c151146 Use VLD1 in NEON extenting-load patterns instead of VLDR.
On some cores it's a bad idea for performance to mix VFP and NEON instructions
and since these patterns are NEON anyway, the NEON load should be used.

llvm-svn: 155630
2012-04-26 08:46:29 +00:00
Chandler Carruth
587c136c31 Teach the reassociate pass to fold chains of multiplies with repeated
elements to minimize the number of multiplies required to compute the
final result. This uses a heuristic to attempt to form near-optimal
binary exponentiation-style multiply chains. While there are some cases
it misses, it seems to at least a decent job on a very diverse range of
inputs.

Initial benchmarks show no interesting regressions, and an 8%
improvement on SPASS. Let me know if any other interesting results (in
either direction) crop up!

Credit to Richard Smith for the core algorithm, and helping code the
patch itself.

llvm-svn: 155616
2012-04-26 05:30:30 +00:00
Evan Cheng
505263cb36 Specify cpu to unbreak tests.
llvm-svn: 155604
2012-04-26 01:38:10 +00:00
Evan Cheng
4d570a3f0e If triple is armv7 / thumbv7 and a CPU is specified, do not automatically assume
the feature set of v7a. This comes about if the user specifies something like
-arch armv7 -mcpu=cortex-m3. We shouldn't be generating instructions such as
uxtab in this case.

rdar://11318438

llvm-svn: 155601
2012-04-26 01:13:36 +00:00
Jakob Stoklund Olesen
b35c6a6369 Try to fix llvm-arm-linux builder with -mcpu.
llvm-svn: 155589
2012-04-25 21:22:33 +00:00
Preston Gurd
237e411bb4 Trivial change to make the test use -mcpu=generic so as to avoid
a failure if run on an Intel Atom with post RA instruction scheduling.

llvm-svn: 155587
2012-04-25 21:04:54 +00:00
Chandler Carruth
8821a79947 Actually delete now-empty file.
llvm-svn: 155532
2012-04-25 02:30:00 +00:00
Lang Hames
7f69fbca29 Reverting r155468. Chris and Chandler have convinced me that it's dangerous and
in poor taste.

Talking through some alternate solutions with Chandler.

llvm-svn: 155530
2012-04-25 02:16:54 +00:00
Akira Hatanaka
b3ecf903f1 Do not use $gp as a dedicated global register if the target ABI is not O32.
llvm-svn: 155522
2012-04-25 01:24:52 +00:00
Jim Grosbach
7ac2ac85a8 ARM: improved assembler diagnostics for missing CPU features.
When an instruction match is found, but the subtarget features it
requires are not available (missing floating point unit, or thumb vs arm
mode, for example), issue a diagnostic that identifies what the feature
mismatch is.

rdar://11257547

llvm-svn: 155499
2012-04-24 22:40:08 +00:00
Nadav Rotem
3c817bb807 ConstantFoldSelectInstruction swapped the operands of the select.
Fix 12592. Patch by Matt Pharr.

llvm-svn: 155480
2012-04-24 20:18:49 +00:00
Nadav Rotem
62a0eeb276 Fix the testcase. We do expect two vblendw on XMMs.
llvm-svn: 155477
2012-04-24 19:57:38 +00:00
Nadav Rotem
d35ec2e4ad Add a testcase for 155440
llvm-svn: 155475
2012-04-24 19:45:28 +00:00
Evan Cheng
7f9bf43bcf MachineBasicBlock::SplitCriticalEdge() should follow LLVM IR variant and refuse to break edge to EH landing pad. rdar://11300144
llvm-svn: 155470
2012-04-24 19:06:55 +00:00
Lang Hames
08eb5f2340 Add support for llvm.arm.neon.vmull* intrinsics to InstCombine. This fixes
<rdar://problem/11291436>.

llvm-svn: 155468
2012-04-24 18:58:36 +00:00
Chandler Carruth
eb9c5df516 Fix a crash on valid (if UB) bitcode that is produced for some global
constants in C++11 mode. I have no idea why it required such particular
circumstances to get here, the code seems clearly to rely upon unchecked
assumptions.

Specifically, when we decide to form an index into a struct type, we may
have gone through (at least one) zero-length array indexing round, which
would have left the offset un-adjusted, and thus not necessarily valid
for use when indexing the struct type.

This is just an canonicalization step, so the correct thing is to refuse
to canonicalize nonsensical GEPs of this form. Implemented, and test
case added.

Fixes PR12642. Pair debugged and coded with Richard Smith. =] I credit
him with most of the debugging, and preventing me from writing the wrong
code.

llvm-svn: 155466
2012-04-24 18:42:47 +00:00
Kevin Enderby
f954efdbea Add missing test cases for ARM VLD3 (single 3-element structure to all lanes)
instructions.

llvm-svn: 155453
2012-04-24 17:45:56 +00:00
Kevin Enderby
e7378cb42d Add missing test cases for ARM VLD4 (single 4-element structure to all lanes)
instructions.

llvm-svn: 155444
2012-04-24 15:55:00 +00:00
Nadav Rotem
d060c25823 AVX: We lower VECTOR_SHUFFLE and BUILD_VECTOR nodes into vbroadcast instructions
using the pattern (vbroadcast (i32load src)). In some cases, after we generate
this pattern new users are added to the load node, which prevent the selection
of the blend pattern. This commit provides fallback patterns which perform
in-vector broadcast (using in-vector vbroadcast in AVX2 and pshufd on AVX1).

llvm-svn: 155437
2012-04-24 11:07:03 +00:00
Bill Wendling
9f736e7c65 FileCheck-ize tests.
llvm-svn: 155434
2012-04-24 10:45:44 +00:00
Bill Wendling
6824095c62 FileCheck-ize these tests.
llvm-svn: 155433
2012-04-24 10:36:42 +00:00
Bill Wendling
b394f59f17 FileCheck-ize these tests. Harden some of them.
llvm-svn: 155432
2012-04-24 09:15:38 +00:00
Nadav Rotem
c60ef21760 Optimize the vector UINT_TO_FP, SINT_TO_FP and FP_TO_SINT operations where the integer type is i8 (commonly used in graphics).
llvm-svn: 155397
2012-04-23 21:53:37 +00:00
Preston Gurd
0a730de3c3 This patch fixes a problem which arose when using the Post-RA scheduler
on X86 Atom. Some of our tests failed because the tail merging part of
the BranchFolding pass was creating new basic blocks which did not
contain live-in information. When the anti-dependency code in the Post-RA
scheduler ran, it would sometimes rename the register containing
the function return value because the fact that the return value was
live-in to the subsequent block had been lost. To fix this, it is necessary
to run the RegisterScavenging code in the BranchFolding pass.

This patch makes sure that the register scavenging code is invoked
in the X86 subtarget only when post-RA scheduling is being done.
Post RA scheduling in the X86 subtarget is only done for Atom.

This patch adds a new function to the TargetRegisterClass to control
whether or not live-ins should be preserved during branch folding.
This is necessary in order for the anti-dependency optimizations done
during the PostRASchedulerList pass to work properly when doing
Post-RA scheduling for the X86 in general and for the Intel Atom in particular.

The patch adds and invokes the new function trackLivenessAfterRegAlloc()
instead of using the existing requiresRegisterScavenging().
It changes BranchFolding.cpp to call trackLivenessAfterRegAlloc() instead of
requiresRegisterScavenging(). It changes the all the targets that
implemented requiresRegisterScavenging() to also implement
trackLivenessAfterRegAlloc().  

It adds an assertion in the Post RA scheduler to make sure that post RA
liveness information is available when it is needed.

It changes the X86 break-anti-dependencies test to use –mcpu=atom, in order
to avoid running into the added assertion.

Finally, this patch restores the use of anti-dependency checking
(which was turned off temporarily for the 3.1 release) for
Intel Atom in the Post RA scheduler.

Patch by Andy Zhang!

Thanks to Jakob and Anton for their reviews.

llvm-svn: 155395
2012-04-23 21:39:35 +00:00
Jim Grosbach
649ba20f1a ARM: Add testcases for two-operand variants of VSRA/VRSRA/VSRI.
llvm-svn: 155391
2012-04-23 21:00:47 +00:00
Jim Grosbach
2d1db8e4a5 Add ARM mode tests for the NEON vector shift-accumulate tests.
llvm-svn: 155390
2012-04-23 21:00:44 +00:00
Jim Grosbach
41406f1b7a Tidy up. Reformat for ease of reading.
llvm-svn: 155389
2012-04-23 21:00:42 +00:00
Chandler Carruth
9460759e4f Revert r155365, r155366, and r155367. All three of these have regression
test suite failures. The failures occur at each stage, and only get
worse, so I'm reverting all of them.

Please resubmit these patches, one at a time, after verifying that the
regression test suite passes. Never submit a patch without running the
regression test suite.

llvm-svn: 155372
2012-04-23 18:25:57 +00:00
Sirish Pande
9f4844f7da Hexagon V5 (floating point) support.
llvm-svn: 155367
2012-04-23 17:49:40 +00:00
Sirish Pande
4bcbe40295 Support for Hexagon architectural feature, new value jump.
llvm-svn: 155366
2012-04-23 17:49:28 +00:00
Sirish Pande
2230f1957e Support for Hexagon VLIW Packetizer.
llvm-svn: 155365
2012-04-23 17:49:20 +00:00
Jakob Stoklund Olesen
6c1440cf27 Reapply r155136 after fixing PR12599.
Original commit message:

Defer some shl transforms to DAGCombine.

The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.

Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.

These transformations are deferred:

  (X >>? C) << C   --> X & (-1 << C)  (When X >> C has multiple uses)
  (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2)   (When C2 > C1)
  (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2)  (When C1 > C2)

The corresponding exact transformations are preserved, just like
div-exact + mul:

  (X >>?,exact C) << C   --> X
  (X >>?,exact C1) << C2 --> X << (C2-C1)
  (X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)

The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.

llvm-svn: 155362
2012-04-23 17:39:52 +00:00
Elena Demikhovsky
587ea8d3fc cleaned line endings in the newly added test file
llvm-svn: 155315
2012-04-22 13:22:48 +00:00
Chandler Carruth
a4f8aa5231 Tidy up this test more:
1) Make the checked assertions a bit more precise. We really want the
   canonical forms coming out of reassociate to be exactly what is
   expected.
2) Remove other passes, and switch the test to actually directly check
   that reassociate makes the important transforms and
   canonicalizations.
3) Fold in a related test case now that we're using FileCheck. Make the
   same tidying changes to it.

llvm-svn: 155311
2012-04-22 10:11:26 +00:00
Chandler Carruth
038c36b06c FileCheck-ize a test, and tidy it up a touch.
llvm-svn: 155310
2012-04-22 10:11:23 +00:00
Elena Demikhovsky
35721fc4f8 ZERO_EXTEND/SIGN_EXTEND/TRUNCATE optimization for AVX2
llvm-svn: 155309
2012-04-22 09:39:03 +00:00
Nadav Rotem
97bbbe3368 Teach getVectorTypeBreakdown about promotion of vectors in addition to widening of vectors.
llvm-svn: 155296
2012-04-21 20:08:32 +00:00
Jakob Stoklund Olesen
adfc8212cf Fix PR12599.
The X86 target is editing the selection DAG while isel is selecting
nodes following a topological ordering. When the DAG hacking triggers
CSE, nodes can be deleted and bad things happen.

llvm-svn: 155257
2012-04-20 23:36:09 +00:00
Jim Grosbach
e33d0c7063 ARM: Update NEON assembly two-operand aliases.
Use the new TwoOperandAliasConstraint to handle lots of the two-operand aliases
for NEON instructions. There's still more to go, but this is a good chunk of
them.

llvm-svn: 155210
2012-04-20 18:12:54 +00:00
Manuel Klimek
dc83827995 Removes json-bench from the test dependencies.
llvm-svn: 155197
2012-04-20 13:45:49 +00:00
Jakob Stoklund Olesen
3d22f26e88 Revert r155136 "Defer some shl transforms to DAGCombine."
While the patch was perfect and defect free, it exposed a really nasty
bug in X86 SelectionDAG that caused an llc crash when compiling lencod.

I'll put the patch back in after fixing the SelectionDAG problem.

llvm-svn: 155181
2012-04-20 00:38:45 +00:00
Jim Grosbach
c935649d5c ARM some VFP tblgen'erated two-operand aliases.
llvm-svn: 155178
2012-04-20 00:15:00 +00:00
Jim Grosbach
4a63ad2ce9 Tidy up. Formatting.
llvm-svn: 155177
2012-04-20 00:14:57 +00:00
Dan Gohman
f4472e9a1f Avoid a bug in the path count computation, preventing an infinite
loop repeatedlt making the same change. This is for rdar://11256239.

llvm-svn: 155160
2012-04-19 21:50:46 +00:00
Joel Jones
7e8e679676 Test for the the problem with xors being changed into ands
when the set bits aren't the same for both args of the xor.
This transformation is in the function TargetLowering::SimplifyDemandedBits
in the file lib/CodeGen/SelectionDAG/TargetLowering.cpp.

I have tested this test using a previous version of llc which the defect and 
the a version of llc which does not. I got the expected fail and pass, 
respectively.

This test goes with rdar://11195364 and the check in with the fix: svn r154955

llvm-svn: 155156
2012-04-19 20:54:44 +00:00
Michael J. Spencer
e6c28a171e Remove llvm-ld and llvm-stub (which is only used by llvm-ld).
llvm-ld is no longer useful and causes confusion and so it is being removed.

* Does not work very well on Windows because it must call a gcc like driver to
  assemble and link.
* Has lots of hard coded paths which are wrong on many systems.
* Does not understand most of ld's options.
* Can be partially replaced by llvm-link | opt | {llc | as, llc -filetype=obj} |
  ld, or fully replaced by Clang.

I know of no production use of llvm-ld, and hacking use should be
replaced by Clang's driver.

llvm-svn: 155147
2012-04-19 19:27:54 +00:00
Jakob Stoklund Olesen
1507d20c57 Defer some shl transforms to DAGCombine.
The shl instruction is used to represent multiplication by a constant
power of two as well as bitwise left shifts. Some InstCombine
transformations would turn an shl instruction into a bit mask operation,
making it difficult for later analysis passes to recognize the
constsnt multiplication.

Disable those shl transformations, deferring them to DAGCombine time.
An 'shl X, C' instruction is now treated mostly the same was as 'mul X, C'.

These transformations are deferred:

  (X >>? C) << C   --> X & (-1 << C)  (When X >> C has multiple uses)
  (X >>? C1) << C2 --> X << (C2-C1) & (-1 << C2)   (When C2 > C1)
  (X >>? C1) << C2 --> X >>? (C1-C2) & (-1 << C2)  (When C1 > C2)

The corresponding exact transformations are preserved, just like
div-exact + mul:

  (X >>?,exact C) << C   --> X
  (X >>?,exact C1) << C2 --> X << (C2-C1)
  (X >>?,exact C1) << C2 --> X >>?,exact (C1-C2)

The disabled transformations could also prevent the instruction selector
from recognizing rotate patterns in hash functions and cryptographic
primitives. I have a test case for that, but it is too fragile.

llvm-svn: 155136
2012-04-19 16:46:26 +00:00
Jakob Stoklund Olesen
3377b8ac85 Extract the broken part of XFAILed test into its own file.
llvm-svn: 155081
2012-04-19 00:20:38 +00:00
Jakob Stoklund Olesen
0ad7ee539b FileCheckize
llvm-svn: 155010
2012-04-18 17:01:26 +00:00
Jakob Stoklund Olesen
330da655a7 Nobody likes shifty instructions, but that was a bit strong.
llvm-svn: 155009
2012-04-18 16:44:44 +00:00
Silviu Baranga
f810ee56fb Added support for disassembling unpredictable swp/swpb ARM instructions.
llvm-svn: 155004
2012-04-18 14:18:57 +00:00
Silviu Baranga
2bbf74b42f Fix the bahavior of the disassembler when decoding unpredictable mrs instructions on ARM. Now the diasassembler emmits warnings instead of errors.
llvm-svn: 155002
2012-04-18 14:09:07 +00:00
Silviu Baranga
82d7afd0d2 Added support for unpredictable mcrr/mcrr2/mrrc/mrrc2 ARM instruction in the disassembler. Since the upredicability conditions are complex, C++ code was added to handle them.
llvm-svn: 155001
2012-04-18 13:12:50 +00:00
Silviu Baranga
8e0ebc8ed7 Fixed decoding for the ARM cdp2 instruction. The restriction on the coprocessor number was removed for this instruction.
llvm-svn: 155000
2012-04-18 13:02:55 +00:00
Silviu Baranga
2ab693789b Add suport for unpredicatble cases of the cmp, tst, teq and cmnz ARM instructions in the disassembler.
llvm-svn: 154999
2012-04-18 12:48:43 +00:00
Joe Groff
1674d0c68d FileCheckify, un-XFAIL SimplifyLibCalls/floor test
Fixes build on MSVC

llvm-svn: 154970
2012-04-18 00:36:07 +00:00
Joe Groff
aff3c9d60d Move win32 SimplifyLibcall test under Transforms
llvm-svn: 154967
2012-04-18 00:07:45 +00:00
Joe Groff
cc9c07aacc fix pr12559: mark unavailable win32 math libcalls
also fix SimplifyLibCalls to use TLI rather than compile-time conditionals to enable optimizations on floor, ceil, round, rint, and nearbyint

llvm-svn: 154960
2012-04-17 23:05:54 +00:00
Akira Hatanaka
ecb1cd1ce4 Add disassembler to MIPS.
Patch by Vladimir Medic. 

llvm-svn: 154935
2012-04-17 18:03:21 +00:00
Benjamin Kramer
cfdd3323eb Force cmov on test so block placement doesn't shuffle the code around.
This made the test fail with -mcpu=generic (when building on a non-x86 host).

llvm-svn: 154926
2012-04-17 13:55:23 +00:00
James Molloy
44927f5296 Fix bad EXTRACT_SUBREG in instruction selection for extending-loads on NEON.
llvm-svn: 154915
2012-04-17 08:18:00 +00:00
Benjamin Kramer
550faddc94 Revert "SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW."
This isn't right either, reverting for now.

llvm-svn: 154910
2012-04-17 06:33:57 +00:00
Andrew Trick
3c9809f34d Test cases that assume layout should use -disable-code-place.
llvm-svn: 154908
2012-04-17 06:20:42 +00:00
Kevin Enderby
d64ba28e41 Fix ARM disassembly of VLD2 (single 2-element structure to all lanes)
instructions with writebacks. And add test a case for all opcodes handed by
DecodeVLD2DupInstruction() in ARMDisassembler.cpp .

llvm-svn: 154884
2012-04-17 00:49:27 +00:00
Preston Gurd
02674f0df7 temporarily XFAIL this test until post RA
live-ins is properly enabled.

llvm-svn: 154882
2012-04-17 00:21:35 +00:00
Chandler Carruth
92ef7d8f3e Disable the atom scheduling test after r154874 broke it.
llvm-svn: 154877
2012-04-16 23:11:39 +00:00
Jim Grosbach
13a45d88e5 ARM two-operand forms for vhadd and vhsub instructions.
rdar://11252521

llvm-svn: 154875
2012-04-16 23:00:25 +00:00
Chandler Carruth
5d550b546e Relax this test a touch to cope with different assembly variants.
llvm-svn: 154870
2012-04-16 22:20:48 +00:00
Chandler Carruth
5780b826b0 Fix updateTerminator to be resiliant to degenerate terminators where
both fallthrough and a conditional branch target the same successor.
Gracefully delete the conditional branch and introduce any unconditional
branch needed to reach the actual successor. This fixes memory
corruption in 2009-06-15-RegScavengerAssert.ll and possibly other tests.

Also, while I'm here fix a latent bug I spotted by inspection. I never
applied the same fundamental fix to this fallthrough successor finding
logic that I did to the logic used when there are no conditional
branches. As a consequence it would have selected landing pads had they
be aligned in just the right way here. I don't have a test case as
I spotted this by inspection, and the previous time I found this
required have of TableGen's source code to produce it. =/ I hate backend
bugs. ;]

Thanks to Jim Grosbach for helping me reason through this and reviewing
the fix.

llvm-svn: 154867
2012-04-16 22:03:00 +00:00
Jim Grosbach
9e97ef84db MC assembly parser handling for trailing comma in macro instantiation.
A trailing comma means no argument at all (i.e., as if the comma were not
present), not an empty argument to the invokee.

rdar://11252521

llvm-svn: 154863
2012-04-16 21:18:49 +00:00
Jakob Stoklund Olesen
8bea8a2373 FileCheckize these tests.
Add an extra test to ldr_post with an immediate increment.

llvm-svn: 154859
2012-04-16 20:56:42 +00:00
Jakob Stoklund Olesen
8d6641a2b2 Disable code placement for this test.
It makes it less sensitive to small changes in heuristics.

llvm-svn: 154857
2012-04-16 20:49:06 +00:00
Duncan Sands
518668bd76 Remove support for the special 'fast' value for fpmath accuracy for the moment.
llvm-svn: 154850
2012-04-16 19:39:33 +00:00
Richard Smith
971d090cbb Fix incorrect atomics codegen introduced in r154705, and extend test to catch it.
llvm-svn: 154845
2012-04-16 18:43:53 +00:00
Akira Hatanaka
04dee7c9fc This patch fixes 3 problems:
1. CHECKNEXT was used instead of CHECK-NEXT which caused the line to be
   ignored which in turn hid the next 2 problems:
2. ('sh_offset', 0x{{{[0-9,a-f]+}}) had one too many leading curly braces and
   failed to do it's job of accepting all hex digits and:
3. The check for the hex values for the code instructions didn't account for
   blank separators.

Patch by Jack Carter. 

llvm-svn: 154842
2012-04-16 18:20:26 +00:00
Jim Grosbach
b6c95c9f42 ARM assembly two-operand forms for VRSHL.
rdar://11252521

llvm-svn: 154840
2012-04-16 18:03:16 +00:00
Jim Grosbach
5da9e32405 Tidy up. Test formatting.
llvm-svn: 154839
2012-04-16 18:03:14 +00:00
Akira Hatanaka
0f31530336 Do not add offset in applyFixup. This has already been accounted for in Value.
llvm-svn: 154838
2012-04-16 18:00:19 +00:00
Jim Grosbach
d961988871 ARM two-operand aliases for VRHADD instructions.
rdar://11252521

llvm-svn: 154832
2012-04-16 17:14:11 +00:00
Jim Grosbach
33a32d0d92 Tidy up. Testcase formatting.
llvm-svn: 154831
2012-04-16 17:14:07 +00:00
Bill Wendling
a16a178883 Move to X86 directory because this fails on non-X86 platforms.
llvm-svn: 154825
2012-04-16 16:38:48 +00:00
Duncan Sands
f61d49df40 Make it possible to indicate relaxed floating point requirements at the IR level
through the use of 'fpmath' metadata.  Currently this only provides a 'fpaccuracy'
value, which may be a number in ULPs or the keyword 'fast', however the intent is
that this will be extended with additional information about NaN's, infinities
etc later.  No optimizations have been hooked up to this so far.

llvm-svn: 154822
2012-04-16 16:28:59 +00:00
Chandler Carruth
728acc9bd9 Flip the new block-placement pass to be on by default.
This is mostly to test the waters. I'd like to get results from FNT
build bots and other bots running on non-x86 platforms.

This feature has been pretty heavily tested over the last few months by
me, and it fixes several of the execution time regressions caused by the
inlining work by preventing inlining decisions from radically impacting
block layout.

I've seen very large improvements in yacr2 and ackermann benchmarks,
along with the expected noise across all of the benchmark suite whenever
code layout changes. I've analyzed all of the regressions and fixed
them, or found them to be impossible to fix. See my email to llvmdev for
more details.

I'd like for this to be in 3.1 as it complements the inliner changes,
but if any failures are showing up or anyone has concerns, it is just
a flag flip and so can be easily turned off.

I'm switching it on tonight to try and get at least one run through
various folks' performance suites in case SPEC or something else has
serious issues with it. I'll watch bots and revert if anything shows up.

llvm-svn: 154816
2012-04-16 13:49:17 +00:00
Chandler Carruth
20d3a870c6 Remove an overly brittle test. This test will no longer be interesting
once we start changing the block layout, so just nuke it. If anyone has
ideas about how to craft a code layout agnostic form of the test please
let me know.

llvm-svn: 154815
2012-04-16 13:49:09 +00:00
Chandler Carruth
fbb6219d5b Add a somewhat hacky heuristic to do something different from whole-loop
rotation. When there is a loop backedge which is an unconditional
branch, we will end up with a branch somewhere no matter what. Try
placing this backedge in a fallthrough position above the loop header as
that will definitely remove at least one branch from the loop iteration,
where whole loop rotation may not.

I haven't seen any benchmarks where this is important but loop-blocks.ll
tests for it, and so this will be covered when I flip the default.

llvm-svn: 154812
2012-04-16 13:33:36 +00:00
Richard Barton
9e62efdf5f Add -disassemble support for -show-inst and -show-encode capability llvm-mc. Also refactor so all MC paraphernalia are created once for all uses as much as possible.
The test change is to account for the fact that the default disassembler behaviour has changed with regards to specifying the assembly syntax to use.

llvm-svn: 154809
2012-04-16 11:32:10 +00:00
Chandler Carruth
33b200ad13 Tweak the loop rotation logic to check whether the loop is naturally
laid out in a form with a fallthrough into the header and a fallthrough
out of the bottom. In that case, leave the loop alone because any
rotation will introduce unnecessary branches. If either side looks like
it will require an explicit branch, then the rotation won't add any, do
it to ensure the branch occurs outside of the loop (if possible) and
maximize the benefit of the fallthrough in the bottom.

llvm-svn: 154806
2012-04-16 09:31:23 +00:00
Hal Finkel
457fbe481c Remove dead SD nodes after the combining pass. Fixes PR12201.
llvm-svn: 154786
2012-04-16 03:33:22 +00:00
Chandler Carruth
fc5ab5d388 Rewrite how machine block placement handles loop rotation.
This is a complex change that resulted from a great deal of
experimentation with several different benchmarks. The one which proved
the most useful is included as a test case, but I don't know that it
captures all of the relevant changes, as I didn't have specific
regression tests for each, they were more the result of reasoning about
what the old algorithm would possibly do wrong. I'm also failing at the
moment to craft more targeted regression tests for these changes, if
anyone has ideas, it would be welcome.

The first big thing broken with the old algorithm is the idea that we
can take a basic block which has a loop-exiting successor and a looping
successor and use the looping successor as the layout top in order to
get that particular block to be the bottom of the loop after layout.
This happens to work in many cases, but not in all.

The second big thing broken was that we didn't try to select the exit
which fell into the nearest enclosing loop (to which we exit at all). As
a consequence, even if the rotation worked perfectly, it would result in
one of two bad layouts. Either the bottom of the loop would get
fallthrough, skipping across a nearer enclosing loop and thereby making
it discontiguous, or it would be forced to take an explicit jump over
the nearest enclosing loop to earch its successor. The point of the
rotation is to get fallthrough, so we need it to fallthrough to the
nearest loop it can.

The fix to the first issue is to actually layout the loop from the loop
header, and then rotate the loop such that the correct exiting edge can
be a fallthrough edge. This is actually much easier than I anticipated
because we can handle all the hard parts of finding a viable rotation
before we do the layout. We just store that, and then rotate after
layout is finished. No inner loops get split across the post-rotation
backedge because we check for them when selecting the rotation.

That fix exposed a latent problem with our exitting block selection --
we should allow the backedge to point into the middle of some inner-loop
chain as there is no real penalty to it, the whole point is that it
*won't* be a fallthrough edge. This may have blocked the rotation at all
in some cases, I have no idea and no test case as I've never seen it in
practice, it was just noticed by inspection.

Finally, all of these fixes, and studying the loops they produce,
highlighted another problem: in rotating loops like this, we sometimes
fail to align the destination of these backwards jumping edges. Fix this
by actually walking the backwards edges rather than relying on loopinfo.

This fixes regressions on heapsort if block placement is enabled as well
as lots of other cases where the previous logic would introduce an
abundance of unnecessary branches into the execution.

llvm-svn: 154783
2012-04-16 01:12:56 +00:00
Craig Topper
788250eec1 Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors.
llvm-svn: 154778
2012-04-15 22:43:31 +00:00
Nadav Rotem
2a4e2ef10c Fix PR12529. The Vxx family of instructions are only supported by AVX.
Use non-vex instructions for SSE4.

llvm-svn: 154770
2012-04-15 19:36:44 +00:00
Nadav Rotem
b8710ee43f When emulating vselect using OR/AND/XOR make sure to bitcast the result back to the original type.
llvm-svn: 154764
2012-04-15 15:08:09 +00:00
Elena Demikhovsky
92fb3e613e Added VPERM optimization for AVX2 shuffles
llvm-svn: 154761
2012-04-15 11:18:59 +00:00
Duncan Sands
40d080e3b7 Rename "fpaccuracy" metadata to the more generic "fpmath". That's because I'm
thinking of generalizing it to be able to specify other freedoms beyond accuracy
(such as that NaN's don't have to be respected).  I'd like the 3.1 release (the
first one with this metadata) to have the more generic name already rather than
having to auto-upgrade it in 3.2.

llvm-svn: 154744
2012-04-14 12:36:06 +00:00
Hal Finkel
028d6e153e Fix an error in BBVectorize important for vectorizing pointer types.
When vectorizing pointer types it is important to realize that potential
pairs cannot be connected via the address pointer argument of a load or store.
This is because even after vectorization, the address is still a scalar because
the address of the higher half of the pair is implicit from the address of the
lower half (it need not be, and should not be, explicitly computed).

llvm-svn: 154735
2012-04-14 07:32:50 +00:00
Hal Finkel
c55edb7b35 Enhance BBVectorize to more-properly handle pointer values and vectorize GEPs.
llvm-svn: 154734
2012-04-14 07:32:43 +00:00
Richard Smith
d5004a79d9 Fix X86 codegen for 'atomicrmw nand' to generate *x = ~(*x & y), not *x = ~*x & y.
llvm-svn: 154705
2012-04-13 22:47:00 +00:00
Hal Finkel
12b4c41203 Add support to BBVectorize for vectorizing selects.
llvm-svn: 154700
2012-04-13 20:45:45 +00:00
Evan Cheng
3499593c7e On Darwin targets, only use vfma etc. if the source use fma() intrinsic explicitly.
llvm-svn: 154689
2012-04-13 18:59:28 +00:00
Dan Gohman
d5743c7fd0 Consider ObjC runtime calls objc_storeWeak and others which make a copy of
their argument as "escape" points for objc_retainBlock optimization.
This fixes rdar://11229925.

llvm-svn: 154682
2012-04-13 18:28:58 +00:00
Sylvestre Ledru
53db93eead Catch the Python exception when subprocess.Popen is failing.
For example, if llc cannot be found, the full python stacktrace is displayed
and no interesting information are provided.
+ fail the process when an exception occurs

llvm-svn: 154665
2012-04-13 11:22:18 +00:00
Dan Gohman
81ac0c921f Use the new Use-aware dominates method to apply the objc runtime
library return value optimization for phi uses. Even when the
phi itself is not dominated, the specific use may be dominated.

llvm-svn: 154647
2012-04-13 01:08:28 +00:00
Dan Gohman
6a5b02f8ee Don't move objc_autorelease calls past autorelease pool boundaries when
optimizing autorelease calls on phi nodes with null operands.
This fixes rdar://11207070.

llvm-svn: 154642
2012-04-13 00:59:57 +00:00
Sirish Pande
04c82d35b9 Disable Hexagon test temporarily.
There is an assert at line 558 in ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA).
This assert needs to addressed for post RA scheduler. Until that assert is addressed,
any passes that uses post ra scheduler will fail. So, I am temporarily disabling the
hexagon tests until that fix is in.

The assert is as follows:
    assert(!MI->isTerminator() && !MI->isLabel() &&
               "Cannot schedule terminators or labels!");

llvm-svn: 154617
2012-04-12 21:06:54 +00:00
Preston Gurd
6e9bcca355 This patch improves the MCJIT runtime dynamic loader by adding new handling
of zero-initialized sections, virtual sections and common symbols
and preventing the loading of sections which are not required for
execution such as debug information.

Patch by Andy Kaylor!

llvm-svn: 154610
2012-04-12 20:13:57 +00:00
Craig Topper
448790d566 Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions.
llvm-svn: 154580
2012-04-12 07:23:00 +00:00
Akira Hatanaka
103a1edc4d Revert changes that were accidentally committed.
llvm-svn: 154563
2012-04-11 23:19:55 +00:00
Akira Hatanaka
991d556243 Fix string that is being checked.
llvm-svn: 154547
2012-04-11 23:11:33 +00:00
Akira Hatanaka
48dbb62cb1 Emit neg.s or neg.d only if -enable-no-nans-fp-math is supplied by user,
otherwise expand FNEG during legalization.

llvm-svn: 154546
2012-04-11 22:59:08 +00:00
Akira Hatanaka
11a442d515 Emit abs.s or abs.d only if -enable-no-nans-fp-math is supplied by user.
Invalid operation is signaled if the operand of these instructions is NaN.

llvm-svn: 154545
2012-04-11 22:49:04 +00:00
Kevin Enderby
64c95fb56a Fixed a case of ARM disassembly getting an assert on a bad encoding
of a VST instruction.

llvm-svn: 154544
2012-04-11 22:40:17 +00:00
Akira Hatanaka
6636922675 Fix bugs in lowering of FCOPYSIGN nodes.
- FCOPYSIGN nodes that have operands of different types were not handled.
- Different code was generated depending on the endianness of the target.

Additionally, code is added that emits INS and EXT instructions, if they are
supported by target (they are R2 instructions).

llvm-svn: 154540
2012-04-11 22:13:04 +00:00
Jim Grosbach
86b5cd7421 ARM 'vuzp.32 Dd, Dm' is a pseudo-instruction.
While there is an encoding for it in VUZP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.

rdar://11222366

llvm-svn: 154511
2012-04-11 17:40:18 +00:00
Jim Grosbach
e54b48cd74 ARM 'vzip.32 Dd, Dm' is a pseudo-instruction.
While there is an encoding for it in VZIP, the result of that is undefined,
so we should avoid it. Define the instruction as a pseudo for VTRN.32
instead, as the ARM ARM indicates.

rdar://11221911

llvm-svn: 154505
2012-04-11 16:53:25 +00:00
Evan Cheng
f138fb4599 Add more fused mul+add/sub patterns. rdar://10139676
llvm-svn: 154484
2012-04-11 06:59:47 +00:00
Nadav Rotem
c922b4f2a3 Reapply 154396 after fixing a test.
Original message:
Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendV uses a register for the selection while Vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154483
2012-04-11 06:40:27 +00:00
Evan Cheng
f9baff015d Clean up ARM fused multiply + add/sub support some more: rename some isel
predicates.
Also remove NEON2 since it's not really useful and it is confusing. If
NEON + VFP4 implies NEON2 but NEON2 doesn't imply NEON + VFP4, what does it
really mean?

rdar://10139676

llvm-svn: 154480
2012-04-11 05:33:07 +00:00
Evan Cheng
b5291aea18 Match (fneg (fma) to vfnma. rdar://10139676
llvm-svn: 154469
2012-04-11 01:21:25 +00:00
Charles Davis
a5e1970cd0 Add retw and lretw instructions. Also, fix Intel syntax parsing for all
ret instructions.

llvm-svn: 154468
2012-04-11 01:10:53 +00:00
Evan Cheng
eaf8eba8c4 Merge fma.ll into fusedMAC.ll
llvm-svn: 154466
2012-04-11 01:03:11 +00:00
Kevin Enderby
304e4812bc Fix ARM disassembly of VLD instructions with writebacks.  And add test a case
for all opcodes handed by DecodeVLDInstruction() in ARMDisassembler.cpp .

llvm-svn: 154459
2012-04-11 00:25:40 +00:00
Jim Grosbach
b10b1b22cb ARM add missing Thumb1 two-operand aliases for shift-by-immediate.
rdar://11222742

llvm-svn: 154457
2012-04-11 00:15:16 +00:00
Evan Cheng
12bfe1150d Fix a number of problems with ARM fused multiply add/subtract instructions.
1. The new instruction itinerary entries are not properly described.
2. The asm parser can't handle vfms and vfnms.
3. There were no assembler, disassembler test cases.
4. HasNEON2 has the wrong assembler predicate.
rdar://10139676

llvm-svn: 154456
2012-04-11 00:13:00 +00:00
Jakob Stoklund Olesen
4f44c26f15 Fix test to be register assignment invariant.
llvm-svn: 154453
2012-04-11 00:00:24 +00:00
Owen Anderson
a8319713a4 Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at sound point.
Zap a testcase that this allows us to completely fold away.

llvm-svn: 154447
2012-04-10 22:46:53 +00:00
Kostya Serebryany
3047a70ed9 [tsan] two more compile-time optimizations:
- don't isntrument reads from constant globals.
Saves ~1.5% of instrumented instructions on CPU2006
(counting static instructions, not their execution).
- don't insrument reads from vtable (which is a global constant too).
Saves ~5%.

I did not measure the run-time impact of this,
but it is certainly non-negative.

llvm-svn: 154444
2012-04-10 22:29:17 +00:00
Evan Cheng
f9617f7f54 Handle llvm.fma.* intrinsics. rdar://10914096
llvm-svn: 154439
2012-04-10 21:40:28 +00:00
Duncan Sands
6d360055c5 Add a comment noting that the fdiv -> fmul conversion won't generate
multiplication by a denormal, and some tests checking that.

llvm-svn: 154431
2012-04-10 20:35:27 +00:00
Eric Christopher
f8886e8f48 Temporarily revert this patch to see if it brings the buildbots back.
llvm-svn: 154425
2012-04-10 19:33:16 +00:00
Kostya Serebryany
01d463472d [tsan] compile-time instrumentation: do not instrument a read if
a write to the same temp follows in the same BB.
Also add stats printing.

On Spec CPU2006 this optimization saves roughly 4% of instrumented reads
(which is 3% of all instrumented accesses):
Writes            : 161216
Reads             : 446458
Reads-before-write: 18295

llvm-svn: 154418
2012-04-10 18:18:56 +00:00
Eric Christopher
ec1405e930 To ensure that we have more accurate line information for a block
don't elide the branch instruction if it's the only one in the block,
otherwise it's ok.

PR9796 and rdar://11215207

llvm-svn: 154417
2012-04-10 18:18:10 +00:00
Jim Grosbach
d32f050f68 ARM fix cc_out operand handling for t2SUBrr instructions.
We were incorrectly conflating some add variants which don't have a
cc_out operand with the mirroring sub encodings, which do. Part of the
awesome non-orthogonality legacy of thumb1. Similarly, handling of
add/sub of an immediate was sometimes incorrectly removing the cc_out
operand for add/sub register variants.

rdar://11216577

llvm-svn: 154411
2012-04-10 17:31:55 +00:00
Nadav Rotem
74f87a6bd8 Modify the code that lowers shuffles to blends from using blendvXX to vblendXX.
blendv uses a register for the selection while vblend uses an immediate.
On sandybridge they still have the same latency and execute on the same execution ports.

llvm-svn: 154396
2012-04-10 14:33:13 +00:00
Anton Korobeynikov
0fc5fe0430 Transform div to mul with reciprocal only when fp imm is legal.
This fixes PR12516 and uncovers one weird problem in legalize (workarounded)

llvm-svn: 154394
2012-04-10 13:22:49 +00:00
Duncan Sands
f25460b85f Express the number of ULPs in fpaccuracy metadata as a real rather than a
rational number, eg as 2.5 rather than 5, 2.  OK'd by Peter Collingbourne.

llvm-svn: 154387
2012-04-10 08:22:43 +00:00
Andrew Trick
7230fee696 Fix 12513: Loop unrolling breaks with indirect branches.
Take this opportunity to generalize the indirectbr bailout logic for
loop transformations. CFG transformations will never get indirectbr
right, and there's no point trying.

llvm-svn: 154386
2012-04-10 05:14:42 +00:00
Evan Cheng
d9ff163215 Add proper checks.
llvm-svn: 154379
2012-04-10 03:15:42 +00:00
Evan Cheng
5825e9dbf5 Fix a long standing tail call optimization bug. When a libcall is emitted
legalizer always use the DAG entry node. This is wrong when the libcall is
emitted as a tail call since it effectively folds the return node. If
the return node's input chain is not the entry (i.e. call, load, or store)
use that as the tail call input chain.

PR12419
rdar://9770785
rdar://11195178

llvm-svn: 154370
2012-04-10 01:51:00 +00:00
Rafael Espindola
9febd1fbf7 Don't try to zExt just to check if an integer constant is zero, it might
not fit in a i64.

llvm-svn: 154364
2012-04-10 00:16:22 +00:00
Lang Hames
800642b224 Test case for PR12495.
llvm-svn: 154359
2012-04-09 23:58:59 +00:00
Akira Hatanaka
1b46e841a2 Have TargetLowering::getPICJumpTableRelocBase return a node that points to the
GOT if jump table uses 64-bit gp-relative relocation.

llvm-svn: 154341
2012-04-09 20:32:12 +00:00
Chad Rosier
a588421976 When performing a truncating store, it's possible to rearrange the data
in-register, such that we can use a single vector store rather then a 
series of scalar stores.

For func_4_8 the generated code

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vmov.u16	r0, d16[3]
	strb	r0, [r2, #3]
	vmov.u16	r0, d16[2]
	strb	r0, [r2, #2]
	vmov.u16	r0, d16[1]
	strb	r0, [r2, #1]
	vmov.u16	r0, d16[0]
	strb	r0, [r2]
	bx	lr

becomes

	vldr	d16, LCPI0_0
	vmov	d17, r0, r1
	vadd.i16	d16, d17, d16
	vuzp.8	d16, d17
	vst1.32	{d16[0]}, [r2, :32]
	bx	lr

I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll,
but I couldn't think of a way to judiciously apply this combine.

This

	ldrh	r0, [r0, #4]
	strh	r0, [r1]

becomes

	vldr	d16, [r0]
	vmov.u16	r0, d16[2]
	vmov.32	d16[0], r0
	vuzp.16	d16, d17
	vst1.32	{d16[0]}, [r1, :32]

PR11158
rdar://10703339

llvm-svn: 154340
2012-04-09 20:32:02 +00:00
Rafael Espindola
6b7bf4d0aa Pattern match a setcc of boolean value with 0 as a truncate.
llvm-svn: 154322
2012-04-09 16:06:03 +00:00
Nadav Rotem
9f7f17826e Lower some x86 shuffle sequences to the vblend family of instructions.
llvm-svn: 154313
2012-04-09 08:33:21 +00:00
Nadav Rotem
4499fb1d50 Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type.
Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering.

llvm-svn: 154310
2012-04-09 07:45:58 +00:00
Chandler Carruth
bb1db0e66a Cleanup and relax a restriction on the matching of global offsets into
x86 addressing modes. This allows PIE-based TLS offsets to fit directly
into an addressing mode immediate offset, which is the last remaining
code quality issue from PR12380. With this patch, that PR is completely
fixed.

To understand why this patch is correct to match these offsets into
addressing mode immediates, break it down by cases:
1) 32-bit is trivially correct, and unmodified here.
2) 64-bit non-small mode is unchanged and never matches.
3) 64-bit small PIC code which is RIP-relative is handled specially in
   the match to try to fit RIP into the base register. If it fails, it
   now early exits. This behavior is unchanged by the patch.
4) 64-bit small non-PIC code which is not RIP-relative continues to work
   as it did before. The reason these immediates are safe is because the
   ABI ensures they fit in small mode. This behavior is unchanged.
5) 64-bit small PIC code which is *not* using RIP-relative addressing.
   This is the only case changed by the patch, and the primary place you
   see it is in TLS, either the win64 section offset TLS or Linux
   local-exec TLS model in a PIC compilation. Here the ABI again ensures
   that the immediates fit because we are in small mode, and any other
   operations required due to the PIC relocation model have been handled
   externally to the Wrapper node (extra loads etc are made around the
   wrapper node in ISelLowering).

I've tested this as much as I can comparing it with GCC's output, and
everything appears safe. I discussed this with Anton and it made sense
to him at least at face value. That said, if there are issues with PIC
code after this patch, yell and we can revert it.

llvm-svn: 154304
2012-04-09 02:13:06 +00:00
Chandler Carruth
c29528d66b Fold 15 tiny test cases into a single file that implements the
comprehensive testing of TLS codegen for x86. Convert all of the ones
that were still using grep to use FileCheck. Remove some redundancies
between them.

Perhaps most interestingly expand the test cases so that they actually
fully list the instruction snippet being tested. TLS operations are
*very* narrowly defined, and so these seem reasonably stable. More
importantly, the existing test cases already were crazy fine grained,
expecting specific registers to be allocated. This just clarifies that
no *other* instructions are expected, and fills in some crucial gaps
that weren't being tested at all.

This will make any subsequent changes to TLS much more clear during
review.

llvm-svn: 154303
2012-04-09 01:43:17 +00:00
Duncan Sands
28b9aa998e Only have codegen turn fdiv by a constant into fmul by the reciprocal
when -ffast-math, i.e. don't just always do it if the reciprocal can
be formed exactly.  There is already an IR level transform that does
that, and it does it more carefully.

llvm-svn: 154296
2012-04-08 18:08:12 +00:00
Chandler Carruth
11c412fd2c Teach LLVM about a PIE option which, when enabled on top of PIC, makes
optimizations which are valid for position independent code being linked
into a single executable, but not for such code being linked into
a shared library.

I discussed the design of this with Eric Christopher, and the decision
was to support an optional bit rather than a completely separate
relocation model. Fundamentally, this is still PIC relocation, its just
that certain optimizations are only valid under a PIC relocation model
when the resulting code won't be in a shared library. The simplest path
to here is to expose a single bit option in the TargetOptions. If folks
have different/better designs, I'm all ears. =]

I've included the first optimization based upon this: changing TLS
models to the *Exec models when PIE is enabled. This is the LLVM
component of PR12380 and is all of the hard work.

llvm-svn: 154294
2012-04-08 17:51:45 +00:00
Chandler Carruth
b3fb4be360 Teach InstCombine to nuke a common alloca pattern -- an alloca which has
GEPs, bit casts, and stores reaching it but no other instructions. These
often show up during the iterative processing of the inliner, SROA, and
DCE. Once we hit this point, we can completely remove the alloca. These
were actually showing up in the final, fully optimized code in a bunch
of inliner tests I've been working on, and notably they show up after
LLVM finishes optimizing away all function calls involved in
hash_combine(a, b).

llvm-svn: 154285
2012-04-08 14:36:56 +00:00
Nadav Rotem
8957364ae5 AVX2: Build splat vectors by broadcasting a scalar from the constant pool.
Previously we used three instructions to broadcast an immediate value into a
vector register.
On Sandybridge we continue to load the broadcasted value from the constant pool.

llvm-svn: 154284
2012-04-08 12:54:54 +00:00
Bill Wendling
d36ce9b17b Remove old 'grep' lines.
llvm-svn: 154283
2012-04-08 11:53:54 +00:00
Bill Wendling
90c6d27935 FileCheckize these testcases.
llvm-svn: 154281
2012-04-08 11:00:38 +00:00
Nadav Rotem
37734277f0 1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new
shuffle node because it could introduce new shuffle nodes that were not
   supported efficiently by the target.

2. Add a more restrictive shuffle-of-shuffle optimization for cases where the
   second shuffle reverses the transformation of the first shuffle.

llvm-svn: 154266
2012-04-07 21:19:08 +00:00
Duncan Sands
cd52f3d447 Convert floating point division by a constant into multiplication by the
reciprocal if converting to the reciprocal is exact.  Do it even if inexact
if -ffast-math.  This substantially speeds up ac.f90 from the polyhedron
benchmarks.

llvm-svn: 154265
2012-04-07 20:04:00 +00:00
Chandler Carruth
2817fc1e53 Fix ValueTracking to conclude that debug intrinsics are safe to
speculate. Without this, loop rotate (among many other places) would
suddenly stop working in the presence of debug info. I found this
looking at loop rotate, and have augmented its tests with a reduction
out of a very hot loop in yacr2 where failing to do this rotation costs
sometimes more than 10% in runtime performance, perturbing numerous
downstream optimizations.

This should have no impact on performance without debug info, but the
change in performance when debug info is enabled can be extreme. As
a consequence (and this how I got to this yak) any profiling of
performance problems should be treated with deep suspicion -- they may
have been wildly innacurate of debug info was enabled for profiling. =/
Just a heads up.

llvm-svn: 154263
2012-04-07 19:22:18 +00:00
Benjamin Kramer
a690750db9 SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW.
Found by inspection.

llvm-svn: 154262
2012-04-07 17:19:26 +00:00
Alexis Hunt
03ad83efbd Make the test for r154235 more platform-independent with a shorter
string.

llvm-svn: 154243
2012-04-07 01:33:14 +00:00
Alexis Hunt
5c14769849 Output UTF-8-encoded characters as identifier characters into assembly
by default.

This is a behaviour configurable in the MCAsmInfo. I've decided to turn
it on by default in (possibly optimistic) hopes that most assemblers are
reasonably sane. If this proves a problem, switching to default seems
reasonable.

I'm not sure if this is the opportune place to test, but it seemed good
to make sure it was tested somewhere.

llvm-svn: 154235
2012-04-07 00:37:53 +00:00
Akira Hatanaka
5cce394620 Add lines in global-address.ll to test N32 and N64 code generation.
llvm-svn: 154202
2012-04-06 20:23:36 +00:00
Jakob Stoklund Olesen
bb7b631def Allow negative immediates in ARM and Thumb2 compares.
ARM and Thumb2 mode can use cmn instructions to compare against negative
immediates. Thumb1 mode can't.

llvm-svn: 154183
2012-04-06 17:45:04 +00:00
Chandler Carruth
020a15db9d Sink the collection of return instructions until after *all*
simplification has been performed. This is a bit less efficient
(requires another ilist walk of the basic blocks) but shouldn't matter
in practice. More importantly, it's just too much work to keep track of
all the various ways the return instructions can be mutated while
simplifying them. This fixes yet another crasher, reported by Daniel
Dunbar.

llvm-svn: 154179
2012-04-06 17:21:31 +00:00
Chandler Carruth
352f98dd1e Tweak this test to ensure the inliner did indeed fire. Thanks to Richard
Smith for pointing this out in review.

llvm-svn: 154178
2012-04-06 17:21:28 +00:00
Craig Topper
40ac46c3d7 Test case for PR12413
llvm-svn: 154172
2012-04-06 14:38:25 +00:00