31429 Commits

Author SHA1 Message Date
Jingyue Wu
c4aff77ec0 [SeparateConstOffsetFromGEP] sext(a)+sext(b) => sext(a+b) when a+b can't sign-overflow.
Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.

One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.

Reviewers: broune

Subscribers: jholewinski, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12016

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@245003 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-14 02:02:05 +00:00
Alex Lorenz
5d09c2f25d MIR Serialization: Change MIR syntax - use custom syntax for MBBs.
This commit modifies the way the machine basic blocks are serialized - now the
machine basic blocks are serialized using a custom syntax instead of relying on
YAML primitives. Instead of using YAML mappings to represent the individual
machine basic blocks in a machine function's body, the new syntax uses a single
YAML block scalar which contains all of the machine basic blocks and
instructions for that function.

This is an example of a function's body that uses the old syntax:

    body:
      - id: 0
        name: entry
        instructions:
          - '%eax = MOV32r0 implicit-def %eflags'
          - 'RETQ %eax'
    ...

The same body is now written like this:

    body: |
      bb.0.entry:
        %eax = MOV32r0 implicit-def %eflags
        RETQ %eax
    ...

This syntax change is motivated by the fact that the bundled machine
instructions didn't map that well to the old syntax which was using a single
YAML sequence to store all of the machine instructions in a block. The bundled
machine instructions internally use flags like BundledPred and BundledSucc to
determine the bundles, and serializing them as MI flags using the old syntax
would have had a negative impact on the readability and the ease of editing
for MIR files. The new syntax allows me to serialize the bundled machine
instructions using a block construct without relying on the internal flags,
for example:

   BUNDLE implicit-def dead %itstate, implicit-def %s1 ... {
      t2IT 1, 24, implicit-def %itstate
      %s1 = VMOVS killed %s0, 1, killed %cpsr, implicit killed %itstate
   }

This commit also converts the MIR testcases to the new syntax. I developed
a script that can convert from the old syntax to the new one. I will post the
script on the llvm-commits mailing list in the thread for this commit.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244982 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 23:10:16 +00:00
Ahmed Bougacha
7409d94d2d [AArch64] Provide "too few operands" diags on short-form NEON also.
We used to just say "invalid type suffix for instruction", which is
misleading. This is because we fallback to the long-form matcher if the
short-form matcher failed, losing the error information on the way.

Save it, so that we can provide a little better diagnostics when the
long-form matcher thinks a suffix is the cause of the error.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244955 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 21:09:13 +00:00
Alex Lorenz
1b93706717 MIR Parser: Don't allow negative alignments for memory operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244953 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 20:55:01 +00:00
Davide Italiano
d0a074c824 [SimplifyLibCalls] Correctly set the is_zero_undef flag for llvm.cttz
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.

Code generated before the patch was applied:


 0:   0f bc c7                bsf    %edi,%eax
 3:   b9 20 00 00 00          mov    $0x20,%ecx
 8:   0f 45 c8                cmovne %eax,%ecx
 b:   83 c1 02                add    $0x2,%ecx
 e:   b8 01 00 00 00          mov    $0x1,%eax
13:   85 ff                   test   %edi,%edi
15:   0f 45 c1                cmovne %ecx,%eax
18:   c3                      retq

Code generated after the patch was applied:

 0:   0f bc cf                bsf    %edi,%ecx
 3:   83 c1 02                add    $0x2,%ecx
 6:   85 ff                   test   %edi,%edi
 8:   b8 01 00 00 00          mov    $0x1,%eax
 d:   0f 45 c1                cmovne %ecx,%eax
10:   c3                      retq

It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.

Differential Revision:  http://reviews.llvm.org/D11989	


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244947 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 20:34:26 +00:00
Simon Pilgrim
d85f3b303d [X86][SSE] Tests for SMAX/SMIN/UMAX/UMIN vector instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244944 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 20:31:03 +00:00
Jingyue Wu
a49e11047c [SeparateConstOffsetFromGEP] strengthen the inbounds attribute
We used to be over-conservative about preserving inbounds. Actually, the second
GEP (which applies the constant offset) can inherit the inbounds attribute of
the original GEP, because the resultant pointer is equivalent to that of the
original GEP. For example,

  x  = GEP inbounds a, i+5
    =>
  y = GEP a, i               // inbounds removed
  x = GEP inbounds y, 5      // inbounds preserved



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244937 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 18:48:49 +00:00
Nemanja Ivanovic
31f6eee816 Scalar to vector conversions using direct moves
This patch corresponds to review:
http://reviews.llvm.org/D11471

It improves the code generated for converting a scalar to a vector value. With
direct moves from GPRs to VSRs, we no longer require expensive stack operations
for this. Subsequent patches will handle the reverse case and more general
operations between vectors and their scalar elements.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244921 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 17:40:44 +00:00
Igor Laevsky
ea56ef761a Emit argmemonly attribute for intrinsics.
Differential Revision: http://reviews.llvm.org/D11352



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244920 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 17:40:04 +00:00
James Molloy
a215ac72ef [ARM] Rejig vmax tests a bit
They rely on global fast-math options, but soon ISel will rely only on fast-math flags on the instructions themselves. Rip the fast checks out into their own file so we can mark their instructions as fast.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244914 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 17:28:16 +00:00
James Molloy
8eafed2468 [AArch64] Small rejig of fmax tests, NFCI.
These tests relied on -enable-no-nans-fp-math, whereas soon they'll take their no-nans hint
from the FCMP instruction itself, so split the no-nans stuff out into its own test.

Also do a slight rejig of instruction order. The old FMIN/MAX backend matching had to deal with looking through casts, which it never did particularly well. Now, instcombine will recognize such patterns and canonicalize the cast outside the select. So modify the test inputs to assume that instcombine has already run.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244913 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 17:28:10 +00:00
Erik Eckstein
22af77d94f [DeadStoreElimination] remove a redundant store even if the load is in a different block.
DeadStoreElimination does eliminate a store if it stores a value which was loaded from the same memory location.
So far this worked only if the store is in the same block as the load.
Now we can also handle stores which are in a different block than the load.
Example:

define i32 @test(i1, i32*) {
entry:
  %l2 = load i32, i32* %1, align 4
  br i1 %0, label %bb1, label %bb2
bb1:
  br label %bb3
bb2:
  ; This store is redundant
  store i32 %l2, i32* %1, align 4
  br label %bb3
bb3:
  ret i32 0
}

Differential Revision: http://reviews.llvm.org/D11854



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244901 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 15:36:11 +00:00
Petar Jovanovic
319eb435fd [mips][mcjit] Calculate correct addend for HI16 and PCHI16 reloc
Previously, for O32 ABI we did not calculate correct addend for R_MIPS_HI16
and R_MIPS_PCHI16 relocations. This patch fixes that.

Patch by Vladimir Radosavljevic.

Differential Revision: http://reviews.llvm.org/D11186


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244897 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 15:12:49 +00:00
Joseph Tremoulet
ccc0cf3d8d [WinEHPrepare] Update demotion logic
Summary:
Update the demotion logic in WinEHPrepare to avoid creating new cleanups by
walking predecessors as necessary to insert stores for EH-pad PHIs.

Also avoid creating stores for EH-pad PHIs that have no uses.

The store/load placement is still pretty naive.  Likely future improvements
(at least for optimized compiles) include:
 - Share loads for related uses as possible
 - Coalesce non-interfering use/def-related PHIs
 - Store at definition point rather than each PHI pred for non-interfering
   lifetimes.


Reviewers: rnk, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D11955

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244894 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 14:30:10 +00:00
Ulrich Weigand
69585fd51b [SystemZ] Support large LLVM IR struct return values
Recent mesa/llvmpipe crashes on SystemZ due to a failed assertion when
attempting to compile a routine with a return type of
  { <4 x float>, <4 x float>, <4 x float>, <4 x float> }
on a system without vector instruction support.

This is because after legalizing the vector type, we get a return value
consisting of 16 floats, which cannot all be returned in registers.

Usually, what should happen in this case is that the target's CanLowerReturn
routine rejects the return type, in which case SelectionDAG falls back to
implementing a structure return in memory via implicit reference.

However, the SystemZ target never actually implemented any CanLowerReturn
routine, and thus would accept any struct return type.

This patch fixes the crash by implementing CanLowerReturn.  As a side effect,
this also handles fp128 return values, fixing a todo that was noted in
SystemZCallingConv.td.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244889 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 13:37:06 +00:00
Charlie Turner
99985c9fac [InstCombinePHI] Partial simplification of identity operations.
Consider this code:

BB:
  %i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
  %add = add nsw i32 %i, %b
  ...

In this common case the add can be moved to the %if.else basic block, because
adding zero is an identity operation. If we go though %if.then branch it's
always a win, because add is not executed; if not, the number of instructions
stays the same.

This pattern applies also to other instructions like sub, shl, shr, ashr | 0,
mul, sdiv, div | 1.

Patch by Jakub Kuderski!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244887 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 12:38:58 +00:00
John Brawn
4d88daed01 [ARM] Reorganise and simplify thumb-1 load/store selection
Other than PC-relative loads/store the patterns that match the various
load/store addressing modes have the same complexity, so the order that they
are matched is the order that they appear in the .td file.

Rearrange the instruction definitions in ARMInstrThumb.td, and make use of
AddedComplexity for PC-relative loads, so that the instruction matching order
is the order that results in the simplest selection logic. This also makes
register-offset load/store be selected when it should, as previously it was
only selected for too-large immediate offsets.

Differential Revision: http://reviews.llvm.org/D11800


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244882 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 10:48:22 +00:00
Simon Pilgrim
335fc61873 [InstCombine] SSE/AVX vector shifts demanded shift amount bits
Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this.

I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code, it means that SimplifyX86immshift now (re)decodes the type of shift.

Differential Revision: http://reviews.llvm.org/D11938

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244872 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 07:39:03 +00:00
Ahmed Bougacha
5689f67ef7 [CodeGen] Mark the promoted FCOPYSIGN result FP_ROUND as TRUNCating.
Now that we can properly promote mismatched FCOPYSIGNs (r244858), we
can mark the FP_ROUND on the result as truncating, to expose folding.

FCOPYSIGN doesn't change anything but the sign bit, so
  (fp_round (fcopysign (fpext a), b))
is equivalent to (modulo the sign bit):
  (fp_round (fpext a))
which is a no-op.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244862 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 01:32:30 +00:00
Ahmed Bougacha
c5b90eb284 [AArch64] Cleanup vector-fcopysign.ll test. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244861 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 01:20:38 +00:00
Ahmed Bougacha
2e22177214 [AArch64] Also custom-lowering mismatched vector/f16 FCOPYSIGN.
We can lower them using our cool tricks if we fpext/fptrunc the second
input, like we do for f32/f64.

Follow-up to r243924, r243926, and r244858.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244860 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 01:13:56 +00:00
Ahmed Bougacha
39d772ac64 [CodeGen] When Promoting, don't extend the 2nd FCOPYSIGN operand.
We don't care about its type, and there's even a combine that'll fold
away the FP_EXTEND if we let it run. However, until it does, we'll have
something broken like:
  (f32 (fp_extend (f64 v)))

Scalar f16 follow-up to r243924.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244858 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-13 01:09:43 +00:00
Philip Reames
bf09064f46 [RewriteStatepointsForGC] Avoid using unrelocated pointers after safepoints
To be clear: this is an *optimization* not a correctness change.

CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuze many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll.

One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust.

I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're extending the live range of it's inputs potentially.

Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly.

Differential Revision: http://reviews.llvm.org/D11819



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244821 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 22:11:45 +00:00
Alex Lorenz
79faadca28 MIR Parser: Allow the MI IR references to reference global values.
This commit fixes a bug where MI parser couldn't resolve the named IR
references that referenced named global values.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244817 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 21:27:16 +00:00
Alex Lorenz
c338a581fd MIR Serialization: Serialize the fixed stack pseudo source values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244816 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 21:23:17 +00:00
Alex Lorenz
710eecab5d MIR Serialization: Serialize the jump table pseudo source values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244813 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 21:11:08 +00:00
Alex Lorenz
6b8e62f3f5 MIR Serialization: Serialize the GOT pseudo source values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244809 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 21:00:22 +00:00
Philip Reames
5119cb9b04 [RewriteStatepointsForGC] Handle extractelement fully in the base pointer algorithm
When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :)

This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed.  Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement.  That change will follow in the near future.  



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244808 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 21:00:20 +00:00
Alex Lorenz
3f0c495bbb MIR Serialization: Serialize the stack pseudo source values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244806 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 20:44:16 +00:00
Alex Lorenz
ad20340006 MIR Serialization: Serialize the constant pool pseudo source values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244803 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 20:33:26 +00:00
JF Bastien
5f53aadd9c WebAssembly: floating-point comparisons
Summary:
D11924 implemented part of the floating-point comparisons, this patch implements the rest:
 * Tell ISelLowering that all booleans are either 0 or 1.
 * Expand the eq/ne/lt/le/gt/ge floating-point comparisons to the canonical ones (similar to what Mips32r6InstrInfo.td does).
 * Add tests for ord/uno.
 * Add tests for ueq/one/ult/ule/ugt/uge.
 * Fix existing comparison tests to remove the (res & 1) code, which setBooleanContents stops from generating.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11970

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244779 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 17:53:29 +00:00
Simon Pilgrim
c29cf29e62 Cleaned up test. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244765 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 17:00:50 +00:00
John Brawn
1843d3de5d Redo "Make global aliases have symbol size equal to their type"
r242520 was reverted in r244313 as the expected behaviour of the alias
attribute in C is that the alias has the same size as the aliasee. However
we can re-introduce adding the size on the alias when the aliasee does not,
from a source code or object perspective, exist as a discrete entity. This
happens when the aliasee is not a symbol, or when that symbol is private.

Differential Revision: http://reviews.llvm.org/D11943


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244752 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 15:05:39 +00:00
John Brawn
1a5ed7be24 [GlobalMerge] Only emit aliases for internal linkage variables for non-Mach-O
On Mach-O emitting aliases for the variables that make up a MergedGlobals
variable can cause problems when linking with dead stripping enabled so don't
do that, except for external variables where we must emit an alias.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244748 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 13:36:48 +00:00
Zoran Jovanovic
f7d96a06fb [mips][microMIPS] Create microMIPS64r6 subtarget and implement DALIGN, DAUI, DAHI, DATI, DEXT, DEXTM and DEXTU instructions
Differential Revision: http://reviews.llvm.org/D10923


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244744 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 12:45:16 +00:00
Michael Kuperstein
2fa1b0f1fc [X86] Disable mul -> shl + lea combine when compiling for minsize
Differential Revision: http://reviews.llvm.org/D11904

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244740 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 11:27:26 +00:00
Davide Italiano
6a7d7ba152 [MC] Convert the last test using macho-dump under X86/ to llvm-readobj.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244732 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 10:36:16 +00:00
Michael Kuperstein
426921ffc7 [X86] Allow x86 call frame optimization to fold more loads into pushes
This abstracts away the test for "when can we fold across a MachineInstruction"
into the the MI interface, and changes call-frame optimization use the same test
the peephole optimizer users.

Differential Revision: http://reviews.llvm.org/D11945

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244729 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 10:14:58 +00:00
Matt Arsenault
892803aa81 AMDGPU: Fix assert on dbg_value instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244728 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 09:04:44 +00:00
Simon Pilgrim
49c9300c28 [InstCombine] Move SSE/AVX vector blend folding to instcombiner
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).

InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).

I also moved all the relevant combine tests into InstCombine/blend_x86.ll

Differential Revision: http://reviews.llvm.org/D11934

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244723 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 08:08:56 +00:00
Sanjay Patel
f5a6019248 [x86] enable machine combiner reassociations for 256-bit vector FP mul/add
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244705 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 00:29:10 +00:00
Adam Nemet
212ab28e02 [LoopDist] Add test for missing coverage
Add a testcase to ensure that if we can't find bounds for a necessary
memcheck we don't distribute.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244703 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-12 00:21:59 +00:00
Adam Nemet
cf1d982d01 [LAA] Fix typo in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244690 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 23:03:09 +00:00
Mark Heffernan
67278b1774 Use 32-bit divides instead of 64-bit divides where possible.
For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor
fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason
is that many index computations are carried out in 64-bits but never actually exceed the
capacity of a 32-bit word.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244684 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 22:16:34 +00:00
Paul Robinson
bbf32bd275 Make DW_AT_[MIPS_]linkage_name optional, and off by default for SCE.
Mangled "linkage" names can be huge, and if the debugger (or other
tools) have no use for them, the size savings can be very impressive
(on the order of 40%).

Add one test for controlling behavior, and modify a number of tests to
either stop using linkage names, or make llc emit them (so these tests
will still run when the default triple is for PS4).

Differential Revision: http://reviews.llvm.org/D11374


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244678 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 21:36:45 +00:00
Sanjoy Das
162f046004 Fix PR24354.
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant.  Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244676 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 21:33:55 +00:00
JF Bastien
06a1f0e6d8 WebAssembly: implement comparison.
Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken, I'll fix them in a follow-up.

Reviewers: sunfish

Subscribers: llvm-commits, jfb

Differential Revision: http://reviews.llvm.org/D11924

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244665 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 21:02:46 +00:00
Sanjay Patel
fb4d106de1 [x86] enable machine combiner reassociations for 128-bit vector single/double multiplies
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244657 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 20:19:23 +00:00
Sanjay Patel
dec9fb9abd fix minsize detection: minsize attribute implies optimizing for size
Also, add a test for optsize because this was not part of any existing regression test.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244651 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 19:39:36 +00:00
Jingyue Wu
d1ff8a7f6e SelectionDAG: Prefer to combine multiplication with less uses for fma
Summary:
For example:

  s6 = s0*s5;
  s2 = s6*s6 + s6;
  ...
  s4 = s6*s3;

We notice that it is possible for s2 is folded to fma (s0, s5, fmul (s6 s6)).
This only happens when Aggressive is true, otherwise hasOneUse() check
already prevents from folding the multiplication with more uses.

Test Plan: test/CodeGen/NVPTX/fma-assoc.ll

Patch by Xuetian Weng

Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm

Subscribers: arsenm, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D11855

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@244649 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-11 19:21:46 +00:00