38101 Commits

Author SHA1 Message Date
Sean Silva
d8c90ea6b8 [PM] Port LoopUnroll.
We just set PreserveLCSSA to always true since we don't have an
analogous method `mustPreserveAnalysisID(LCSSA)`.

Also port LoopInfo verifier pass to test LoopUnrollPass.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276063 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 23:54:23 +00:00
Justin Lebar
85ee7e9487 [LSV] Insert stores at the right point.
Summary:
Previously, the insertion point for stores was the last instruction in
Chain *before calling getVectorizablePrefixEndIdx*.  Thus if
getVectorizablePrefixEndIdx didn't return Chain.size(), we still would
insert at the last instruction in Chain.

This patch changes our internal API a bit in an attempt to make it less
prone to this sort of error.  As a result, we end up recalculating the
Chain's boundary instructions, but I think worrying about the speed hit
of this is a premature optimization right now.

Reviewers: asbirlea, tstellarAMD

Subscribers: mzolotukhin, arsenm, llvm-commits

Differential Revision: https://reviews.llvm.org/D22534

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276056 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 23:19:20 +00:00
Justin Lebar
b2c31893e0 [LSV] Add detail to correct-order.ll test.
Summary:
This helps keep us honest -- there were a number of ways we could screw
up and still have passed this test.

Reviewers: asbirlea

Subscribers: llvm-commits, arsenm

Differential Revision: https://reviews.llvm.org/D22531

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276053 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 23:18:59 +00:00
Matt Arsenault
63be72069d AMDGPU: Change fdiv lowering based on !fpmath metadata
If 2.5 ulp is acceptable, denormals are not required, and
isn't a reciprocal which will already be handled, replace
with a faster fdiv.

Simplify the lowering tests by using per function
subtarget features.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276051 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 23:16:53 +00:00
Paul Robinson
9ab63168a0 Make GVN Hoisting obey optnone/bisect.
Differential Revision: http://reviews.llvm.org/D22545


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276048 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 22:57:14 +00:00
Matthias Braun
c5e14e0478 RegScavenging: Add scavengeRegisterBackwards()
This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276044 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 22:37:09 +00:00
Sanjay Patel
6f62be2e5b regenerate checks
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276042 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 22:32:15 +00:00
Evandro Menezes
bc05d15136 [AArch64] Properly validate the reciprocal estimation.
Add check for legal data types when expanding into a Newton series.

Differential Revision: https://reviews.llvm.org/D22267

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276041 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 22:31:11 +00:00
Sanjay Patel
11faea381f [InstCombine] fold add(zext(xor X, C), C) --> sext X when C is INT_MIN in the source type
The pattern may look more obviously like a sext if written as:

  define i32 @g(i16 %x) {
    %zext = zext i16 %x to i32
    %xor = xor i32 %zext, 32768
    %add = add i32 %xor, -32768
    ret i32 %add
  }

We already have that fold in visitAdd().

Differential Revision: https://reviews.llvm.org/D22477



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276035 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 22:09:34 +00:00
George Burgess IV
bfc580351a [CFLAA] Make a test tell the truth. NFC.
Dishonesty noted by Jia Chen.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276028 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 20:56:41 +00:00
George Burgess IV
1caa063db8 [CFLAA] Add some interproc. analysis to CFLAnders.
This patch adds function summary support to CFLAnders. It also comes
with a lot of tests! Woohoo!

Patch by Jia Chen.

Differential Revision: https://reviews.llvm.org/D22450


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276026 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 20:47:15 +00:00
Kevin Enderby
fa9076153b Next step along the way to getting good error messages for bad archives.
This step builds on Lang Hames work to change Archive::child_iterator
for better interoperation with Error/Expected.  Building on that it is now
possible to return an error message when the size field of an archive
contains non-decimal characters.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276025 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 20:47:07 +00:00
Sanjay Patel
5f33cbe2bf add even more missing tests for simplifySelectBitTest()
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276024 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 20:47:00 +00:00
Vedant Kumar
ea47fc5fc9 [tsan] Don't instrument __llvm_gcov_global_state_pred or __llvm_gcda*
r274801 did not go far enough to allow gcov+tsan to cooperate. With this
commit it's possible to run the following code without false positives:

  std::thread T1(fib), T2(fib);
  T1.join(); T2.join();

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276015 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 20:16:08 +00:00
Tim Northover
b2e69d912a ARM: move feature for Thumb2 pkhbt/pkhtb onto architectures.
There's not much functional change, but it really is an architectural feature
(on v6T2, v7A, v7R and v7EM) rather than something each CPU implements
individually.

The main functional change is the default behaviour you get when specifying
only "-triple".

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276013 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 19:49:13 +00:00
Ahmed Bougacha
98d2ab3a50 [GlobalISel] Mark newly-created gvregs as having a bank.
Also verify that we never try to set the size of a vreg associated
to a register class.

Report an error when we encounter that in MIR. Fix a testcase that
hit that error and had a size for no reason.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276012 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 19:48:36 +00:00
David Majnemer
baf88b3b1a [FunctionAttrs] Correct the safety analysis for inference of 'returned'
We skipped over ReturnInsts which didn't return an argument which would
lead us to incorrectly conclude that an argument returned by another
ReturnInst was 'returned'.

This reverts commit r275756.

This fixes PR28610.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276008 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 18:50:26 +00:00
David Majnemer
3c8951d73e Add a testcase for r275581
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276002 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 17:52:41 +00:00
Sanjay Patel
d9f3c5ebc0 add tests related to PR28466
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275995 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 17:07:35 +00:00
Simon Pilgrim
259dd35565 [X86][AVX512] Added AVX512 subvector broadcast tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275994 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 17:04:28 +00:00
Simon Pilgrim
5d55323a67 [X86][AVX] Fixed typo in test names
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275992 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 16:52:05 +00:00
Sanjay Patel
779845c3ce add missing test for simplifySelectBitTest()
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275990 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 16:49:55 +00:00
Tobias Grosser
65165b21da [InstCombine] Enable cast-folding in logic(cast(icmp), cast(icmp))
Summary:
Currently, InstCombine is already able to fold expressions of the form `logic(cast(A), cast(B))` to the simpler form `cast(logic(A, B))`, where logic designates one of `and`/`or`/`xor`. This transformation is implemented in `foldCastedBitwiseLogic()` in InstCombineAndOrXor.cpp. However, this optimization will not be performed if both `A` and `B` are `icmp` instructions. The decision to preclude casts of `icmp` instructions originates in r48715 in combination with r261707, and can be best understood by the title of the former one:

> Transform (zext (or (icmp), (icmp))) to (or (zext (cimp), (zext icmp))) if at least one of the (zext icmp) can be transformed to eliminate an icmp.

Apparently, it introduced a transformation that is a reverse of the transformation that is done in `foldCastedBitwiseLogic()`. Its purpose is to expose pairs of `zext icmp` that would subsequently be optimized by `transformZExtICmp()` in InstCombineCasts.cpp. Therefore, in order to avoid an endless loop of switching back and forth between these two transformations, the one in `foldCastedBitwiseLogic()` has been restricted to exclude `icmp` instructions which is mirrored in the responsible check:

`if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src)) && ...`

This check seems to sort out more cases than necessary because:
- the reverse transformation is obviously done for `or` instructions only
- and also not every `zext icmp` pair is necessarily the result of this reverse transformation

Therefore we now remove this check and replace it by a more finegrained one in `shouldOptimizeCast()` that now rejects only those `logic(zext(icmp), zext(icmp))` that would be able to be optimized by `transformZExtICmp()`, which also avoids the mentioned endless loop. That means we are now able to also simplify expressions of the form `logic(cast(icmp), cast(icmp))` to `cast(logic(icmp, icmp))` (`cast` being an arbitrary `CastInst`).

As an example, consider the following IR snippet

```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```

which would now be transformed to

```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```

This issue became apparent when experimenting with the programming language Julia, which makes use of LLVM. Currently, Julia lowers its `Bool` datatype to LLVM's `i8` (also see https://github.com/JuliaLang/julia/pull/17225). In fact, the above IR example is the lowered form of the Julia snippet `(a > b) & (a < c)`. Like shown above, this may introduce `zext` operations, casting between `i1` and `i8`, which could for example hinder ScalarEvolution and Polly on certain code.

Reviewers: grosser, vtjnash, majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D22511

Contributed-by: Matthias Reisinger

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275989 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 16:39:17 +00:00
Simon Pilgrim
0f9cdd21cd [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.

It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems, INF/NAN/out of range values are guaranteed to result in a 0x80000000 value - which plays havoc with constant folding which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).

This patch changes both scalar and packed versions back to using x86-specific builtins.

It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.

A companion clang patch is at D22105

Differential Revision: https://reviews.llvm.org/D22106

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275981 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 15:07:43 +00:00
Sam Parker
aa39b5574e [ARM] Refactor Thumb2 Mul and Mla instr descs
Recommitting after r274347 was reverted. This patch introduces some
classes to refactor the 3 and 4 register Thumb2 multiplication
instruction descriptions, plus improved tests for some of those
instructions.

Differential Revision: https://reviews.llvm.org/D21929



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275979 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 14:44:05 +00:00
Peter Smith
ca1d6a6554 Add support for tlsldm assembler operator to ARM target
The standard local dynamic model for TLS on ARM systems needs two 
relocations:
- R_ARM_TLS_LDM32 (module idx)
- R_ARM_TLS_LDO32 (offset of object from origin of module TLS block)
    
In GNU style assembler we use symbol(tlsldm) and symbol(tlsldo) to
produce these relocations.
    
llvm-mc for ARM supports symbol(tlsldo) but does not support symbol(tlsldm).
This patch wires up the existing symbol(tlsldm) to R_ARM_TLS_LDM32.
    
TLS for ARM is defined in Addenda to, and Errata in, the ABI for the
ARM Architecture
    
Differential Revision: https://reviews.llvm.org/D22461



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275977 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 14:15:33 +00:00
Simon Pilgrim
c964a662f4 [AARCH64] Fix linu triple typo
As promised in D22191

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275976 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 14:12:45 +00:00
Simon Pilgrim
e927841848 [AARCH64] Enable AARCH64 lit tests on windows dev machines
As discussed on PR27654, this patch fixes the triples of a lot of aarch64 tests and enables lit tests on windows

This will hopefully help stop cases where windows developers break the aarch64 target

Differential Revision: https://reviews.llvm.org/D22191

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275973 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 13:35:11 +00:00
Daniel Sanders
6b209b804f [mips][ias] R_MIPS_GOT_(PAGE|OFST) do not need symbols
Reviewers: sdardis

Subscribers: dsanders, llvm-commits, sdardis

Differential Revision: https://reviews.llvm.org/D22458

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275968 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 10:58:06 +00:00
Daniel Sanders
0f320a4f3b [mips] Correct label prefixes for N32 and N64.
Summary:
N32 and N64 follow the standard ELF conventions (.L) whereas O32 uses its own
($).

This fixes the majority of object differences between -fintegrated-as and
-fno-integrated-as.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: https://reviews.llvm.org/D22412



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275967 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 10:49:03 +00:00
Elena Demikhovsky
87a79055a6 AVX-512: Fixed BT instruction selection.
The following condition expression ( a >> n) & 1 is converted to "bt a, n" instruction. It works on all intel targets.
But on AVX-512 it was broken because the expression is modified to (truncate (a >>n) to i1).

I added the new sequence (truncate (a >>n) to i1) to the BT pattern.

Differential Revision: https://reviews.llvm.org/D22354



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275950 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 07:14:21 +00:00
Craig Topper
c61cf90305 [AVX512] Give priority to EVEX encoded PSHUFB over the VEX versions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275942 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 02:00:38 +00:00
George Burgess IV
9906b88abc [MemorySSA] Update to the new shiny walker.
This patch updates MemorySSA's use-optimizing walker to be more
accurate and, in some cases, faster.

Essentially, this changed our core walking algorithm from a
cache-as-you-go DFS to an iteratively expanded DFS, with all of the
caching happening at the end. Said expansion happens when we hit a Phi,
P; we'll try to do the smallest amount of work possible to see if
optimizing above that Phi is legal in the first place. If so, we'll
expand the search to see if we can optimize to the next phi, etc.

An iteratively expanded DFS lets us potentially quit earlier (because we
don't assume that we can optimize above all phis) than our old walker.
Additionally, because we don't cache as we go, we can now optimize above
loops.

As an added bonus, this patch adds a ton of verification (if
EXPENSIVE_CHECKS are enabled), so finding bugs is easier.

Differential Revision: https://reviews.llvm.org/D21777


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275940 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 01:29:15 +00:00
Vedant Kumar
d91d378e3a Retry: [llvm-profdata] Speed up merging by using a thread pool
Add a "-j" option to llvm-profdata to control the number of threads used.
Auto-detect NumThreads when it isn't specified, and avoid spawning threads when
they wouldn't be beneficial.

I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:

  No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
  With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total

Changes since the initial commit:

  - When handling odd-length inputs, call ThreadPool::wait() before merging the
    last profile. Should fix a race/off-by-one (see r275937).

Differential Revision: https://reviews.llvm.org/D22438

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275938 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 01:17:20 +00:00
Vedant Kumar
40a2c5247c Revert "[llvm-profdata] Speed up merging by using a thread pool"
This reverts commit r275921. It broke the ppc64be bot:

  http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/3537

I'm not sure why it broke, but based on the output, it looks like an
off-by-one (one profile left un-merged).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275937 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:57:09 +00:00
Wei Mi
92a8d601a3 Recommit the patch "Use uniforms set to populate VecValuesToIgnore".
For instructions in uniform set, they will not have vector versions so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474

The recommit fixed the testcase global_alias.ll.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275936 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:50:43 +00:00
Matt Arsenault
4cead0b564 AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275934 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:35:03 +00:00
Sanjoy Das
aaf3f191a5 [LoopReroll] Reroll loops with unordered atomic memory accesses
Reviewers: hfinkel, jfb, reames

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D22385

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275932 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:23:54 +00:00
Matt Arsenault
f36ea238a4 AMDGPU: Fix test name and broken CHECK-LABEL
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275928 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 23:09:51 +00:00
Vedant Kumar
94471e314e [llvm-profdata] Speed up merging by using a thread pool
Add a "-j" option to llvm-profdata to control the number of threads
used. Auto-detect NumThreads when it isn't specified, and avoid spawning
threads when they wouldn't be beneficial.

I tested this patch using a raw profile produced by clang (147MB). Here is the
time taken to merge 4 copies together on my laptop:

  No thread pool: 112.87s user 5.92s system 97% cpu 2:01.08 total
  With 2 threads: 134.99s user 26.54s system 164% cpu 1:33.31 total

Differential Revision: https://reviews.llvm.org/D22438

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275921 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 22:02:39 +00:00
Artem Belevich
09080a9732 [NVPTX] Make sure we adjust alignment at all call sites
.. including calls from kernel functions that were
ignored by mistake before.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275920 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 21:58:48 +00:00
Dehao Chen
e7eb2d5c54 [PM] Convert Loop Strength Reduce pass to new PM
Summary: Convert Loop String Reduce pass to new PM

Reviewers: davidxl, silvas

Subscribers: junbuml, sanjoy, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D22468

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275919 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 21:41:50 +00:00
Teresa Johnson
385d70633e [PM] Port FunctionImport Pass to new PM
Summary: Port FunctionImport Pass to new PM.

Reviewers: mehdi_amini, davide

Subscribers: davidxl, llvm-commits

Differential Revision: https://reviews.llvm.org/D22475

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275916 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 21:22:24 +00:00
Wei Mi
fba236f858 Revert rL275912.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275915 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 21:14:43 +00:00
Wei Mi
1938056381 Use uniforms set to populate VecValuesToIgnore.
For instructions in uniform set, they will not have vector versions so
add them to VecValuesToIgnore.
For induction vars, those only used in uniform instructions or consecutive
ptrs instructions have already been added to VecValuesToIgnore above. For
those induction vars which are only used in uniform instructions or
non-consecutive/non-gather scatter ptr instructions, the related phi and
update will also be added into VecValuesToIgnore set.

The change will make the vector RegUsages estimation less conservative.

Differential Revision: https://reviews.llvm.org/D20474


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275912 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 20:59:53 +00:00
Sanjay Patel
cc7cb1cb65 add tests for missed sext transform
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275908 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 20:37:51 +00:00
Sanjay Patel
52f3fd0660 auto-generate checks
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275899 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 20:06:51 +00:00
Artem Belevich
67e73ac68f [NVPTX] Force minimum alignment of 4 for byval arguments of device-side functions.
Taking address of a byval variable in PTX is legal, but currently runs
into miscompilation by ptxas on sm_50+ (NVIDIA issue 1789042).
Work around the issue by enforcing minimum alignment on byval arguments
of device functions.

The change is a no-op on SASS level for sm_3x where ptxas already aligns
local copy by at least 4.

Differential Revision: https://reviews.llvm.org/D22428

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275893 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 19:54:56 +00:00
Michael Zolotukhin
1bf3e6c4f5 [LoopSimplify] Update LCSSA after separating nested loops.
Summary:
Usually LCSSA survives this transformation, but in some cases (see
attached test) it doesn't: values from the original loop after
separating might be used from the outer loop. Before the transformation
it was the same loop, so LCSSA phis were not required.

This fixes PR28272.

Reviewers: sanjoy, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21665

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275891 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 19:44:19 +00:00
Vitaly Buka
2d1ffcc163 Revert "[ARM] Skip inline asm memory operands in DAGToDAGISel"
Breaks asan, see https://reviews.llvm.org/D22103

This reverts commit r275776.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275890 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-18 19:44:01 +00:00