18780 Commits

Author SHA1 Message Date
Craig Topper
106bb1fe2d [AVX-512] Add more varied alignments to tests for storing the lower 128-bits of a 256 or 512-bit subvector extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286343 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-09 05:38:47 +00:00
Craig Topper
4c1b9a2ab6 [AVX-512] Use alignedstore256 in patterns that look for stores of the lower 256-bits of a 512-bit vector to use a 256-bit aligned store.
Previously we were only checking for 16 byte alignment instead of 32 byte alignment. Fixes PR30947.
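
The alignment rule behind this fix can be sketched in isolation. The helper below is hypothetical (it is not LLVM's pattern predicate); it only assumes that an aligned vector store of N bits needs the address to be aligned to N/8 bytes.

```python
# Hypothetical helper, not part of LLVM: models only the alignment rule.
def can_use_aligned_store(store_alignment_bytes, store_width_bits):
    """Return True if a store with the given known alignment may be
    selected as an aligned store of the given width."""
    required_bytes = store_width_bits // 8
    return store_alignment_bytes % required_bytes == 0

# A store known to be only 16-byte aligned is fine for a 128-bit aligned
# store, but not for a 256-bit one, which needs 32-byte alignment.
ok_128 = can_use_aligned_store(16, 128)
ok_256 = can_use_aligned_store(16, 256)
```

Checking against 16 bytes for a 256-bit store corresponds to passing 128 as the width here, which accepts stores that are not 32-byte aligned.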

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286342 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-09 05:31:57 +00:00
Craig Topper
89f0495611 [AVX-512] Add test cases to demonstrate PR30947. We accidentally use 32 byte aligned store instructions when the original store was only 16 byte aligned if the store is from the lower bits of a subvector extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286341 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-09 05:31:53 +00:00
Craig Topper
f43d5aff8d [AVX-512] Make VBMI instruction set enabling imply that the BWI instruction set is also enabled.
Summary:
This is needed to make the v64i8 and v32i16 types legal for the 512-bit VBMI instructions. Fixes PR30912.

Reviewers: delena, zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26322

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286339 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-09 04:50:48 +00:00
Sanjay Patel
78c322d6f1 [ValueTracking] recognize obfuscated variants of umin/umax
The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.

If these were written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
I.e., running the optimizer pessimizes the codegen for this case without this patch:

define <4 x i32> @umax_vec(<4 x i32> %x) {
  %cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
  %sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
  ret <4 x i32> %sel
}

$ ./opt umax.ll -S | ./llc -o - -mattr=avx

vpmaxud LCPI0_0(%rip), %xmm0, %xmm0

$ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx

vpxor %xmm1, %xmm1, %xmm1
vpcmpgtd  %xmm0, %xmm1, %xmm1
vmovaps LCPI0_0(%rip), %xmm2    ## xmm2 = [2147483647,2147483647,2147483647,2147483647]
vblendvps %xmm1, %xmm0, %xmm2, %xmm0
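
The two forms compute the same value because, for 32-bit integers, "x >u 2147483647" holds exactly when "x <s 0". A minimal Python sketch of that equivalence follows; the function names are illustrative only and do not correspond to anything in ValueTracking.

```python
MASK32 = 0xFFFFFFFF
INT32_MAX = 0x7FFFFFFF

def to_signed32(x):
    """Reinterpret a 32-bit unsigned bit pattern as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def umax_unsigned_form(x):
    """Models: select (icmp ugt x, INT32_MAX), x, INT32_MAX."""
    x &= MASK32
    return x if x > INT32_MAX else INT32_MAX

def umax_signed_form(x):
    """Models: select (icmp slt x, 0), x, INT32_MAX (the obfuscated variant)."""
    x &= MASK32
    return x if to_signed32(x) < 0 else INT32_MAX

# Compare both forms on boundary values and a few arbitrary bit patterns.
samples = [0, 1, INT32_MAX - 1, INT32_MAX, INT32_MAX + 1, 0xDEADBEEF, MASK32]
results = [(umax_unsigned_form(v), umax_signed_form(v)) for v in samples]
```

Every pair in `results` holds two equal values, which is why the signed-compare select can still be matched as an unsigned max.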

Differential Revision: https://reviews.llvm.org/D26096


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286318 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-09 00:24:44 +00:00
Dan Gohman
e5d85d174b [WebAssembly] Convert stackified IMPLICIT_DEF into constant 0.
Since IMPLICIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286274 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 19:40:38 +00:00
Tim Northover
32edb7e1ce GlobalISel: support selecting fpext/fptrunc instructions on AArch64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286253 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 17:44:07 +00:00
Anton Korobeynikov
96132e4d7d Fix PR27500: on MSP430 the branch destination offset is measured in words, not bytes.
Summary: In addition, the branch instructions will have proper BB destinations, not offsets, like before.
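
A rough sketch of the word-versus-byte distinction, assuming the usual MSP430 jump format (a signed 10-bit offset counted in 16-bit words, relative to the address following the jump). The function names are hypothetical and are not taken from the MSP430 backend.

```python
def encode_jump_offset(branch_addr, target_addr):
    """Convert a byte-addressed branch target into a word offset
    (assumed format: signed 10-bit, relative to branch_addr + 2)."""
    byte_delta = target_addr - (branch_addr + 2)
    if byte_delta % 2 != 0:
        raise ValueError("branch targets must be word aligned")
    word_offset = byte_delta // 2
    if not -512 <= word_offset <= 511:
        raise ValueError("target out of range for a 10-bit word offset")
    return word_offset

def decode_jump_target(branch_addr, word_offset):
    """Inverse mapping: recover the byte address of the target."""
    return branch_addr + 2 + 2 * word_offset

forward = encode_jump_offset(0x4400, 0x4410)   # 14 bytes ahead of PC + 2
backward = encode_jump_offset(0x4410, 0x4400)  # 18 bytes behind PC + 2
```

Treating the encoded value as a byte count would halve the distance of every jump, which is the kind of error the word/byte distinction guards against.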

Reviewers: asl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23718

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286252 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 17:19:59 +00:00
Simon Pilgrim
38950e5112 [X86][SSE] Regenerate test (just adds missing header)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286241 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 15:42:49 +00:00
Simon Pilgrim
abcddd5a2c [TargetLowering] Fix undef vector element issue with true/false result handling
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.

The comment said we shouldn't handle constant splat vectors with undef elements. But the actual code was returning false if the build vector contained no undef elements....

This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).

The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.

Differential Revision: https://reviews.llvm.org/D26031

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286238 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 15:07:01 +00:00
Simon Pilgrim
169b408a54 [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
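
For reference, the "Hacker's Delight" technique smears the highest set bit into every lower position and then counts the zero bits that remain. The sketch below is a scalar Python model with hypothetical helper names; the real expansion operates on vector nodes, and its exact form is not shown in this message.

```python
def ctpop(x, bits=32):
    """Population count of the low `bits` bits of x."""
    return bin(x & ((1 << bits) - 1)).count("1")

def ctlz_via_ctpop(x, bits=32):
    """Count leading zeros using only shifts, ORs, NOT and ctpop."""
    mask = (1 << bits) - 1
    x &= mask
    shift = 1
    while shift < bits:
        # After this step every bit below the highest set bit is also set.
        x |= x >> shift
        shift <<= 1
    # The bits still clear are exactly the leading zeros.
    return ctpop(~x & mask, bits)

def ctlz_reference(x, bits=32):
    """Straightforward reference used only for comparison."""
    return bits - (x & ((1 << bits) - 1)).bit_length()
```

The loop runs log2(bits) times, so a 32-bit element needs five shift/OR pairs followed by one NOT and one CTPOP, all of which apply lane-wise to a vector.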

Differential Revision: https://reviews.llvm.org/D25910

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286233 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 14:10:28 +00:00
Roger Ferrer Ibanez
b576a606d4 [AArch64] Fix incorrect CSEL node created
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.

Differential Revision: https://reviews.llvm.org/D26394



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286231 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 13:34:41 +00:00
Simon Dardis
00b1a65d84 [mips] Re-enable small data section test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286230 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 13:03:45 +00:00
Craig Topper
2b94bbb140 [AVX-512] Add an avx512f without avx512vl command line to vec_fp_to_int.ll and regenerate. This will make a change in a future patch easier to see. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286216 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 06:58:53 +00:00
Tim Northover
7b111890d1 GlobalISel: support selecting G_SELECT on AArch64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286185 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 00:45:29 +00:00
Tim Northover
648fd9b1f6 GlobalISel: constrain PHI registers on AArch64.
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.

Patch mostly by Ahmed again.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286183 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-08 00:34:06 +00:00
Chad Rosier
a5af556e61 [AArch64] Remove dead check prefixes after r286110. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286174 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 23:13:59 +00:00
Chad Rosier
c6a1b2b827 [AArch64] Rename test to reflect changes after r286110. NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286173 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 23:13:55 +00:00
Stanislav Mekhanoshin
c312996e7a [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user if we have only one register
for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions.
Changed the backend to report that we have many condition registers. That way the IR LICM pass
will hoist an invariant comparison out of a loop and codegen prepare will not
sink it.

With that done, a condition is calculated in one block and used in another.
The current behavior is to store a workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue, forward propagation of a v_cmp 64-bit result
to a user is implemented. An additional side effect of this is that we may
consume fewer VGPRs at the cost of more SGPRs when multiple conditions need
to be held, and that is a clear win in most cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286171 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 23:04:50 +00:00
Sanjin Sijaric
978966fb97 [AArch64] Transfer memory operands when lowering vector load/store intrinsics
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling.  We just need to transfer memory
operands during lowering.

Reviewers: mcrosier, t.p.northover, jmolloy

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D26313

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286168 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 22:39:02 +00:00
Derek Schuff
9ebd8435df [WebAssembly] Emit a BasePointer when we have overly-aligned stack objects
Because we shift the stack pointer by an unknown amount, we need an
additional pointer. In the case where we have variable-size objects
as well, we can't reuse the frame pointer, thus three pointers.

Patch by Jacob Gravelle

Differential Revision: https://reviews.llvm.org/D26263

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286160 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 22:00:48 +00:00
Matt Arsenault
f577de357a AMDGPU: Remove unnecessary and on conditional branch
The comment explaining why this was necessary is incorrect
in its description of v_cmp's behavior for inactive workitems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286134 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 19:09:33 +00:00
Matt Arsenault
e5fd9c09ad AMDGPU: Preserve vcc undef flags when inverting branch
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.

Fixes verifier failures in a future commit.

Also fix verifier error when inserting copy for vccz
corruption bug.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286133 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 19:09:27 +00:00
Richard Smith
bbc05615fe Add -O0 support for @llvm.invariant.group.barrier by discarding it if it gets to ISel.
Differential Revision: https://reviews.llvm.org/D26292


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286119 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 16:47:20 +00:00
Amara Emerson
813d12083c This patch adds support for 16 bit floating point registers to the inline asm register selection on AArch64.
Without this patch, register allocation for the example below fails.

define half @test(half %a1, half %a2) #0 {
entry:
  %0 = tail call half asm "sqrshl ${0:h}, ${1:h}, ${2:h}", "=w,w,w" (half %a1, half %a2) #1
  ret half %0
}

Patch by Florian Hahn.

Differential Revision: https://reviews.llvm.org/D25080



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286111 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 15:42:12 +00:00
Chad Rosier
ea453ce258 [AArch64] Removed the narrow load merging code in the ld/st optimizer.
This feature has been disabled for some time now, so remove cruft.

Differential Revision: https://reviews.llvm.org/D26248

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286110 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 15:27:22 +00:00
James Molloy
ba4796904e [Thumb1] Move padding earlier when synthesizing TBBs off of the PC
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.

If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.

In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286107 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 13:38:21 +00:00
Simon Pilgrim
657bbc552a [X86][AVX512] Add AVX512VL/AVX512BWVL vector truncation tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286105 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 13:34:29 +00:00
Simon Pilgrim
bf0c64de36 [X86][SSE] Drop unnecessary -mcpu argument from trunc tests
cpu/triple duplication

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286104 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 13:28:20 +00:00
Craig Topper
e69277c6bf [AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext.
This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286092 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 02:12:57 +00:00
Craig Topper
9fd28c59b7 [AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to legacy intrinsics and a select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286089 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 00:13:39 +00:00
Saleem Abdulrasool
10d3cd3f52 ARM: lower fpowi appropriately for Windows ARM
This handles the last of the builtin function calls for which we would
generate code that differed from Microsoft's ABI.  Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.

Addresses PR30825!

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286082 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 19:46:54 +00:00
Simon Pilgrim
dc0f6aec55 [SelectionDAG] Add support for vector demandedelts in XOR opcodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286075 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:49:19 +00:00
Simon Pilgrim
20ea1e9a7d [X86] Add knownbits vector xor test
In preparation for demandedelts support

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286074 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:36:29 +00:00
Craig Topper
90419a0ef5 [AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead upgrade them to a select and the older AVX2 intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286073 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:29:19 +00:00
Craig Topper
4734b821e2 [AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286072 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:29:14 +00:00
Simon Pilgrim
67ef8244cb [SelectionDAG] Add support for vector demandedelts in OR opcodes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286071 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:29:09 +00:00
Craig Topper
cd57e732e4 [AVX-512] Remove intrinsics for 128/256-bit masked shift by single element in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286070 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:29:08 +00:00
Craig Topper
151fabe906 [AVX-512] Remove a 512-bit test case from the avx512vl test file. It already exists in the avx512f test file.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286069 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:29:03 +00:00
Simon Pilgrim
c7f268ce57 [X86] Add knownbits vector or test
In preparation for demandedelts support

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286068 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 16:05:59 +00:00
Craig Topper
bd04d5b102 [X86] Add a few more fptoui test cases to the vec_fp_to_int.ll. The codegen for these test cases will be improved for AVX512 in a future commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286063 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 07:50:25 +00:00
Craig Topper
e7c8559144 [AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 addr:)) -> VCVTPS2PDZ128rm
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286059 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 04:12:52 +00:00
Craig Topper
ac00b70a61 [AVX-512] Add avx512vl command line to the fpext test and add -show-mc-encoding to show where we aren't using EVEX instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286058 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 04:12:49 +00:00
Craig Topper
197ca31f15 [AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286057 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 04:12:46 +00:00
Craig Topper
c67db1e21e [AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they can use EVEX instructions when available.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286056 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 04:12:42 +00:00
Craig Topper
04b2a60236 [AVX-512] Add -show-mc-encoding to legacy vector intrinsic tests so we can see when VEX or EVEX encoded instructions are being emitted. Make sure the tests all have an avx2 command line and an skx command line.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286055 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-06 02:03:58 +00:00
Justin Lebar
de1867fd43 [LoopStrengthReduce] Don't use a DenseSet<int64_t> when we might add any valid int64_t to the set.
Summary:
SmallSetVector uses DenseSet, but that means we need to reserve some
values for the empty and tombstone keys.

It seems to me we should have a general way to let us store full-range
ints inside of DenseSets, and furthermore that we probably shouldn't
silently let you add ints into DenseSets without explicitly promising
that they're in range.  But that's a battle for another day; for now,
just fix this code, since we currently do something Very Bad when
compiling ffmpeg.
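
To illustrate why reserved keys are a problem, here is a small open-addressing set in Python that marks free and deleted slots with two sentinel values. This is only a sketch: the class, the chosen sentinel values, and the explicit range check are assumptions for illustration, not DenseSet's actual implementation.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -2**63
EMPTY = INT64_MAX      # hypothetical "never used" marker
TOMBSTONE = INT64_MIN  # hypothetical "erased" marker

class SentinelSet:
    """Fixed-capacity hash set that stores its markers in-band."""

    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity

    def _probe(self, key):
        n = len(self.slots)
        start = hash(key) % n
        return [(start + i) % n for i in range(n)]

    def insert(self, key):
        # Without this check, inserting a sentinel would be indistinguishable
        # from leaving the slot free: the value would silently vanish.
        if key in (EMPTY, TOMBSTONE):
            raise ValueError("key collides with a reserved sentinel")
        first_free = None
        for idx in self._probe(key):
            slot = self.slots[idx]
            if slot == key:
                return False
            if slot == TOMBSTONE and first_free is None:
                first_free = idx
            elif slot == EMPTY:
                if first_free is None:
                    first_free = idx
                break
        if first_free is None:
            raise RuntimeError("set is full")
        self.slots[first_free] = key
        return True

    def contains(self, key):
        for idx in self._probe(key):
            if self.slots[idx] == EMPTY:
                return False
            if self.slots[idx] == key:
                return True
        return False

    def erase(self, key):
        for idx in self._probe(key):
            if self.slots[idx] == EMPTY:
                return False
            if self.slots[idx] == key:
                self.slots[idx] = TOMBSTONE
                return True
        return False
```

Any container built this way can hold at most 2^64 - 2 distinct 64-bit values, so code that may legitimately produce every int64_t needs either a different container or an explicit range guarantee.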

Fixes PR30914.

Reviewers: jeremyhu

Subscribers: llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D26323

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286038 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-05 16:47:25 +00:00
Krzysztof Parzyszek
eefaf3bd00 [Hexagon] Account for <def,read-undef> when validating moves for predication
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286009 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-04 20:41:03 +00:00
Zvi Rackover
9933d268bb [X86] Broadcast from memory instructions aren't unfoldable
Broadcast from memory instructions should be treated as moves. They can't be unfolded.

Fixes pr30693.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285998 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-04 15:15:19 +00:00
Zvi Rackover
85309916d9 Add bugpoint-reduced reproducer for pr30693
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285997 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-04 14:53:22 +00:00