The smallest tests that expose this are codegen tests (because SelectionDAGBuilder::visitSelect() uses matchSelectPattern
to create UMAX/UMIN nodes), but it's also possible to see the effects in IR alone with folds of min/max pairs.
If these are written as unsigned compares in IR, InstCombine canonicalizes the unsigned compares to signed compares.
That is, without this patch, running the optimizer pessimizes the codegen for this case:
define <4 x i32> @umax_vec(<4 x i32> %x) {
%cmp = icmp ugt <4 x i32> %x, <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
%sel = select <4 x i1> %cmp, <4 x i32> %x, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>
ret <4 x i32> %sel
}
$ ./opt umax.ll -S | ./llc -o - -mattr=avx
vpmaxud LCPI0_0(%rip), %xmm0, %xmm0
$ ./opt -instcombine umax.ll -S | ./llc -o - -mattr=avx
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtd %xmm0, %xmm1, %xmm1
vmovaps LCPI0_0(%rip), %xmm2 ## xmm2 = [2147483647,2147483647,2147483647,2147483647]
vblendvps %xmm1, %xmm0, %xmm2, %xmm0
Differential Revision: https://reviews.llvm.org/D26096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286318 91177308-0d34-0410-b5e6-96231b3b80d8
Since IMPLICIT_DEF instructions are omitted in the output, when the output
of an IMPLICIT_DEF instruction is stackified, the resulting register lacks
an explicit push, leading to a push/pop mismatch. Fix this by converting
such IMPLICIT_DEFs into CONST_I32 0 instructions so that they have explicit
pushes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286274 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed an issue with vector usage of TargetLowering::isConstTrueVal / TargetLowering::isConstFalseVal boolean result matching.
The comment said we shouldn't handle constant splat vectors with undef elements, but the actual code was returning false if the build vector contained no undef elements.
This patch now ignores the number of undefs (getConstantSplatNode will return null if the build vector is all undefs).
The change has also unearthed a couple of missed opportunities in AVX512 comparison code that will need to be addressed.
Differential Revision: https://reviews.llvm.org/D26031
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286238 91177308-0d34-0410-b5e6-96231b3b80d8
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.
This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.
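For reference, here is a minimal scalar IR sketch of the CTPOP-based expansion
(the actual lowering builds the equivalent DAG nodes, applied per vector lane;
the function name is only illustrative):
define i32 @ctlz_via_ctpop(i32 %x) {
  ; smear the highest set bit into every lower bit position
  %s1 = lshr i32 %x, 1
  %m1 = or i32 %x, %s1
  %s2 = lshr i32 %m1, 2
  %m2 = or i32 %m1, %s2
  %s4 = lshr i32 %m2, 4
  %m3 = or i32 %m2, %s4
  %s8 = lshr i32 %m3, 8
  %m4 = or i32 %m3, %s8
  %s16 = lshr i32 %m4, 16
  %m5 = or i32 %m4, %s16
  ; the leading zeros are exactly the bits still clear, so count them
  ; by taking ctpop of the inverted value
  %not = xor i32 %m5, -1
  %ctlz = call i32 @llvm.ctpop.i32(i32 %not)
  ret i32 %ctlz
}
declare i32 @llvm.ctpop.i32(i32)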
Differential Revision: https://reviews.llvm.org/D25910
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286233 91177308-0d34-0410-b5e6-96231b3b80d8
Under -enable-unsafe-fp-math, SELECT_CC lowering in AArch64
transforms floating point comparisons of the form "a == 0.0 ? 0.0 : x" to
"a == 0.0 ? a : x". But it incorrectly assumes that 'x' and 'a' have
the same type which can lead to a wrong CSEL node that crashes later
due to nonsensical copies.
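A minimal IR sketch of the problematic shape (illustrative only, not the
actual regression test): the compare operand %a is a float while the select
operands are doubles, so substituting %a for the zero constant would feed a
float into a double CSEL.
define double @select_mixed_types(float %a, double %x) {
  %cmp = fcmp oeq float %a, 0.000000e+00
  %sel = select i1 %cmp, double 0.000000e+00, double %x
  ret double %sel
}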
Differential Revision: https://reviews.llvm.org/D26394
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286231 91177308-0d34-0410-b5e6-96231b3b80d8
Self-referencing PHI nodes need their destination operands to be constrained
because nothing else is likely to do so. For now we just pick a register class
naively.
Patch mostly by Ahmed again.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286183 91177308-0d34-0410-b5e6-96231b3b80d8
CodeGenPrepare sinks comparisons close to a user if we have only one register
for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions.
This change makes the backend report that we have many condition registers. That
way the IR LICM pass will hoist an invariant comparison out of a loop and
CodeGenPrepare will not sink it.
With that done, a condition is calculated in one block and used in another.
The previous behavior was to store the workitem's condition in a VGPR using
v_cndmask and then restore it with yet another v_cmp instruction from that
v_cndmask's result. To mitigate the issue, forward propagation of the 64-bit
v_cmp result to its user is implemented. A side effect of this is that we may
consume fewer VGPRs at the cost of more SGPRs when multiple conditions need to
be held, and that is a clear win in most cases.
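An illustrative IR shape of the situation (hypothetical function, not taken
from the test suite): the comparison is loop-invariant, so LICM hoists it into
the entry block, and its i1 result is then consumed in a different block.
define void @hoisted_cond(i32 %n) {
entry:
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %cond = icmp eq i32 %tid, 0
  br label %loop
loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %latch ]
  ; the condition computed in %entry is used here, in another block
  br i1 %cond, label %then, label %latch
then:
  br label %latch
latch:
  %i.next = add i32 %i, 1
  %done = icmp eq i32 %i.next, %n
  br i1 %done, label %exit, label %loop
exit:
  ret void
}
declare i32 @llvm.amdgcn.workitem.id.x()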
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286171 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Some vector loads and stores generated from AArch64 intrinsics alias each other
unnecessarily, preventing better scheduling. We just need to transfer memory
operands during lowering.
Reviewers: mcrosier, t.p.northover, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26313
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286168 91177308-0d34-0410-b5e6-96231b3b80d8
Because we shift the stack pointer by an unknown amount, we need an
additional pointer. In the case where we have variable-size objects
as well, we can't reuse the frame pointer, thus three pointers.
Patch by Jacob Gravelle
Differential Revision: https://reviews.llvm.org/D26263
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286160 91177308-0d34-0410-b5e6-96231b3b80d8
The comment explaining why this was necessary is incorrect
in its description of v_cmp's behavior for inactive workitems.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286134 91177308-0d34-0410-b5e6-96231b3b80d8
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.
Fixes verifier failures in a future commit.
Also fix verifier error when inserting copy for vccz
corruption bug.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286133 91177308-0d34-0410-b5e6-96231b3b80d8
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.
If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.
In the meantime, it's correct and only slightly worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, since we know the number of instructions in the sequence is always 4.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286107 91177308-0d34-0410-b5e6-96231b3b80d8
This mostly reuses the earlier autoupgrade support for the SSE and AVX equivalents; we just needed to add the code to create the select.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286092 91177308-0d34-0410-b5e6-96231b3b80d8
This handles the last case of the builtin function calls for which we would
generate code that differed from Microsoft's ABI. Rather than
generating a call to `__pow{d,s}i2`, we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.
Addresses PR30825!
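As a rough IR-level sketch of the effective change (the promotion itself
happens during call lowering; the function names below are only for
illustration):
; before: this would lower to a call to __powidf2, which the Microsoft
; runtime does not provide
define double @powi_before(double %x, i32 %n) {
  %r = call double @llvm.powi.f64(double %x, i32 %n)
  ret double %r
}
declare double @llvm.powi.f64(double, i32)
; after: the exponent is promoted to double and pow is called instead
define double @powi_after(double %x, i32 %n) {
  %nf = sitofp i32 %n to double
  %r = call double @pow(double %x, double %nf)
  ret double %r
}
declare double @pow(double, double)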
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286082 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
SmallSetVector uses DenseSet, but that means we need to reserve some
values for the empty and tombstone keys.
It seems to me we should have a general way to let us store full-range
ints inside of DenseSets, and furthermore that we probably shouldn't
silently let you add ints into DenseSets without explicitly promising
that they're in range. But that's a battle for another day; for now,
just fix this code, since we currently do something Very Bad when
compiling ffmpeg.
Fixes PR30914.
Reviewers: jeremyhu
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D26323
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286038 91177308-0d34-0410-b5e6-96231b3b80d8