I noticed that we were failing to narrow an x86 ymm math op in a case similar
to the 'madd' test diff. That is because a bitcast sits between the math op
and the extract_subvector, thwarting our pattern matching for narrowing:
t56: v8i32 = add t59, t58
t68: v4i64 = bitcast t56
t73: v2i64 = extract_subvector t68, Constant:i64<2>
t96: v4i32 = bitcast t73
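The same shape as a rough IR sketch (value names invented here, not taken from the tests): the extract of the high 128 bits is hidden behind a bitcast of the 256-bit add.
%sum = add <8 x i32> %a, %b                 ; 256-bit math op
%cast = bitcast <8 x i32> %sum to <4 x i64> ; bitcast blocks the match
%hi = shufflevector <4 x i64> %cast, <4 x i64> undef, <2 x i32> <i32 2, i32 3> ; extract high half
%res = bitcast <2 x i64> %hi to <4 x i32>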
There are a few wins and neutral diffs in the other tests.
Differential Revision: https://reviews.llvm.org/D61806
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360541 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r360171 (git commit a9d6c32eafc645c55b07eb50698c428e14c0bffd). A repro showing the asan/msan failures is forthcoming.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360481 91177308-0d34-0410-b5e6-96231b3b80d8
To find candidates for merging stores, we iterate over all nodes in a chain
for each store, which leads to quadratic compile times for large basic blocks
with many stores.
Reviewers: niravd, spatel, craig.topper
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D61511
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360357 91177308-0d34-0410-b5e6-96231b3b80d8
When simplifying TokenFactors, we potentially iterate over all
operands of a large number of TokenFactors. This causes quadratic
compile times in some cases, and the large TokenFactors cause additional
scalability problems elsewhere.
This patch adds some limits to the number of nodes explored for the
cases mentioned above.
Reviewers: niravd, spatel, craig.topper
Reviewed By: niravd
Differential Revision: https://reviews.llvm.org/D61397
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360171 91177308-0d34-0410-b5e6-96231b3b80d8
The problem was that we were creating a CMOV64rr <TargetFrameIndex>, <TargetFrameIndex>. The entire point of a TFI is that address code is not generated, so there's no way to legalize/lower this. Instead, simply prevent its creation.
Arguably, we shouldn't be using *Target*FrameIndices in StatepointLowering at all, but that's a much deeper change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360090 91177308-0d34-0410-b5e6-96231b3b80d8
The original patch was committed at rL359398 and reverted at rL359695 because of
infinite looping.
This includes a fix to check for a vector splat of "1.0" to avoid the infinite loop.
Original commit message:
This was originally part of D61028, but it's an independent diff.
If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division has only 3-cycle throughput, so that's probably the better
default for performance, and it avoids problems from x86's inaccurate estimates.
The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.
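A rough IR-level sketch of the repeated-divisor part (names invented; the actual transform runs on the DAG, and 'arcp' stands in for whatever FP relaxation permits the rewrite): two divisions by the same value become one full-precision scalar divide plus multiplies.
%q1 = fdiv arcp float %x, %y
%q2 = fdiv arcp float %z, %y
=>
%r = fdiv arcp float 1.0, %y   ; one divss
%q1 = fmul arcp float %x, %r
%q2 = fmul arcp float %z, %r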
Differential Revision: https://reviews.llvm.org/D61149
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359793 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit fb9a5307a94e6f1f850e4d89f79103b123f16279 (rL359398)
because it can cause an infinite loop due to opposing combines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359695 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Extract the logic for doing reassociations
from DAGCombiner::reassociateOps into a helper
function DAGCombiner::reassociateOpsCommutative,
and use that helper to trigger reassociation
on the original operand order, or the commuted
operand order.
Codegen is not identical since the operand order will
be different when doing the reassociations for the
commuted case. That causes some unfortunate churn in
some test cases. Apart from that, this should be NFC.
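A rough IR sketch of the kind of reassociation the helper drives (constants and names invented for illustration); the helper now also tries the same match with the operands commuted:
%t = add i32 %x, 16
%r = add i32 %t, 32
=>
%r = add i32 %x, 48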
Reviewers: spatel, craig.topper, tstellar
Reviewed By: spatel
Subscribers: dmgreen, dschuff, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, hiraditya, aheejin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61199
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359476 91177308-0d34-0410-b5e6-96231b3b80d8
This was originally part of D61028, but it's an independent diff.
If we try the repeated divisor reciprocal transform before producing an estimate sequence,
then we have an opportunity to use scalar fdiv. On x86, the trade-off is 1 divss vs. 5
vector FP ops in the default estimate sequence. On recent chips (Skylake, Ryzen), the
full-precision division has only 3-cycle throughput, so that's probably the better
default for performance, and it avoids problems from x86's inaccurate estimates.
The last 2 tests show that users still have the option to override the defaults by using
the function attributes for reciprocal estimates, but those patterns are potentially made
faster by converting the vector ops (including ymm ops) to scalar math.
Differential Revision: https://reviews.llvm.org/D61149
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359398 91177308-0d34-0410-b5e6-96231b3b80d8
As detailed on PR40758, Bobcat/Jaguar can perform vector immediate shifts on the same pipes as vector ANDs with the same latency, so it doesn't make sense to replace a shl+lshr pair with a shift+and pair: the mask requires an extra constant (with the attendant constant-pool, load, and register-pressure costs).
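In its simplest form (equal shift amounts), the fold in question is the one below, shown as a rough IR sketch with invented values; on most targets the 'and' wins, but on Bobcat/Jaguar the shift pair is just as fast and needs no mask constant:
%s = shl <4 x i32> %x, <i32 8, i32 8, i32 8, i32 8>
%r = lshr <4 x i32> %s, <i32 8, i32 8, i32 8, i32 8>
=>
%r = and <4 x i32> %x, <i32 16777215, i32 16777215, i32 16777215, i32 16777215>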
Differential Revision: https://reviews.llvm.org/D61068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359293 91177308-0d34-0410-b5e6-96231b3b80d8
If we have a vector FP division with a splatted divisor, use the existing transform
that converts 'x/y' into 'x * (1.0/y)' to allow more conversions. This can then
potentially be converted into a scalar FP division by existing combines (rL358984)
as seen in the tests here.
That can make a potentially big perf difference if scalar fdiv has better timing
(including avoiding possible frequency throttling for vector ops).
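A rough IR sketch of the splatted-divisor case (names invented; the combine itself runs on the DAG, and 'arcp' stands in for whatever FP relaxation permits the rewrite):
%i = insertelement <4 x float> undef, float %y, i32 0
%d = shufflevector <4 x float> %i, <4 x float> undef, <4 x i32> zeroinitializer
%q = fdiv arcp <4 x float> %x, %d
=>
%r = fdiv arcp float 1.0, %y   ; scalar fdiv, per rL358984
%i2 = insertelement <4 x float> undef, float %r, i32 0
%d2 = shufflevector <4 x float> %i2, <4 x float> undef, <4 x i32> zeroinitializer
%q = fmul arcp <4 x float> %x, %d2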
Differential Revision: https://reviews.llvm.org/D61028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359147 91177308-0d34-0410-b5e6-96231b3b80d8
If we only match build vectors, we can miss some patterns
that use shuffles, as seen in the affected tests.
Note that the underlying calls within getSplatSourceVector()
have the potential for compile-time explosion because of
exponential recursion looking through binop opcodes, but
currently the list of supported opcodes is very limited.
Both of those problems should be addressed in follow-up
patches.
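For example (a rough sketch with invented names), a splat that reaches us as a shuffle has no build_vector node to match:
%splat = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> zeroinitializer ; lane-0 splat via shuffle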
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358984 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The DAGCombiner is rewriting (canonicalizing) an ISD::ADD
with no common bits set in the operands as an ISD::OR node.
This could sometimes result in "missing out" on some
combines that normally are performed for ADD. To be more
specific, this could happen if we have already rewritten an
ADD into an OR, and later (after legalizations or combines)
we expose patterns that could have been optimized if we
had seen the OR as an ADD (e.g. reassociations based on ADD).
To make the DAG combiner less sensitive to whether ADD or OR
is used for these "no common bits set" ADD/OR operations, we
now apply most of the ADD combines to OR operations as well,
when value tracking indicates that the operands have no
common bits set.
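A rough IR sketch of the situation (invented names): value tracking can prove the operands disjoint, so the OR is really an ADD and should be eligible for ADD combines:
%hi = shl i32 %x, 16      ; low 16 bits known zero
%lo = and i32 %y, 65535   ; high 16 bits known zero
%r = or i32 %hi, %lo      ; equivalent to add %hi, %lo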
Reviewers: spatel, RKSimon, craig.topper, kparzysz
Reviewed By: spatel
Subscribers: arsenm, rampitec, lebedev.ri, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59758
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358965 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.
shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask
preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358526 91177308-0d34-0410-b5e6-96231b3b80d8
The checks in `canFoldInAddressingMode` tested for addressing modes that have a
base register but didn't set the `HasBaseReg` flag to true (it's false by
default). This patch fixes that. Although the omission of the flag was
technically incorrect, it had no known observable impact, so no tests were
changed by this patch.
Differential Revision: https://reviews.llvm.org/D60314
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358502 91177308-0d34-0410-b5e6-96231b3b80d8
// shuffle (concat X, undef), (concat Y, undef), Mask -->
// concat (shuffle X, Y, Mask0), (shuffle X, Y, Mask1)
The ARM changes with 'vtrn' and narrowed 'vuzp' are improvements.
The x86 changes look neutral or better. There's one test with an
extra instruction, but that could be reversed for a subtarget with
the right attributes. But by default, we want to avoid the 256-bit
op when possible (in my motivating benchmark, a handful of ymm ops
sprinkled into a sequence of xmm ops are triggering frequency
throttling on Haswell resulting in significantly worse perf).
Differential Revision: https://reviews.llvm.org/D60545
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358291 91177308-0d34-0410-b5e6-96231b3b80d8
// bo (build_vec ...undef, x, undef...), (build_vec ...undef, y, undef...) -->
// build_vec ...undef, (bo x, y), undef...
The lifetime of the nodes in these examples is different for variables versus constants,
but they are all build vectors briefly, so I'm proposing to catch them in this form to
handle all of the leading examples in the motivating test file.
Before we have build vectors, we might have insert_vector_element. After that, we might
have scalar_to_vector and constant pool loads.
It's going to take more work to ensure that FP vector operands are getting simplified
with undef elements so that this transform can apply more widely. In a non-loose FP
environment, we are likely simplifying FP elements to NaN values rather than undefs.
We also need to allow more opcodes down this path. E.g., we don't handle FP min/max
flavors yet.
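A rough IR sketch of the leading case (invented names; at the DAG level the insertelements are build_vectors with undef lanes, and the fold assumes those undef lanes simplify, which is exactly the FP caveat above):
%v1 = insertelement <4 x float> undef, float %x, i32 1
%v2 = insertelement <4 x float> undef, float %y, i32 1
%b = fadd <4 x float> %v1, %v2
=>
%s = fadd float %x, %y
%b = insertelement <4 x float> undef, float %s, i32 1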
Differential Revision: https://reviews.llvm.org/D60514
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358172 91177308-0d34-0410-b5e6-96231b3b80d8
This lines up with what we do for regular subtract, and it matches the X86 assumption in isel patterns that add-with-immediate is more canonical than sub-with-immediate.
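For reference, the existing canonicalization for regular subtract that this lines up with, as a rough IR sketch:
%r = sub i32 %x, 8
=>
%r = add i32 %x, -8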
Differential Revision: https://reviews.llvm.org/D60020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358027 91177308-0d34-0410-b5e6-96231b3b80d8
There are a variety of vector patterns that may be profitably reduced to a
scalar op when scalar ops are performed using a subset (typically, the
first lane) of the vector register file.
For x86, this is true for float/double ops and element 0 because
insert/extract is just a sub-register rename.
Other targets should likely enable the hook in a similar way.
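One shape this covers, as a rough IR sketch (invented names): if only lane 0 of the result is used, the vector op can become a scalar op, and on x86 the lane-0 extracts are free sub-register renames:
%v = fadd <4 x float> %x, %y
%e = extractelement <4 x float> %v, i32 0
=>
%x0 = extractelement <4 x float> %x, i32 0
%y0 = extractelement <4 x float> %y, i32 0
%e = fadd float %x0, %y0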
Differential Revision: https://reviews.llvm.org/D60150
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357760 91177308-0d34-0410-b5e6-96231b3b80d8
Use consistent variable names down the SimplifyDemanded* call stack so debugging isn't such an annoyance.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357602 91177308-0d34-0410-b5e6-96231b3b80d8
There are 3 changes to make this correspond to the same transform in instcombine:
1. Remove the legality check - we can't create anything less legal than we started with.
2. Ease the use restriction, so we only bail out if both operands have >1 use.
3. Ease the use restriction for binops with a repeated operand (e.g., mul x, x).
As discussed in D60150, there's a scalarization opportunity that will be made
easier by allowing this transform more generally.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357580 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.
Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight
Reviewed By: jyknight
Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58070
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357283 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.
Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.
Reviewers: efriedma, RKSimon, craig.topper, jyknight
Reviewed By: jyknight
Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58068
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357279 91177308-0d34-0410-b5e6-96231b3b80d8
After investigating the examples from D59777 targeting an SSE4.1 machine,
this looks like a very different problem caused by how we map illegal types (256-bit in these cases).
We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any use or legality limitations for a simplification (no new value is
created).
It looks like we miss this pattern in IR too.
In one of the zext examples here, we have shuffle masks like this:
Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>
...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.
Differential Revision: https://reviews.llvm.org/D59961
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357258 91177308-0d34-0410-b5e6-96231b3b80d8
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.
%z = zext i8 %x to i32
%dec = add i32 %z, -1
%r = sext i32 %dec to i64
=>
%z2 = zext i8 %x to i64
%r = add i64 %z2, -1
https://rise4fun.com/Alive/kPP
The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.
But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357254 91177308-0d34-0410-b5e6-96231b3b80d8
If scalar truncates are free, attempt to pre-truncate build_vector source operands.
Only attempt to do this before legalization, as we often end up with truncations/extensions during build_vector lowering.
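A rough IR sketch (invented names; at the DAG level the insertelement chain is a build_vector):
%v0 = insertelement <2 x i32> undef, i32 %a, i32 0
%v1 = insertelement <2 x i32> %v0, i32 %b, i32 1
%t = trunc <2 x i32> %v1 to <2 x i16>
=>
%a16 = trunc i32 %a to i16
%b16 = trunc i32 %b to i16
%t0 = insertelement <2 x i16> undef, i16 %a16, i32 0
%t = insertelement <2 x i16> %t0, i16 %b16, i32 1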
Differential Revision: https://reviews.llvm.org/D59654
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357161 91177308-0d34-0410-b5e6-96231b3b80d8
Rework BaseIndexOffset and isAlias to fully work with lifetime nodes
and fold in lifetime alias analysis.
This is mostly NFC.
Reviewers: courbet
Reviewed By: courbet
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59794
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357070 91177308-0d34-0410-b5e6-96231b3b80d8
getAsCarry() checks that the input argument is a carry-producing node before
allowing a transformation to addcarry. This patch adds a check to make sure
that the carry-producing node is legal. If it is not, it may not remain in a
form that is manageable by the target backend. The test case caused a
compilation failure during instruction selection for this reason on SystemZ.
Patch by Ulrich Weigand.
Review: Sanjay Patel
https://reviews.llvm.org/D59822
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357052 91177308-0d34-0410-b5e6-96231b3b80d8