As discussed in D70148 (and caused a revert of the original commit):
if we insert at the select, then we can produce invalid IR because
the replacement for the compare may have uses before the select.
Summary:
Most of IR instructions got better code size estimations after commit 47a5c36b.
So default parameters values should be updated to improve inlining and
unrolling for the target.
Reviewers: rampitec, arsenm
Reviewed By: rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70391
This implements a version of the predicateLoopExits transform from IndVarSimplify extended to exploit widenable conditions - and thus be much wider in scope of legality. The code structure ends up being almost entirely different, so I chose to duplicate this into the LoopPredication pass instead of trying to reuse the code in the IndVars.
The core notions of the transform are as follows:
If we have a widenable condition which controls entry into the loop, we're allowed to widen it arbitrarily. Given that, it's simply a *profitability* question as to what conditions to fold into the widenable branch.
To avoid pass ordering issues, we want to avoid widening cases that would otherwise be dischargeable. Or... widen in a form which can still be discharged. Thus, we phrase the transform as selecting one analyzeable exit from the set of analyzeable exits to keep. This avoids creating pass ordering complexities.
Since none of the above proves that we actually exit through our analyzeable exits - we might exit through something else entirely - we limit ourselves to cases where a) the latch is analyzeable and b) the latch is predicted taken, and c) the exit being removed is statically cold.
Differential Revision: https://reviews.llvm.org/D69830
If you're writing C code using the ACLE MVE intrinsics that passes the
result of a vcmp as input to a predicated intrinsic, e.g.
mve_pred16_t pred = vcmpeqq(v1, v2);
v_out = vaddq_m(v_inactive, v3, v4, pred);
then clang's codegen for the compare intrinsic will create calls to
`@llvm.arm.mve.pred.v2i` to convert the output of `icmp` into an
`mve_pred16_t` integer representation, and then the next intrinsic
will call `@llvm.arm.mve.pred.i2v` to convert it straight back again.
This will be visible in the generated code as a `vmrs`/`vmsr` pair
that move the predicate value pointlessly out of `p0` and back into it again.
To prevent that, I've added InstCombine rules to remove round trips of
the form `v2i(i2v(x))` and `i2v(v2i(x))`. Also I've taught InstCombine
about the known and demanded bits of those intrinsics. As a result,
you now get just the generated code you wanted:
vpt.u16 eq, q1, q2
vaddt.u16 q0, q3, q4
Reviewers: ostannard, MarkMurrayARM, dmgreen
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70313
Currently we miss folds with undef and identity values for binary ops
that do not fold to undef in general.
We can generalize the identity simplifications and do them before
checking for undef in particular.
Alive checks:
* OR - https://rise4fun.com/Alive/8OsK
* AND - https://rise4fun.com/Alive/e3tE
This will also allow us to remove some now redundant cases throughout
the function, but I would like to do this as follow-up. That should make
tracking down potential issues easier.
Reviewers: spatel, RKSimon, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D70169
Similar to/extension of D70208 (rGee0882bdf866), but this one
may finally allow closing motivating bugs.
This is another step towards having FMF apply only to FP values
rather than those + fcmp. See PR38086 for one of the original
discussions/motivations:
https://bugs.llvm.org/show_bug.cgi?id=38086
And the test here is derived from PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
Currently, we lose FMF when converting any phi to select in
SimplifyCFG. There are a small number of similar changes needed
to correct within SimplifyCFG, so it should be quick to patch
this pass up.
FMF was extended to select and phi with:
D61917
D67564
It doesn't seem that there are any perf/param knobs that can be turned
to create selects for the FP variants of the tests, but that may not
always be true in the future. If it changes, we should propagate FMF.
Working on top of D69252, this adds canonicalisation patterns for ssub.with.overflow to ssub.sats.
Differential Revision: https://reviews.llvm.org/D69753
This adds to D69245, adding extra signed patterns for folding from a
sadd_with_overflow to a sadd_sat. These are more complex than the
unsigned patterns, as the overflow can occur in either direction.
For the add case, the positive overflow can only occur if both of the
values are positive (same for both the values being negative). So there
is an extra select on whether to use the positive or negative overflow
limit.
Differential Revision: https://reviews.llvm.org/D69252
This is another step towards having FMF apply only to FP values
rather than those + fcmp. See PR38086 for one of the original
discussions/motivations:
https://bugs.llvm.org/show_bug.cgi?id=38086
And the test here is derived from PR39535:
https://bugs.llvm.org/show_bug.cgi?id=39535
Currently, we lose FMF when converting any phi to select in
SimplifyCFG. There are a small number of similar changes needed
to correct within SimplifyCFG, so it should be quick to patch
this pass up.
FMF was extended to select and phi with:
D61917
D67564
Differential Revision: https://reviews.llvm.org/D70208
This patch introduces a function pass to inject the scalar-to-vector
mappings stored in the TargetLIbraryInfo (TLI) into the Vector
Function ABI (VFABI) variants attribute.
The test is testing the injection for three vector libraries supported
by the TLI (Accelerate, SVML, MASSV).
The pass does not change any of the analysis associated to the
function.
Differential Revision: https://reviews.llvm.org/D70107
This is a follow up of d90804d, to also flag fmcp instructions as instructions
that we do not support in tail-predicated vector loops.
Differential Revision: https://reviews.llvm.org/D70295
Summary:
When scalarizing PHI nodes we might try to examine/rewrite
InsertElement nodes in predecessors. If those predecessors
are unreachable from entry, then the IR in those blocks could
have unexpected properties resulting in infinite loops in
Scatterer::operator[].
By simply treating values originating from instructions in
unreachable blocks as undef we do not need to analyse them
further.
This fixes PR41723.
Reviewers: bjope
Reviewed By: bjope
Subscribers: bjope, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70171
This reverts commit e511c4b0dff1692c267addf17dce3cebe8f97faa:
Temporarily Revert:
"[SLP] Generalization of stores vectorization."
"[SLP] Fix -Wunused-variable. NFC"
"[SLP] Vectorize jumbled stores."
after fixing the problem with compile time.
There's a discussion about changing a shufflevector
transform in:
https://bugs.llvm.org/show_bug.cgi?id=43958
It would protect against our current undef/poison
behavior, and these are all tests that could be affected.
The vectoriser queries TTI->preferPredicateOverEpilogue to determine if
tail-folding is preferred for a loop, but it was not respecting loop hint
'predicate' that can disable this, which has now been added. This showed that
we were incorrectly initialising loop hint 'vectorize.predicate.enable' with 0
(i.e. FK_Disabled) but this should have been FK_Undefined, which has been
fixed.
Differential Revision: https://reviews.llvm.org/D70125
This is a resubmission of bbb29738b58aaf6f6518269abdcf8f64131665a9 that
was reverted due to clang tests failures. It includes the fix and
additional IR tests for the missed case.
Summary:
In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.
Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D68128
Summary:
This fixes PR43081, where the transformation of `strchr(p, 0) -> p +
strlen(p)` can cause a segfault, if `-fno-builtin-strlen` is used. In
that case, `emitStrLen` returns nullptr, which CreateGEP is not designed
to handle. Also add the minimized code from the PR as a test case.
Reviewers: xbolva00, spatel, jdoerfert, efriedma
Reviewed By: efriedma
Subscribers: lebedev.ri, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70143
The bug manifests as replacing a reduction operand with an undef
value.
The problem appears to be limited to cases where a min/max reduction
has extra uses of the compare operand to the select.
In the general case, we are tracking "ExternallyUsedValues" and
an "IgnoreList" of the reduction operations, but those may not apply
to the final compare+select in a min/max reduction.
For that, we use replaceAllUsesWith (RAUW) to ensure that the new
vectorized reduction values are transferred to all subsequent users.
Differential Revision: https://reviews.llvm.org/D70148
Summary:
The option allows to disable specific target library builtin functions,
instead of -disable-simplify-libcalls, which disables all of them.
This is a prerequisite for D70143, which fixes PR43081.
Reviewers: xbolva00, spatel, jdoerfert, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70193
As noted by the FIXME comment, this is not correct based on our current FMF semantics.
We should be propagating FMF from the final value in a sequence (in this case the
'select'). So the behavior even without this patch is wrong, but we did not allow FMF
on 'select' until recently.
But if we do the correct thing right now in this patch, we'll inevitably introduce
regressions because we have not wired up FMF propagation for 'phi' and 'select' in
other passes (like SimplifyCFG) or other places in InstCombine. I'm not seeing a
better incremental way to make progress.
That said, the potential extra damage over the existing wrong behavior from this
patch is very limited. AFAIK, the only way to have different FMF on IR in the same
function is if we have LTO inlined IR from 2 modules that were compiled using
different fast-math settings.
As seen in the tests, we may actually see some improvements with this patch because
adding the FMF to the 'select' allows matching to min/max intrinsics that were
previously missed (in the common case, the 'fcmp' and 'select' should have identical
FMF to begin with).
Next steps in the transition:
Make similar changes in instcombine as needed.
Enable phi-to-select FMF propagation in SimplifyCFG.
Remove dependencies on fcmp with FMF.
Deprecate FMF on fcmp.
Differential Revision: https://reviews.llvm.org/D69720
I think we have to be a bit more careful when it comes to moving
ops across shuffles, if the op does restrict undef. For example, without
this patch, we would move 'and %v, <0, 0, -1, -1>' over a
'shufflevector %a, undef, <undef, undef, 1, 2>'. As a result, the first
2 lanes of the result are undef after the combine, but they really
should be 0, unless I am missing something.
For ops that do fold to undef on undef operands, the current behavior
should be fine. I've add conservative check OpDoesRestrictUndef, maybe
there's a better existing utility?
Reviewers: spatel, RKSimon, lebedev.ri
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D70093
This implements TTI hook 'preferPredicateOverEpilogue' for MVE. This is a
first version and it operates on single block loops only. With this change, the
vectoriser will now determine if tail-folding scalar remainder loops is
possible/desired, which is the first step to generate MVE tail-predicated
vector loops.
This is disabled by default for now. I.e,, this is depends on option
-disable-mve-tail-predication, which is off by default.
I will follow up on this soon with a patch for the vectoriser to respect loop
hint 'vectorize.predicate.enable'. I.e., with this loop hint set to Disabled,
we don't want to tail-fold and we shouldn't query this TTI hook, which is
done in D70125.
Differential Revision: https://reviews.llvm.org/D69845
This caused miscompiles of Chromium (https://crbug.com/1023818). The reduced
repro is small enough to fit here:
$ cat /tmp/a.c
unsigned char f(unsigned char *p) {
unsigned char result = 0;
for (int shift = 0; shift < 1; ++shift)
result |= p[0] << (shift * 8);
return result;
}
$ bin/clang -O2 -S -o - /tmp/a.c | grep -A4 f:
f: # @f
.cfi_startproc
# %bb.0: # %entry
xorl %eax, %eax
retq
That's nicely optimized, but I don't think it's the right result :-)
> Same as D60846 but with a fix for the problem encountered there which
> was a missing context adjustment in the handling of PHI nodes.
>
> The test that caused D60846 to be reverted was added in e15ab8f277c7.
>
> Reviewers: nikic, nlopes, mkazantsev,spatel, dlrobertson, uabelho, hakzsam
>
> Subscribers: hiraditya, bollu, llvm-commits
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D69571
This reverts commit 57dd4b03e4806bbb4760ab6150940150d884df20.
In case when all incoming values of a PHI are equal pointers, this
transformation inserts a definition of such a pointer right after
definition of the base pointer and replaces with this value both PHI and
all it's incoming pointers. Primary goal of this transformation is
canonicalization of this pattern in order to enable optimizations that
can't handle PHIs. Non-inbounds pointers aren't currently supported.
Reviewers: spatel, RKSimon, lebedev.ri, apilipenko
Reviewed By: apilipenko
Tags: #llvm
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D68128
Don't try to canonicalize loads to scalable vector types to loads
of integers.
This removes one assertion when trying to use a TypeSize as a parameter
to DataLayout::isLegalInteger. It does not handle the second part of the
function (which looks at bitcasts).
This patch also contains a NFC fix for Load Analysis, where a variable
initialization that would cause the same assertion is moved closer to
its use. This allows us to run the new test for InstCombine without
having to teach LocationSize to play nicely with scalable vectors.
Differential Revision: https://reviews.llvm.org/D70075
Currently we have limited support for outer loops with multiple basic
blocks after the inner loop exit. But the current checks for creating
PHIs for loop exit values only assumes the header and latches of the
outer loop. It is better to just skip incoming values defined in the
original inner loops. Those are handled earlier.
Reviewers: efriedma, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70059
Summary:
This patch introduces align attribute deduction for callsite argument, function argument, function returned and floating value based on must-be-executed-context.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69797
Summary: This patch introduces a new heuristic for guiding operand reordering. The new "look-ahead" heuristic can look beyond the immediate predecessors. This helps break ties when the immediate predecessors have identical opcodes (see lit test for examples).
Reviewers: RKSimon, ABataev, dtemirbulatov, Ayal, hfinkel, rnk
Reviewed By: RKSimon, dtemirbulatov
Subscribers: xbolva00, Carrot, hiraditya, phosek, rnk, rcorcs, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60897
Summary:
This patch redefines freeze instruction from being UnaryOperator to a subclass of UnaryInstruction.
ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
`freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor has visitFreeze now because freeze is not unaryop anymore.
Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri
Reviewed By: craig.topper, lebedev.ri
Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69932
The tail-call-kind-ness is known by the ObjCARC analysis and can be
enforced while lowering the intrinsics to calls.
This allows us to get the requested tail calls at -O0 without trying to
preserve the attributes throughout passes that change code even at -O0
,like the Always Inliner, where the ObjCOpt pass doesn't run.
Differential Revision: https://reviews.llvm.org/D69980