1897 Commits

Author SHA1 Message Date
Graham Hunter
95bfb1902d [LV][AArch64] Allow (limited) interleaving for scalable vectors
This patch uses the (de)interleaving intrinsics introduced in
D141924 to handle vectorization of interleaving groups with a
factor of 2 for scalable vectors.

Reviewed By: fhahn, reames

Differential Revision: https://reviews.llvm.org/D145163
2023-06-09 11:42:10 +01:00
Nikita Popov
143ed21b26 Revert "[LCSSA] Remove unused ScalarEvolution argument (NFC)"
This reverts commit 5362a0d859d8e96b3f7c0437b7866e17a818a4f7.

In preparation for reverting a dependent revision.
2023-06-05 16:45:38 +02:00
Florian Hahn
e19297471a
[LV] Check if value was already not uniform for previous VF.
If the value was already known to not be uniform for the previous
(smaller VF), it cannot be uniform for the larger VF.

This slightly reduces compile-time, once uniformity checks are becoming
a bit more expensive due to using SCEV rewriting (D148841).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D151658
2023-06-04 20:31:01 +01:00
Florian Hahn
e48b1e87a3
[LV] Split off invariance check from isUniform (NFCI).
After 572cfa3fde5433, isUniform now checks VF based uniformity instead of
just invariance as before.

As follow-up cleanup suggested in D148841, separate the invariance check
out and update callers that currently check only for invariance.

This also moves the implementation of isUniform from LoopAccessAnalysis
to LoopVectorizationLegality, as LoopAccesAnalysis doesn't use the more
general isUniform.
2023-06-01 19:09:11 +01:00
Florian Hahn
572cfa3fde
[LV] Use SCEV for uniformity analysis across VF
This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the vector lane 0
(offset 0) compare it to the expressions for lanes 1 to the last vector
lane (VF - 1). If they are equal, consider the expression uniform.

While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D148841
2023-05-31 16:01:00 +01:00
Florian Hahn
8098f2577e
[LV] Use Legal::isUniform to detect uniform pointers.
Update collectLoopUniforms to identify uniform pointers using
Legal::isUniform. This is more powerful and  brings pointer
classification here in sync with setCostBasedWideningDecision
which uses isUniformMemOp. The existing mis-match in reasoning
can causes crashes due to D134460, which is fixed by this patch.

Fixes https://github.com/llvm/llvm-project/issues/60831.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150991
2023-05-30 16:42:55 +01:00
Florian Hahn
b750862107
[LV] Use early exit for stores storing the ptr operand. (NFC)
Cleanup suggested in D150991.
2023-05-30 12:14:12 +01:00
Alexander Timofeev
bad4de1ae7 Don't disable loop unroll for vectorized loops on AMDGPU target
We've got a performance regression after the https://reviews.llvm.org/D115261.
Despite the loop being vectorized unroll is still required.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D149281
2023-05-25 22:54:41 +02:00
Craig Topper
6006d43e2d LLVM_FALLTHROUGH => [[fallthrough]]. NFC
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D150996
2023-05-24 12:40:10 -07:00
Florian Hahn
55903151a2
[VPlan] Use isUniformAfterVec in VPReplicateRecipe::execute.
I was unable to find a case where this actually changes generated code,
but it enables the bug fix in D144434. It also brings codegen in line
with the handling of stores to uniform addresses in the cost model
(D134460).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D144491
2023-05-19 18:15:21 +01:00
Florian Hahn
701f7230cd
[VPlan] Use VPRecipeWithIRFlags for VPReplicateRecipe, retire poison map
Update VPReplicateRecipe to use VPRecipeWithIRFlags for IR flag
handling. Retire separate MayGeneratePoisonRecipes map.

Depends on D149082.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150027
2023-05-15 11:49:20 +01:00
Florian Hahn
f40a7901d1
[LV] Move selecting vectorization factor logic to LVP (NFC).
Split off from D143938. This moves the planning logic to select the
vectorization factor to LoopVectorizationPlanner as a step towards only
computing costs for individual VFs in LoopVectorizationCostModel and do
planning in LVP.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D150197
2023-05-13 12:28:14 +01:00
Florian Hahn
7472f1da96
[VPlan] Change LoopVectorizationPlanner::TTI to be const reference (NFC) 2023-05-13 12:27:57 +01:00
Florian Hahn
0418d0242b
[LV] Move getVScaleForTuning out of LoopVectorizationCostModel (NFC).
Split off refactoring from D150197 to reduce diff.
2023-05-13 10:17:13 +01:00
Philip Reames
592199c8fe [LV] Use interface routines instead of internal variables
This makes a (possible) change to the internal representation easier in the future, and makes the code easier to read now.
2023-05-12 16:27:12 -07:00
Florian Hahn
bf279a0f8e
[VPlan] Remove dangling comment and newlines (NFC).
Apply missed cleanups.
2023-05-11 22:06:56 +01:00
Florian Hahn
3d4eed0133
[LV] Reuse SCEV expansion results for epilogue vectorization.
When generating code for the epilogue vector loop, we need to re-use the
expansion results for induction steps generated for the main vector
loop, as the pre-header of the epilogue vector loop may not dominate the
vector preheader of the epilogue.

This fixes a reported crash. Note that this is a workaround which should
be removed soon once induction resume value creation is handled in VPlan
directly.
2023-05-11 22:00:07 +01:00
Philip Reames
7fbfcc653f [LV/LAA] Use PSE to identify stride multiplies which simplify [mostly nfc]
LV/LAA will speculate that (some) strided access patterns have unit stride, and insert runtime checks if required.

LV cost models a multiply by such a stride as free.  We did this by keeping around the StrideSet structure, just to check if one of the operands were one of the strides we speculated.

We can instead just ask PredicatedScalarEvolution if either of the operands are one (after predicates are applied).  We get mostly the same result - PSE can prove it in more cases in theory - and simpler code.
2023-05-11 11:16:04 -07:00
Florian Hahn
236a0e82df
[LV] Use VPValue to get expanded value for SCEV step expressions.
Update skeleton creation logic to use SCEV expansion results from
expanding the pre-header. This avoids another set of SCEV expansions
that may happen after the CFG has been modified.

Fixes #58811.

Depends on D147964.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147965
2023-05-11 16:49:19 +01:00
Florian Hahn
127b00b25c
[VPlan] Record IR flags on VPWidenRecipe directly (NFC).
This patch introduces a VPRecipeWithIRFlags class to record various IR
flags for a recipe. This allows de-coupling of IR flags from the
underlying instructions. The main benefit is that it allows dropping of
IR flags from recipes directly, without the need to go through
State::MayGeneratePoisonRecipes. The plan is to remove
MayGeneratePoisonRecipes once all relevant recipes are transitioned.

It also allows dropping IR flags during VPlan-to-VPlan transforms, which
will be used in a follow-up patch to implement truncateToMinimalBitwidths
as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149079
2023-05-08 17:28:50 +01:00
Florian Hahn
823d35fd3b
[VPlan] Use RecipeBuilder to look up member when fixing IG (NFC).
Recipes for interleave group members are recorded directly in the
RecipeBuilder. Use it directly instead of going indirectly through
VPlan's Value->VPValue mapping.
2023-05-07 18:02:27 +01:00
Florian Hahn
e3afe0b89d
[VPlan] Add VPWidenCastRecipe, split off from VPWidenRecipe (NFCI).
To generate cast instructions, the result type is needed. To allow
creating widened casts without underlying instruction, introduce a new
VPWidenCastRecipe that also holds the result type.

This functionality will be used in a follow-up patch to
implement truncateToMinimalBitwidths as VPlan-to-VPlan transform.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D149081
2023-05-05 13:20:16 +01:00
Florian Hahn
b85a402dd8
[VPlan] Introduce new entry block to VPlan for early SCEV expansion.
This patch adds a new preheader block the VPlan to place SCEV expansions
expansions like the trip count. This preheader block is disconnected
at the moment, as the bypass blocks of the skeleton are not yet modeled
in VPlan.

The preheader block is executed before skeleton creation, so the SCEV
expansion results can be used during skeleton creation. At the moment,
the trip count expression and induction steps are expanded in the new
preheader. The remainder of SCEV expansions will be moved gradually in
the future.

D147965 will update skeleton creation to use the steps expanded in the
pre-header to fix #58811.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147964
2023-05-04 14:00:13 +01:00
Florian Hahn
79692750d2
[LV] Use VPValue for SCEV expansion in fixupIVUsers.
The step is already expanded in the VPlan. Use this expansion instead.
This is a step towards modeling fixing up IV users in VPlan.

 It also fixes a crash casued by SCEV-expanding the Step expression in
fixupIVUsers, where the IR is in an incomplete state

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147963
2023-05-04 09:25:59 +01:00
Nikita Popov
5362a0d859 [LCSSA] Remove unused ScalarEvolution argument (NFC)
After D149435, LCSSA formation no longer needs access to
ScalarEvolution, so remove the argument from the utilities.
2023-05-02 12:17:05 +02:00
Florian Hahn
6303fa369c
[VPlan] Remove DeadInsts arg from VPInstructionsToVPRecipes (NFC)
The argument isn't used. VPlan-based dead recipe removal can be used
instead.
2023-05-01 15:03:29 +01:00
Florian Hahn
0b24436591
[LV] Clarify comment for selectVectorizationFactor (NFC).
The comment is stale, as UserVF is handled before selectVectorizationFactor
is called. Clarify the comment by remove the mention of UserVF.

Suggested as independent improvement in D143938.
2023-04-30 21:12:15 +01:00
Florian Hahn
a431402fd2
[LV] Remove loop arg from CM::isCandidateForEpilogueVectorization (NFC)
LVP operates on the loop it stores in TheLoop. Use it instead of the
argument, to be in line with other member functions.

Suggested as independent improvement in D143938.
2023-04-30 21:11:12 +01:00
Florian Hahn
6fa07a87ab
[LV] Document selectEpilogueVectorizationFactor (NFC).
Add missing documentation for selectEpilogueVectorizationFactor.

Suggested as independent improvement in D143938.
2023-04-30 21:09:24 +01:00
Florian Hahn
8d3ff24e11
[LV] Sink collect* calls to LVP::plan() (NFC).
Move calls of collect* helpers closer to where the cost-model is used.
Should help simplifying D142669 & D142670.

Differential Revision: https://reviews.llvm.org/D142674
2023-04-30 11:41:22 +01:00
Florian Hahn
4583d7ef7c
[LV] Rename Preheader -> VecPreheader (NFC).
Clarify variable name as suggested in D147964 to reduce diff.
2023-04-28 22:15:47 +01:00
Nikita Popov
1745341296 [LoopVectorize] Preserve SCEV
As far as I can tell, LoopVectorize preserves SCEV, mainly by dint
of forgetting the loop being vectorized. We should mark it as
preserved in the pass manager.

This is a very small compile-time improvement.

Differential Revision: https://reviews.llvm.org/D149147
2023-04-26 09:43:54 +02:00
Philip Reames
09d879d060 [SCEV] Common code for computing trip count in a fixed type [NFC-ish]
This is a follow on to D147117 and D147355. In both cases, we were adding special cases to compute zext(BTC+1) instead of zext(BTC)+1 when the BTC+1 computation was known not to overflow.

Differential Revision: https://reviews.llvm.org/D148661
2023-04-25 12:04:42 -07:00
David Green
1869a9c225 [LV] Use the known trip count when costing non-tail folded VFs
Now that we store the ScalarCost in the VectorizationFactor it is possible to
use it to get a slightly more accurate cost in isMoreProfitable between two
vector factors. This extends the logic added in D101726 to non-tail-folded
cases, using the costs of `VecCost * (TripCount / VF) + ScalarCost * (TripCount % VF)`
to compare VFs where the TripCount is known and we are not folding the tail.

This shouldn't alter very much as small trip counts are usually not vectorized,
but does seem to help in the testcase where 4 * VF4 is chosen as profitable
compared to 2 * VF8 + 4 * scalar.

Differential Revision: https://reviews.llvm.org/D147720
2023-04-24 22:02:30 +01:00
Florian Hahn
3157f03a34
[VPlan] Add VPValue::isLiveIn() (NFC).
This helps to clarify checks in multiple places.

Suggested as cleanup in D147892.
2023-04-24 17:51:12 +01:00
Florian Hahn
6f999769b9
[VPlan] Remove unnecessary includes from VPlan.h (NFC).
Clean up some unnecessary includes from VPlan.h, which is imported in
multiple files.
2023-04-24 16:10:46 +01:00
Florian Hahn
6b8d19d2b5
Recommit "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts the revert commit 3d8ed8b5192a59104bfbd5bf7ac84d035ee0a4a5.

The new version of the patch adds a set to avoid duplicating work in
isFixedOrderRecurrence, which was previously done through the removed
SinkAfter map.

Original commit message:
    Building on D142885 and D142589, retire the SinkAfter map from the
    recurrence handling code. It is replaced by checking whether it is
    possible to sink all users of a recurrence directly in VPlan. This
    results in simpler code overall and allows to handle additional cases
    (see the improvements in @test_crash).

    Depends on D142885.
    Depends on D142589.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D142886
2023-04-20 09:31:16 +01:00
Florian Hahn
ff0ec4f42e
Recommit "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
This reverts the revert commit 8c2276f89887d0a27298a1bbbd2181fa54bbb509.

The updated patch re-orders the getDefiningRecipe check in getVPalue to avoid
a use-after-free.

Original commit message:

    Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
    1) Value2VPValue and 2) VPExternalDefs.

    This duplication is unnecessary and there are already cases where
    external defs are added to Value2VPValue. This patch replaces all uses
    of VPExternalDefs with Value2VPValue.

    It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
    and addVPValue (to addExternalVPValue).

    At the moment, this is NFC, but will enable additional simplifications
    in D147783.

    Depends on D147891.

    Reviewed By: Ayal

    Differential Revision: https://reviews.llvm.org/D147892
2023-04-18 10:29:31 +01:00
Vitaly Buka
8c2276f898 Revert "[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI)."
Asan detects heap-use-after-free, see D147892.

This reverts commit 4fc190351e5af901b6107d162d07e1fbca90934f.
This reverts commit 668045eb77628be13e448ffbb855473ffca1cc43.
2023-04-17 17:24:10 -07:00
Manoj Gupta
3d8ed8b519 Revert "[VPlan] Switch to checking sinking legality for recurrences in VPlan."
This reverts commit 7fc0b3049df532fce726d1ff6869a9f6e3183780.

Causes a clang hang when building xz utils, github issue #62187.
2023-04-17 12:19:36 -07:00
Shraiysh Vaishay
7021182d6b [nfc][llvm] Replace pointer cast functions in PointerUnion by llvm casting functions.
This patch replaces the uses of PointerUnion.is function by llvm::isa,
PointerUnion.get function by llvm::cast, and PointerUnion.dyn_cast by
llvm::dyn_cast_if_present. This is according to the FIXME in
the definition of the class PointerUnion.

This patch does not remove them as they are being used in other
subprojects.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D148449
2023-04-17 13:40:51 -05:00
Florian Hahn
4fc190351e
[VPlan] Remove uneeded NeedsVectorIV from VPWidenIntOrFpInduction.
After recent improvements, all instances of
VPWidenIntOrFpInductionRecipe should needs a vector IV and there's no
need for a separate field.
2023-04-17 13:38:00 +01:00
Bjorn Pettersson
a20f7efbc5 Remove several no longer needed includes. NFCI
Mostly removing includes of InitializePasses.h and Pass.h in
passes that no longer has support for the legacy PM.
2023-04-17 13:54:19 +02:00
David Sherwood
69ee653313 [LoopVectorize] Take vscale into account when deciding to create epilogues
In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account because we could be targeting a
CPU where vscale > 1, and hence the runtime VF would be a multiple
of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.

Differential Revision: https://reviews.llvm.org/D147522
2023-04-17 10:49:40 +00:00
Florian Hahn
83ab5708d1
[LV] Don't sink scalar instructions that may read from memory.
The current sinking code doesn't prevent us from sinking a load past an
aliasing store. Skip sinking instructions that may read from memory to
avoid a mis-compile.

See @minimal_bit_widths_with_aliasing_store for an example where 2 loads
are sunk past aliasing stores before this fix.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147259
2023-04-17 09:30:25 +01:00
Kazu Hirata
c83c4b58d1 [Transforms] Apply fixes from performance-for-range-copy (NFC) 2023-04-16 08:25:28 -07:00
Florian Hahn
668045eb77
[VPlan] Unify Value2VPValue and VPExternalDefs maps (NFCI).
Before this patch, a VPlan contained 2 mappings for Values -> VPValue:
1) Value2VPValue and 2) VPExternalDefs.

This duplication is unnecessary and there are already cases where
external defs are added to Value2VPValue. This patch replaces all uses
of VPExternalDefs with Value2VPValue.

It clarifies the naming of getOrAddVPValue (to getOrAddExternalVPValue)
and addVPValue (to addExternalVPValue).

At the moment, this is NFC, but will enable additional simplifications
in D147783.

Depends on D147891.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D147892
2023-04-16 15:38:31 +01:00
Florian Hahn
7fc0b3049d
[VPlan] Switch to checking sinking legality for recurrences in VPlan.
Building on D142885 and D142589, retire the SinkAfter map from the
recurrence handling code. It is replaced by checking whether it is
possible to sink all users of a recurrence directly in VPlan. This
results in simpler code overall and allows to handle additional cases
(see the improvements in @test_crash).

Depends on D142885.
Depends on D142589.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D142886
2023-04-13 22:00:52 +01:00
Craig Topper
4b47d875a1 [LV] Optimize trip count SCEV.
To calculate the trip count we need to add 1 to the backedge
taken count. If we need to widen the backedge count, it's better
to do the add before the widening if we can guarantee it won't
overflow.

The code here is based on similar code I found in
LoopIdiomRecognize.

This is the vectorizer version of this InstCombine patch D142783.
Looking at the IR diffs, this does look like it gets more cases
than the InstCombine patch.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D147355
2023-04-12 16:17:58 -07:00
Florian Hahn
68afaa3f48
[LV] Use std::make_optional to fix build failure after 082a0046.
Some compilers require std::make_optional(std::move()) to force construction
of the std::optional return value. This should fix the build failure in
  https://lab.llvm.org/buildbot#builders/67/builds/10991
2023-04-11 17:56:15 +01:00