Precursor to fixing a regression with SLP vectorizer for supporting SELECT shuffles (vs the current ALTERNATE)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334714 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on PR33744, this patch relaxes ShuffleKind::SK_Alternate which requires shuffle masks to only match an alternating pattern from its 2 sources:
e.g. v4f32: <0,5,2,7> or <4,1,6,3>
This seems far too restrictive as most SIMD hardware which will implement it using a general blend/bit-select instruction, so replaces it with SK_Select, permitting elements from either source as long as they are inline:
e.g. v4f32: <0,5,2,7>, <4,1,6,3>, <0,1,6,7>, <4,1,2,3> etc.
This initial patch just updates the name and cost model shuffle mask analysis, later patch reviews will update SLP to better utilise this - it still limits itself to SK_Alternate style patterns.
Differential Revision: https://reviews.llvm.org/D47985
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334513 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on D47985, identity shuffle masks should probably be free.
I've limited this to the case where the input and output types all match - but we could probably accept all cases.
Differential Revision: https://reviews.llvm.org/D47986
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334506 91177308-0d34-0410-b5e6-96231b3b80d8
Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support.
Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts.
This is a step towards adding support for vXi16 vector rotates.
Differential Revision: https://reviews.llvm.org/D47546
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334023 91177308-0d34-0410-b5e6-96231b3b80d8
This enables us to detect more fast path sdiv cases under cost analysis.
This patch also enables us to handle non-uniform-constant pow2 cases for X86 SDIV costs.
Found while working on D46276
Future patches can then extend the vectorizers to more fully support non-uniform pow2 cases.
Differential Revision: https://reviews.llvm.org/D46637
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332969 91177308-0d34-0410-b5e6-96231b3b80d8
With custom lowering for vector MULLH{S,U}, it is now profitable to
vectorize a divide by constant loop for the custom types (v16i8, v8i16,
and v4i32). The cost if based on TargetLowering::Build{S,U}DIV which
uses a multiply by constant plus adjustment to express a divide by
constant.
Both {u,s}mull{2} are expressed as Instruction::Mul and shifts by
Instruction::AShr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331873 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.
Differential Revision: https://reviews.llvm.org/D45982
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330941 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds a cost model test case for vector shuffles having transpose
masks. The given costs are inaccurate and will be updated in a follow-on patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330625 91177308-0d34-0410-b5e6-96231b3b80d8
We're mostly testing with generic isa attributes, but PR36550 will require testing of specific target's scheduler models as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330056 91177308-0d34-0410-b5e6-96231b3b80d8
The script allows the auto-generation of checks for cost model tests to speed up their creation and help improve coverage, which will help a lot with PR36550.
If the need arises we can add support for other analyze passes as well, but the cost models was the one I needed to get done - at the moment it just warns that any other analysis mode is unsupported.
I've regenerated a couple of x86 test files to show the effect.
Differential Revision: https://reviews.llvm.org/D45272
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329390 91177308-0d34-0410-b5e6-96231b3b80d8
Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D44644
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328451 91177308-0d34-0410-b5e6-96231b3b80d8
This patch provides an implementation of getArithmeticReductionCost for
AArch64. We can specialize the cost of add reductions since they are computed
using the 'addv' instruction.
Differential Revision: https://reviews.llvm.org/D44490
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327702 91177308-0d34-0410-b5e6-96231b3b80d8
This patch considers the experimental vector reduce intrinsics in the default
implementation of getIntrinsicInstrCost. The cost of these intrinsics is
computed with getArithmeticReductionCost and getMinMaxReductionCost. This patch
also adds a test case for AArch64 that indicates the costs we currently compute
for vector reduce intrinsics. These costs are inaccurate and will be updated in
a follow-on patch.
Differential Revision: https://reviews.llvm.org/D44489
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327698 91177308-0d34-0410-b5e6-96231b3b80d8
Agner's tables indicate that for SSE42+ targets (Core2 and later) we can reduce the FADD/FSUB/FMUL costs down to 1, which should fix the Himeno benchmark.
Note: the AVX512 FDIV costs look rather dodgy, but this isn't part of this patch.
Differential Revision: https://reviews.llvm.org/D43733
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326133 91177308-0d34-0410-b5e6-96231b3b80d8
There are too many perf regressions resulting from this, so we need to
investigate (and add tests for) targets like ARM and AArch64 before
trying to reinstate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325658 91177308-0d34-0410-b5e6-96231b3b80d8
This change was mentioned at least as far back as:
https://bugs.llvm.org/show_bug.cgi?id=26837#c26
...and I found a real program that is harmed by this:
Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
https://bugs.llvm.org/show_bug.cgi?id=36280
...but the change here appears to solve that bug only accidentally.
The div/rem costs for x86 look very wrong in some cases, but that's already true,
so we can fix those in follow-up patches. There's also evidence that more cost model
changes are needed to solve SLP problems as shown in D42981, but that's an independent
problem (though the solution may be adjusted after this change is made).
Differential Revision: https://reviews.llvm.org/D43079
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325515 91177308-0d34-0410-b5e6-96231b3b80d8