They are all covered by the SSE4.2 intrinsics test with SSE4.2, AVX, and AVX512 command lines.
Merge sse42.ll into the other intrinsics test. Rename sse42_64.ll to match the naming of the other intrinsic tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295707 91177308-0d34-0410-b5e6-96231b3b80d8
The new method is introduced under the "-lsr-exp-narrow" option (currently set to true).
Summary:
The method is based on the mathematical expectation of the number of registers and should
generally be closer to the optimal solution.
Please see the details in the comments on the
"LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
(in lib/Transforms/Scalar/LoopStrengthReduce.cpp).
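For illustration only (this is not the code from the patch), a minimal sketch of what such a register-count expectation can look like, assuming each use picks one of its candidate formulas uniformly at random; the Use/Formula types below are simplified stand-ins for the real LSR structures, and the actual heuristic is documented in the function named above:
```
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins for the real LSR structures.
struct Formula { std::set<std::string> Regs; };
struct Use { std::vector<Formula> Formulae; };

// Expected number of distinct registers if every use independently picks one
// of its candidate formulas uniformly at random (assumes each use has at
// least one formula).
double expectedRegisterCount(const std::vector<Use> &Uses) {
  std::set<std::string> AllRegs;
  for (const Use &U : Uses)
    for (const Formula &F : U.Formulae)
      AllRegs.insert(F.Regs.begin(), F.Regs.end());

  // P(reg never chosen) = product over uses of
  //   (1 - #formulas-of-the-use-containing-reg / #formulas-of-the-use).
  std::map<std::string, double> ProbUnused;
  for (const std::string &R : AllRegs)
    ProbUnused[R] = 1.0;
  for (const Use &U : Uses) {
    for (const std::string &R : AllRegs) {
      unsigned WithReg = 0;
      for (const Formula &F : U.Formulae)
        if (F.Regs.count(R))
          ++WithReg;
      ProbUnused[R] *= 1.0 - double(WithReg) / double(U.Formulae.size());
    }
  }

  // By linearity of expectation, E[#regs] = sum over regs of P(reg is used).
  double Expected = 0.0;
  for (const auto &P : ProbUnused)
    Expected += 1.0 - P.second;
  return Expected;
}
```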
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D29862
From: Evgeny Stupachenko <evstupac@gmail.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295704 91177308-0d34-0410-b5e6-96231b3b80d8
They are all covered by the SSE2 intrinsics test with SSE2, AVX, and AVX512 command lines.
Also remove an unneeded lfence intrinsic test since it was already covered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295700 91177308-0d34-0410-b5e6-96231b3b80d8
They are all covered by the SSE intrinsics test with SSE, AVX, and AVX512 command lines.
Also remove an unneeded sfence intrinsic test since it was already covered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295699 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Sandy Bridge and later CPUs have better throughput when using SHLD to implement a rotate than when using the normal rotate instructions. Additionally, it saves one uop and avoids a partial flag update dependency.
This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.
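For a concrete picture (illustrative, not taken from the patch): SHLD with the same register as both destination and source is equivalent to ROL, so a constant rotate can be emitted either way:
```
#include <cstdint>

// Rotate left by 5. On a Sandy Bridge or later CPU without BMI2 this can be
// selected as SHLD with the same register for destination and source, which
// is equivalent to ROL but has better throughput on those cores:
//   before:  roll  $5, %eax
//   after:   shldl $5, %eax, %eax
uint32_t rotl5(uint32_t x) {
  return (x << 5) | (x >> 27);
}
```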
Reviewers: zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30181
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295697 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Currently, BranchFolder drops the DebugLoc for branch instructions in some places. For example, for the attached test code, the branch instruction of the 'entry' block has a DILocation of
```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```
but this information is gone when the block is lowered because BranchFolder misses it. This patch fixes the issue.
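As a minimal sketch of the kind of change involved (illustrative only, not the actual patch; the helper name is invented), the DebugLoc of the branch being rewritten can be captured and passed to insertBranch instead of an empty location:
```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/Target/TargetInstrInfo.h" // llvm/CodeGen/TargetInstrInfo.h in newer trees

using namespace llvm;

// Illustrative helper: rewrite the terminator of MBB to branch to NewTarget
// while keeping the DebugLoc (e.g. the !DILocation above) of the old branch.
static void rewriteBranchPreservingLoc(MachineBasicBlock &MBB,
                                       MachineBasicBlock *NewTarget,
                                       const TargetInstrInfo &TII) {
  DebugLoc DL;
  MachineBasicBlock::iterator Term = MBB.getFirstTerminator();
  if (Term != MBB.end())
    DL = Term->getDebugLoc(); // capture before the branch is erased
  TII.removeBranch(MBB);
  TII.insertBranch(MBB, NewTarget, /*FBB=*/nullptr, /*Cond=*/{}, DL);
}
```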
Reviewers: qcolombet, aprantl, craig.topper, MatzeB
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29902
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295684 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Each OperandPredicateMatcher shouldn't need to know how to generate the expression
to reference a MachineOperand. The OperandMatcher should provide it.
In addition to separating responsibilities, this also lays some groundwork for
decoupling source patterns from destination patterns to allow invented operands
or operands provided by GlobalISel's equivalent to the ComplexPattern<> class.
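A rough sketch of that division of labour (class and member names are simplified stand-ins, not the emitter code itself): the OperandMatcher is the single place that knows how to spell the reference to its MachineOperand, and every predicate just asks it for that expression:
```
#include <string>

// Simplified stand-in for the emitter-side matcher class.
class OperandMatcher {
  unsigned InsnVarID; // which matched instruction this operand belongs to
  unsigned OpIdx;     // operand index within that instruction

public:
  OperandMatcher(unsigned InsnVarID, unsigned OpIdx)
      : InsnVarID(InsnVarID), OpIdx(OpIdx) {}

  // Single point of truth for how the generated matcher refers to the
  // MachineOperand being tested.
  std::string getOperandExpr() const {
    return "MIs[" + std::to_string(InsnVarID) + "]->getOperand(" +
           std::to_string(OpIdx) + ")";
  }
};

// An operand predicate no longer builds the expression itself; it only
// wraps whatever the OperandMatcher hands it.
std::string emitIsRegCheck(const OperandMatcher &OM) {
  return OM.getOperandExpr() + ".isReg()";
}
```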
Depends on D29709
Reviewers: t.p.northover, ab, rovka, qcolombet, aditya_nandakumar
Reviewed By: ab
Subscribers: dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D29710
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295668 91177308-0d34-0410-b5e6-96231b3b80d8
It's more profitable to go through memory (1 cycle throughput)
than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with a variable index.
The IACA tool was used to get the performance estimates (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer).
For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs. 79 cycles.
Also remove the VINSERT node, since we don't need it any more.
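Conceptually (this snippet only illustrates the lowering strategy, it is not code from the patch), the memory path spills the vector to a stack slot and does a single indexed scalar load, instead of materializing a variable shuffle with VMOVD + VPERMV/PSHUFB:
```
#include <immintrin.h>
#include <cstdint>

// Extract the byte at a runtime index from a 128-bit vector by going
// through memory: one store to a 16-byte stack slot plus one indexed load.
uint8_t extract_var_u8(__m128i v, unsigned idx) {
  alignas(16) uint8_t buf[16];
  _mm_store_si128(reinterpret_cast<__m128i *>(buf), v);
  return buf[idx & 15];
}
```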
Differential Revision: https://reviews.llvm.org/D29690
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295660 91177308-0d34-0410-b5e6-96231b3b80d8
Replaces the existing approach, which could only search BUILD_VECTOR nodes.
Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element; this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295613 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths.
We currently set the required depth at 4 or more combined shuffles; this is probably too high for most targets, but it is a good starting point and already helps avoid a number of costly variable shuffles.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295608 91177308-0d34-0410-b5e6-96231b3b80d8
The instructions are marked commutable, but without special handling we don't get the immediate right.
While here, also remove the masked memory forms that aren't commutable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295602 91177308-0d34-0410-b5e6-96231b3b80d8