I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently.
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.
This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.
Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.
The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.
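For illustration (not from the patch; the function name and the exact assembly are assumptions), this is the kind of single-bit result where an sbb-based sequence buys nothing:

    /* Minimal C sketch: we only want a 0/1 result from the compare. */
    unsigned below(unsigned a, unsigned b) {
      return a < b;   /* preferred: cmp + setb of the low byte */
    }
    /* The over-eager transform could instead end up with something like
         cmp ...; sbb %eax, %eax; and $1, %eax
       i.e. materialize an all-ones/all-zeros mask and then mask it back
       down to a single bit. */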
Differential Revision: https://reviews.llvm.org/D30611
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297586 91177308-0d34-0410-b5e6-96231b3b80d8
They are all covered by the SSE4.2 intrinsics test with SSE4.2, AVX, and AVX512 command lines.
Merge sse42.ll into the other intrinsics test. Rename sse42_64.ll to match the naming of the other intrinsic tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295707 91177308-0d34-0410-b5e6-96231b3b80d8
They are all covered by the SSE2 intrinsics test with SSE2, AVX, and AVX512 command lines.
Also remove an unneeded lfence intrinsic test since it was already covered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295700 91177308-0d34-0410-b5e6-96231b3b80d8
They are all covered by the SSE intrinsics test with SSE, AVX, and AVX512 command lines.
Also remove an unneeded sfence intrinsic test since it was already covered.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295699 91177308-0d34-0410-b5e6-96231b3b80d8
There are cases of AVX-512 instructions that have two possible encodings. This is the case for instructions that use only vector registers with low indices (0-15) and do not use the zmm registers or the mask (k) registers.
The EVEX encoding prefix requires 4 bytes whereas the VEX prefix can take only up to 3 bytes. Consequently, using the VEX encoding for these instructions results in a code size reduction of ~2 bytes, even when compiling with the AVX-512 features enabled.
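A rough illustration (the wrapper function is an assumption, not taken from the patch): an instruction that only touches xmm0-15 and uses no masking or zmm operands, so the shorter VEX form can be chosen even with AVX-512 enabled.

    #include <immintrin.h>

    __m128 add4(__m128 a, __m128 b) {
      /* vaddps xmm, xmm, xmm: the VEX form needs only a 2-byte prefix,
         while the equivalent EVEX form always carries a 4-byte prefix,
         so preferring VEX saves roughly 2 bytes per instruction. */
      return _mm_add_ps(a, b);
    }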
Reviewers: Craig Topper, Zvi Rackover, Elena Demikhovsky
Differential Revision: https://reviews.llvm.org/D27901
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290663 91177308-0d34-0410-b5e6-96231b3b80d8
This was accidentally broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@288578 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: These intrinsics have been unused by clang for a while. This patch removes them. We auto-upgrade them to extractelements, a scalar operation, and then an insertelement. This matches the sequence used by clang's intrinsic file.
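As a sketch (assuming _mm_add_sd maps to one of the affected intrinsics; the wrapper is hypothetical), the upgrade target is the same IR sequence clang's headers already emit:

    #include <emmintrin.h>

    __m128d add_low(__m128d a, __m128d b) {
      /* clang expands this to roughly:
           %a0 = extractelement <2 x double> %a, i32 0
           %b0 = extractelement <2 x double> %b, i32 0
           %r  = fadd double %a0, %b0
           %v  = insertelement <2 x double> %a, double %r, i32 0 */
      return _mm_add_sd(a, b);
    }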
Reviewers: zvi, delena, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26660
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287083 91177308-0d34-0410-b5e6-96231b3b80d8
-Don't print the 'x' suffix for the 128-bit reg/mem VEX encoded instructions in Intel syntax. This is consistent with the EVEX versions.
-Don't print the 'y' suffix for the 256-bit reg/reg VEX encoded instructions in Intel or AT&T syntax. This is consistent with the EVEX versions.
-Allow the 'x' and 'y' suffixes to be used for the reg/mem forms when we're assembling using Intel syntax.
-Allow the 'x' and 'y' suffixes on the reg/reg EVEX encoded instructions in Intel or AT&T syntax. This is consistent with what VEX was already allowing.
This should fix at least some of PR28850.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286787 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26330
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286344 91177308-0d34-0410-b5e6-96231b3b80d8
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be submitted shortly).
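As an illustration (function name and intrinsic choice are assumptions), the kind of subvector-insertion pattern that can now lower to a single vbroadcastf128 load:

    #include <immintrin.h>

    __m256 splat_lanes(const float *p) {
      __m128 lo = _mm_loadu_ps(p);
      /* Same 128-bit value in both 256-bit lanes; previously this tended
         to become a 128-bit load followed by vinsertf128 rather than a
         single broadcast load. */
      return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), lo, 1);
    }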
Differential Revision: https://reviews.llvm.org/D22460
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276416 91177308-0d34-0410-b5e6-96231b3b80d8
As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.
This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.
We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).
Differential Revision: https://reviews.llvm.org/D22460
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276281 91177308-0d34-0410-b5e6-96231b3b80d8
D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ truncating conversions with generic IR instead.
It turns out that the behaviour of these intrinsics is different enough from generic IR that this will cause problems: INF/NaN/out-of-range values are guaranteed to result in a 0x80000000 value, which plays havoc with constant folding, which converts them to either zero or UNDEF. This is also an issue with the scalar implementations (which were already generic IR and what I was trying to match).
This patch changes both scalar and packed versions back to using x86-specific builtins.
It also deals with the other scalar conversion cases that are runtime rounding mode dependent and can have similar issues with constant folding.
A companion clang patch is at D22105
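A short sketch of the semantic difference (the wrapper function is an assumption; the 0x80000000 result is the documented instruction behaviour):

    #include <emmintrin.h>

    __m128i trunc4(__m128 v) {
      /* cvttps2dq returns the "integer indefinite" value 0x80000000 for
         NaN, infinities and out-of-range inputs; a generic fptosi on such
         inputs is undefined and may be constant-folded to 0 or undef,
         which is why the x86-specific builtin is kept. */
      return _mm_cvttps_epi32(v);
    }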
Differential Revision: https://reviews.llvm.org/D22106
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275981 91177308-0d34-0410-b5e6-96231b3b80d8
xorl + setcc is generally the preferred sequence due to the partial register
stall that setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.
The original commit tried inserting an 8bit-subreg into a GR32 (not GR32_ABCD)
which was not appreciated by fast regalloc on 32-bit.
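A sketch of the two sequences (function and register choice are assumptions for a 64-bit SysV target):

    int is_equal(int a, int b) {
      return a == b;
    }
    /* Preferred:
         xorl   %eax, %eax     # clear the full register up front
         cmpl   %esi, %edi
         sete   %al
       instead of:
         cmpl   %esi, %edi
         sete   %al
         movzbl %al, %eax      # partial-register merge, one byte larger */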
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274802 91177308-0d34-0410-b5e6-96231b3b80d8
xorl + setcc is generally the preferred sequence due to the partial register
stall that setcc + movzbl suffers from. As a bonus, it also encodes one byte smaller.
This fixes PR28146.
Differential Revision: http://reviews.llvm.org/D21774
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274692 91177308-0d34-0410-b5e6-96231b3b80d8
This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we current do and improves the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments.
Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future, but it's proven tricky to create test cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle, so it may be that removing some of that code would allow us to perform more of the combining in one place without duplication.
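For illustration only (the masks are arbitrary assumptions, and simple chains like this may already be handled elsewhere), the sort of single-input shuffle chain the combine aims to collapse into one immediate permute:

    #include <immintrin.h>

    __m128i rotate_twice(__m128i x) {
      /* Two single-input shuffles of the same value; ideally this folds
         into one pshufd/vpermilps with the composed immediate. */
      __m128i t = _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 1, 0, 3));
      return _mm_shuffle_epi32(t, _MM_SHUFFLE(2, 1, 0, 3));
    }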
Differential Revision: http://reviews.llvm.org/D21148
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273999 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the llvm (V)CVTTPS2DQ and VCVTTPD2DQ truncation (round to zero) conversion intrinsics and auto-upgrades them to FP_TO_SINT calls instead.
Note: I looked at updating CVTTPD2DQ as well but this still requires a lot more work to correctly lower.
Differential Revision: http://reviews.llvm.org/D20860
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271510 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the llvm (V)PMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades them to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
Reapplied now that the companion patch (D20684), which removes/auto-upgrades the clang intrinsics, has been committed.
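A sketch of what the upgrade looks like from the user's side (intrinsic and function name are assumptions):

    #include <immintrin.h>

    __m256i widen_u8_to_i32(__m128i v) {
      /* With the llvm.x86.* intrinsics gone, this is represented as a
         plain zext in IR (a shufflevector of the low bytes followed by a
         zext for the 256-bit form) and still selects vpmovzxbd. */
      return _mm256_cvtepu8_epi32(v);
    }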
Differential Revision: http://reviews.llvm.org/D20686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271131 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the llvm (V)PMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades them to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX some time ago, so much of that implementation can be reused.
A companion patch (D20684) removes/auto-upgrades the clang intrinsics.
Differential Revision: http://reviews.llvm.org/D20686
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270973 91177308-0d34-0410-b5e6-96231b3b80d8
This isn't the complete fix, but it handles the trivial examples of duplicate vzero* ops in PR27823:
https://llvm.org/bugs/show_bug.cgi?id=27823
...and amusingly, the bogus cases already exist as regression tests, so let's take this baby step.
We'll need to do more in the general case where there's legitimate AVX usage in the function + there's
already a vzero in the code.
Differential Revision: http://reviews.llvm.org/D20477
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270378 91177308-0d34-0410-b5e6-96231b3b80d8
For these basic tests of the intrinsic, make sure the mask can't simplify to movss, blend-with-zero, or something else.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258941 91177308-0d34-0410-b5e6-96231b3b80d8
autogenerated.
Also update existing test cases which appear to be generated by it and
weren't modified (other than addition of the header) by rerunning it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253917 91177308-0d34-0410-b5e6-96231b3b80d8