This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.
Differential Revision: https://reviews.llvm.org/D45203
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335037 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47909
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334996 91177308-0d34-0410-b5e6-96231b3b80d8
2 of these tests were clearly not doing what the comments
said they were doing.
The last test was added at rL177933 with no assertions
(presumably it used to crash). But either we don't have
that problem anymore, or this test is folded sooner,
so we don't hit the bug that was fixed by disabling late
FP constant creation. Looking at this as part of reviewing
D48289.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334977 91177308-0d34-0410-b5e6-96231b3b80d8
Some of the calls to hasSingleUseFromRoot were passing the load itself. If the load's chain result has a user this would count against that. By getting the true parent of the match and ensuring any intermediate between the match and the load have a single use we can avoid this case. isLegalToFold will take care of checking users of the load's data output.
This fixed at least fma-scalar-memfold.ll to succed without the peephole pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334908 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng, nhaehnle
Differential Revision: https://reviews.llvm.org/D47954
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334862 91177308-0d34-0410-b5e6-96231b3b80d8
An earlier commit prevented folds from the peephole pass by checking for IMPLICIT_DEF. But later in the pipeline IMPLICIT_DEF just becomes and Undef flag on the input register so we need to check for that case too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334848 91177308-0d34-0410-b5e6-96231b3b80d8
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334802 91177308-0d34-0410-b5e6-96231b3b80d8
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.
There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334800 91177308-0d34-0410-b5e6-96231b3b80d8
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().
We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly
are not worthwhile for AVX1.
I think in most cases we are able to recover by converting
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334759 91177308-0d34-0410-b5e6-96231b3b80d8
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334728 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.
This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll
Reviewers: RKSimon, gbedwell, spatel
Reviewed By: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D47993
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334685 91177308-0d34-0410-b5e6-96231b3b80d8
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.
We could end up with things like:
{
"asm-printer.EmittedInsts": 1,
"time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
"time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
"time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}
This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:
{
"asm-printer.EmittedInsts": 1,
"time.pass.virtregmap.wall": 2.9015541076660156e-04,
"time.pass.virtregmap.user": 2.0500000000000379e-04,
"time.pass.virtregmap.sys": 8.5000000000001741e-05,
}
This also helps avoiding to write another JSON printer to handle all the
cases that we could have in our pass names.
Differential Revision: https://reviews.llvm.org/D48109
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334649 91177308-0d34-0410-b5e6-96231b3b80d8
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.
The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is
used by MachineCombiner, so I'll file a bug report to follow-up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334608 91177308-0d34-0410-b5e6-96231b3b80d8
The equivalent AArch64 test added at rL334556 isn't showing
the expected output from the DAGCombiner code change that
would fix this example. That's a machine combiner bug from
what I see.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334605 91177308-0d34-0410-b5e6-96231b3b80d8
Add a helper function to expand constrained FP operations as needed.
Note that the Strict POWI operation is not handled in this patch since
the format is slightly different from the others.
Differential Revision: https://reviews.llvm.org/D47491
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334603 91177308-0d34-0410-b5e6-96231b3b80d8
This shortcoming was noted in D47330, and the test diffs show we already
had other examples where we failed to fold to a SHRUNKBLEND:
/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.
This patch implements an idea from D48043 and would obsolete that patch
because it catches more cases (notable the AVX1 case that was missed there).
All we're doing is allowing the existing transform to fire more often by
removing the post-legalize constraint. All of the relevant feature checks
and other predicates are left as-is.
Differential Revision: https://reviews.llvm.org/D48078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334592 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: nhaehnle, wdng
Differential Revision: https://reviews.llvm.org/D47911
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334514 91177308-0d34-0410-b5e6-96231b3b80d8
We were missing packed isel folding patterns for all of sse41, avx, and avx512.
For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.
Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.
Some of this was spotted in the review for D47993.
This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334460 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This fixes most of the scheduling info for SKX vector operations.
I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.
The before/after llvm-exegesis analysis are in the phabricator diff.
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D47721
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334407 91177308-0d34-0410-b5e6-96231b3b80d8
This patch started off much more general and ambitious, but it's been a nightmare
seeing all the ways x86 vector codegen can go wrong.
So the code is still structured to allow extending easily, but it's currently
limited in several ways:
1. Only handle cases with an extending load.
2. Only handle cases with a zero constant compare.
3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.
The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary
intermediate cast.
There's a clear regression in the last test (sgt_zero_fp_select) because we longer
recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present
in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit
setcc from a sign-extended operand and remove it.
Differential Revision: https://reviews.llvm.org/D47330
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334378 91177308-0d34-0410-b5e6-96231b3b80d8
Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together.
Differential Revision: https://reviews.llvm.org/D47822
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334309 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.
Reviewers: spatel, hfinkel, wristow, arsenm
Reviewed By: spatel
Subscribers: wdng
Differential Revision: https://reviews.llvm.org/D47910
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334306 91177308-0d34-0410-b5e6-96231b3b80d8
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334303 91177308-0d34-0410-b5e6-96231b3b80d8
Same fix as rL334110: I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction schedule tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334302 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we're custom lowering vector rotates for SSE in general we should be testing the combines with them as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334290 91177308-0d34-0410-b5e6-96231b3b80d8