Introduce LoopVectorizationPlanner.executePlan(), replacing ILV.vectorize() and
refactoring ILV.vectorizeLoop(). Method collectDeadInstructions() is moved from
ILV to LVP. These changes facilitate building VPlans and using them to generate
code, following https://reviews.llvm.org/D28975 and its tentative breakdown.
Method ILV.createEmptyLoop() is renamed ILV.createVectorizedLoopSkeleton() to
improve clarity; it's contents remain intact.
Differential Revision: https://reviews.llvm.org/D32200
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302790 91177308-0d34-0410-b5e6-96231b3b80d8
It turned out that MSan was incorrectly calculating the shadow for int comparisons: it was done by truncating the result of (Shadow1 OR Shadow2) to i1, effectively rendering all bits except LSB useless.
This approach doesn't work e.g. in the case where the values being compared are even (i.e. have the LSB of the shadow equal to zero).
Instead, if CreateShadowCast() has to cast a bigger int to i1, we replace the truncation with an ICMP to 0.
This patch doesn't affect the code generated for SPEC 2006 binaries, i.e. there's no performance impact.
For the test case reported in PR32842 MSan with the patch generates a slightly more efficient code:
orq %rcx, %rax
jne .LBB0_6
, instead of:
orl %ecx, %eax
testb $1, %al
jne .LBB0_6
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302787 91177308-0d34-0410-b5e6-96231b3b80d8
manages to form a VSELECT with a non-i1 element type condition. Those
are technically allowed in SDAG (at least, the generic type legalization
logic will form them and I wouldn't want to try to audit everything te
preclude forming them) so we need to be able to lower them.
This isn't too hard to implement. We mark VSELECT as custom so we get
a chance in C++, add a fast path for i1 conditions to get directly
handled by the patterns, and a fallback when we need to manually force
the condition to be an i1 that uses the vptestm instruction to turn
a non-mask into a mask.
This, unsurprisingly, generates awful code. But it at least doesn't
crash. This was actually impacting open source packages built with LLVM
for AVX-512 in the wild, so quickly landing a patch that at least stops
the immediate bleeding.
I think I've found where to fix the codegen quality issue, but less
confident of that change so separating it out from the thing that
doesn't change the result of any existing test case but causes mine to
not crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302785 91177308-0d34-0410-b5e6-96231b3b80d8
This is the same as r292827 for AArch64: we widen 8- and 16-bit ADD, SUB
and MUL to 32 bits since we only have TableGen patterns for 32 bits.
See the commit message for r292827 for more details.
At this point we could just remove some of the tests for regbankselect
and instruction-select, since we're not going to see any narrow
operations at those levels anymore. Instead I decided to update them
with G_ANYEXT/G_TRUNC operations, so we can validate the full sequences
generated by the legalizer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302782 91177308-0d34-0410-b5e6-96231b3b80d8
G_ANYEXT can be introduced by the legalizer when widening scalars. Add
support for it in the register bank info (same mapping as everything
else) and in the instruction selector.
When selecting it, we treat it as a COPY, just like G_TRUNC. On this
occasion we get rid of some assertions in selectCopy so we can reuse it.
This shouldn't be a problem at the moment since we're not supporting any
complicated cases (e.g. FPR, different register banks). We might want to
separate the paths when we do.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302778 91177308-0d34-0410-b5e6-96231b3b80d8
Turns out udivrem can write its output to the same location as one of its inputs so the extra temporary isn't needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302772 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Move getX86ConditionCode() from X86FastISel.cpp to X86InstrInfo.cpp so it can be used by GloabalIsel instruction selector.
This is a pre-commit for a patch I'm working on to support G_ICMP. NFC.
Reviewers: zvi, guyblank, delena
Reviewed By: guyblank, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33038
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302767 91177308-0d34-0410-b5e6-96231b3b80d8
This time it actually occurred to me to change the #defines
to actually test the pre-processed out codepath. Hopefully
this time it works.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302752 91177308-0d34-0410-b5e6-96231b3b80d8
// (X ^ C1) | C2 --> (X | C2) ^ (C1&~C2)
This canonicalization was added at:
https://reviews.llvm.org/rL7264
By moving xors out/down, we can more easily combine constants. I'm adding
tests that do not change with this patch, so we can verify that those kinds
of transforms are still happening.
This is no-functional-change-intended because there's a later fold:
// (X^C)|Y -> (X|Y)^C iff Y&C == 0
...and demanded-bits appears to guarantee that any fold that would have
hit the fold we're removing here would be caught by that 2nd fold.
Similar reasoning was used in:
https://reviews.llvm.org/rL299384
The larger motivation for removing this code is that it could interfere with
the fix for PR32706:
https://bugs.llvm.org/show_bug.cgi?id=32706
Ie, we're not checking if the 'xor' is actually a 'not', so we could reverse
a 'not' optimization and cause an infinite loop by altering an 'xor X, -1'.
Differential Revision: https://reviews.llvm.org/D33050
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302733 91177308-0d34-0410-b5e6-96231b3b80d8
Flat instructions gain an immediate offset, and 2 new
sets of segment specific flat instructions are added.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302729 91177308-0d34-0410-b5e6-96231b3b80d8
r271020 added an early out to skip the signed multiply portion of ConstantRange::multiply. The comment says we don't need to do signed multiply if the range is only positive numbers, but the implemented check only ensures that the start of the range is positive. It doesn't look at the end of the range.
This patch checks the end of the range instead. Because Upper is one more than the end we have to see if its positive or if its one past the last positive number.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302717 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Allow consecutive stores whose values come from consecutive loads to
merged in the presense of other uses of the loads. Previously this was
disallowed as in general the merged load cannot be shared with the
other uses. Merging N stores into 1 may cause as many as N redundant
loads. However in the context of caching this should have neglible
affect on memory pressure and reduce instruction count making it
almost always a win.
Fixes PR32086.
Reviewers: spatel, jyknight, andreadb, hfinkel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30471
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302712 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a ubsan bot failure after r302597, which made getProfileCount
non-static, but ended up invoking it on a null ProfileSummaryInfo object
in some cases from buildModuleSummaryIndex.
Most testing passed because the non-static getProfileCount currently
doesn't access any member variables, but I found this when testing a
follow on patch (D32877) that adds a member variable access.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302705 91177308-0d34-0410-b5e6-96231b3b80d8
This lets toString take advantage of the degenerate case checks in udivrem and is just generally cleaner.
One minor downside of this is that the divisor APInt now needs to be the same size as Tmp which requires an additional allocation. But we were doing a poor job of reusing allocations before so the new code should still be an improvement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302704 91177308-0d34-0410-b5e6-96231b3b80d8
Surprisingly, I don't think these are redundant for InstSimplify.
They were just misplaced as InstCombine tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302684 91177308-0d34-0410-b5e6-96231b3b80d8
For stores, check if the stored value is defined by a floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302679 91177308-0d34-0410-b5e6-96231b3b80d8
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.
The existing code to match shufflevector patterns are replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.
Differential Revision: https://reviews.llvm.org/D32247
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302678 91177308-0d34-0410-b5e6-96231b3b80d8
The first test in this file is duplicated exactly in and.ll -> test33.
We have commuted and vector variants there too.
The second test is a composite of 2 folds. The first fold is tested
independently in add.ll -> flip_and_mask (including vector variant).
After that transform fires, the IR is identical to the first transform.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302676 91177308-0d34-0410-b5e6-96231b3b80d8
The script at utils/update_test_checks.py has (had?) a bug when variables
start with the same sequence of letters (clearly, not all of the time).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302674 91177308-0d34-0410-b5e6-96231b3b80d8
This adds a few missing instructions for the assembler and
disassembler. Those should be the last missing general-
purpose (Chapter 7) instructions for the z10 ISA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302667 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the remaining general arithmetic instructions
for assembler / disassembler use. Most of these are not
useful for codegen; a few might be, and those are listed
in the README.txt for future improvements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302665 91177308-0d34-0410-b5e6-96231b3b80d8
The previous code was discarding the error message from
createBinary() by calling errorToErrorCode().
This meant that such error were always reported unhelpfully
as "Invalid data was encountered while parsing the file".
Other tools such as llvm-objdump already produce a more
the error message in this case.
Differential Revision: https://reviews.llvm.org/D32985
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302664 91177308-0d34-0410-b5e6-96231b3b80d8