The test case included in r279125 exposed existing undefined behavior in the
SLP vectorizer that it did not introduce. This patch reapplies the original
patch, but modifies the test case to avoid hitting the undefined behavior. This
allows us to close PR28330 while keeping the UBSan bot happy. The undefined
behavior the original test uncovered will be addressed in a follow-on patch.
Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279370 91177308-0d34-0410-b5e6-96231b3b80d8
This is a partial enablement (move the ConstantInt guard down) because there are many
different folds here and one of the later ones will require reworking 'isSignBitCheck'.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279339 91177308-0d34-0410-b5e6-96231b3b80d8
The intended transform is:
// Simplify icmp eq (or (ptrtoint P), (ptrtoint Q)), 0
// -> and (icmp eq P, null), (icmp eq Q, null).
P and Q are both pointer types, but may have different types. We need
two calls to getNullValue() to make the icmps.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279271 91177308-0d34-0410-b5e6-96231b3b80d8
CGSCC use a WeakVH to track call sites. RAUW a call within a function
can result in that WeakVH getting confused about whether or not the call
site is still around.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279268 91177308-0d34-0410-b5e6-96231b3b80d8
Of course, we really need to refactor and fix all of the cmp predicates,
but this one is interesting because without it, we later perform an
information-losing transform of icmp (shl 1, Y), C, and we can't recover
the better fold.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279263 91177308-0d34-0410-b5e6-96231b3b80d8
These are implicitly included as part of larger test cases, but they don't
exist stand-alone (and don't happen for vectors...).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279257 91177308-0d34-0410-b5e6-96231b3b80d8
The new version has several advantages:
1) IMSHO it's more readable and neater
2) It handles loads and stores properly
3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.
With this change we can now finally sink load-modify-store idioms such as:
if (a)
return *b += 3;
else
return *b += 4;
=>
%z = load i32, i32* %y
%.sink = select i1 %a, i32 5, i32 7
%b = add i32 %z, %.sink
store i32 %b, i32* %y
ret i32 %b
When this works for switches it'll be even more powerful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279229 91177308-0d34-0410-b5e6-96231b3b80d8
We abort building vectorizable trees in some cases (e.g., if the maximum
recursion depth is reached, if the region size is too large, etc.). If this
happens for a reduction, we can be left with a root entry that needs to be
gathered. For these cases, we need make sure we actually set VectorizedValue to
the resulting vector.
This patch ensures we properly set VectorizedValue, and it also ensures the
insertelement sequence generated for the gathers is inserted at the correct
location.
Reference: https://llvm.org/bugs/show_bug.cgi?id=28330
Differential Revison: https://reviews.llvm.org/D23410
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279125 91177308-0d34-0410-b5e6-96231b3b80d8
It causes a regression on our internal benchmark. Introduce cvp-dont-process flag and set it off by default while investigating the regression.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279082 91177308-0d34-0410-b5e6-96231b3b80d8
Also, add a scalar test to demonstrate one of the intermediate folds that
is necessary to accomplish the existing, multi-step test. And simplify
the vector tests to only check the final piece of that multi-step transform.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278995 91177308-0d34-0410-b5e6-96231b3b80d8
minimal and boring form than the old pass manager's version.
This pass does the very minimal amount of work necessary to inline
functions declared as always-inline. It doesn't support a wide array of
things that the legacy pass manager did support, but is alse ... about
20 lines of code. So it has that going for it. Notably things this
doesn't support:
- Array alloca merging
- To support the above, bottom-up inlining with careful history
tracking and call graph updates
- DCE of the functions that become dead after this inlining.
- Inlining through call instructions with the always_inline attribute.
Instead, it focuses on inlining functions with that attribute.
The first I've omitted because I'm hoping to just turn it off for the
primary pass manager. If that doesn't pan out, I can add it here but it
will be reasonably expensive to do so.
The second should really be handled by running global-dce after the
inliner. I don't want to re-implement the non-trivial logic necessary to
do comdat-correct DCE of functions. This means the -O0 pipeline will
have to be at least 'always-inline,global-dce', but that seems
reasonable to me. If others are seriously worried about this I'd like to
hear about it and understand why. Again, this is all solveable by
factoring that logic into a utility and calling it here, but I'd like to
wait to do that until there is a clear reason why the existing
pass-based factoring won't work.
The final point is a serious one. I can fairly easily add support for
this, but it seems both costly and a confusing construct for the use
case of the always inliner running at -O0. This attribute can of course
still impact the normal inliner easily (although I find that
a questionable re-use of the same attribute). I've started a discussion
to sort out what semantics we want here and based on that can figure out
if it makes sense ta have this complexity at O0 or not.
One other advantage of this design is that it should be quite a bit
faster due to checking for whether the function is a viable candidate
for inlining exactly once per function instead of doing it for each call
site.
Anyways, hopefully a reasonable starting point for this pass.
Differential Revision: https://reviews.llvm.org/D23299
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278896 91177308-0d34-0410-b5e6-96231b3b80d8
It is pretty easy to get it down to O(nlogn + mlogm). This
implementation has the added benefit of automatically deduplicating
entries between the two sets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278837 91177308-0d34-0410-b5e6-96231b3b80d8
I have audited all the callers of concatenate and none require duplicate
entries to service concatenation.
These duplicates serve no purpose but to needlessly embiggen the IR.
N.B. Layering getMostGenericAliasScope on top of concatenate makes it
O(nlogn + mlogm) instead of O(n*m).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278836 91177308-0d34-0410-b5e6-96231b3b80d8