parts of the AA interface out of the base class of every single AA
result object.
Because this logic reformulates the query in terms of some other aspect
of the API, it would easily cause O(n^2) query patterns in alias
analysis. These could in turn be magnified further based on the number
of call arguments, and then further based on the number of AA queries
made for a particular call. This ended up causing problems for Rust that
were actually noticable enough to get a bug (PR26564) and probably other
places as well.
When originally re-working the AA infrastructure, the desire was to
regularize the pattern of refinement without losing any generality.
While I think it was successful, that is clearly proving to be too
costly. And the cost is needless: we gain no actual improvement for this
generality of making a direct query to tbaa actually be able to
re-use some other alias analysis's refinement logic for one of the other
APIs, or some such. In short, this is entirely wasted work.
To the extent possible, delegation to other API surfaces should be done
at the aggregation layer so that we can avoid re-walking the
aggregation. In fact, this significantly simplifies the logic as we no
longer need to smuggle the aggregation layer into each alias analysis
(or the TargetLibraryInfo into each alias analysis just so we can form
argument memory locations!).
However, we also have some delegation logic inside of BasicAA and some
of it even makes sense. When the delegation logic is baking in specific
knowledge of aliasing properties of the LLVM IR, as opposed to simply
reformulating the query to utilize a different alias analysis interface
entry point, it makes a lot of sense to restrict that logic to
a different layer such as BasicAA. So one aspect of the delegation that
was in every AA base class is that when we don't have operand bundles,
we re-use function AA results as a fallback for callsite alias results.
This relies on the IR properties of calls and functions w.r.t. aliasing,
and so seems a better fit to BasicAA. I've lifted the logic up to that
point where it seems to be a natural fit. This still does a bit of
redundant work (we query function attributes twice, once via the
callsite and once via the function AA query) but it is *exactly* twice
here, no more.
The end result is that all of the delegation logic is hoisted out of the
base class and into either the aggregation layer when it is a pure
retargeting to a different API surface, or into BasicAA when it relies
on the IR's aliasing properties. This should fix the quadratic query
pattern reported in PR26564, although I don't have a stand-alone test
case to reproduce it.
It also seems general goodness. Now the numerous AAs that don't need
target library info don't carry it around and depend on it. I think
I can even rip out the general access to the aggregation layer and only
expose that in BasicAA as it is the only place where we re-query in that
manner.
However, this is a non-trivial change to the AA infrastructure so I want
to get some additional eyes on this before it lands. Sadly, it can't
wait long because we should really cherry pick this into 3.8 if we're
going to go this route.
Differential Revision: http://reviews.llvm.org/D17329
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262490 91177308-0d34-0410-b5e6-96231b3b80d8
As noted in the code comment, I don't think we can do the same transform that we do for
*scalar* integers comparisons to *vector* integers comparisons because it might pessimize
the general case.
Exhibit A for an incomplete integer comparison ISA remains x86 SSE/AVX: it only has EQ and GT
for integer vectors.
But we should now recognize all the variants of this construct and produce the optimal code
for the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262424 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: SampleProfile pass needs to be performed after InstructionCombiningPass, which helps eliminate un-inlinable function calls.
Reviewers: davidxl, dnovillo
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17742
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262419 91177308-0d34-0410-b5e6-96231b3b80d8
Most portions of InstCombine properly propagate fast math flags, but
apparently the vector scalarization section was overlooked.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262376 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds the beginning of an update API to preserve MemorySSA. In particular,
this patch adds a way to remove memory SSA accesses when instructions are
deleted.
It also adds relevant unit testing infrastructure for MemorySSA's API.
(There is an actual user of this API, i will make that diff dependent on this one. In practice, a ton of opt passes remove memory instructions, so it's hopefully an obviously useful API :P)
Reviewers: hfinkel, reames, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17157
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262362 91177308-0d34-0410-b5e6-96231b3b80d8
This patch fixes calculating correct value for builtin_object_size function
when pointer is used only in builtin_object_size function call and never
after that.
Patch by Strahinja Petrovic.
Differential Revision: http://reviews.llvm.org/D17337
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262337 91177308-0d34-0410-b5e6-96231b3b80d8
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the load mask is constant:
__m128 mload_zeros(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0));
}
__m128 mload_fakeones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(1));
}
__m128 mload_ones(float *f) {
return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000));
}
__m128 mload_oneset(float *f) {
return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0));
}
...so none of the above will actually generate a masked load for optimized code.
This is the masked load counterpart to:
http://reviews.llvm.org/rL262064
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262269 91177308-0d34-0410-b5e6-96231b3b80d8
We can actually have dependences between accesses with different
underlying types. Bail in this case.
A test will follow shortly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262267 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I re-benchmarked this and results are similar to original results in
D13259:
On ARM64:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -59.27%
SingleSource/Benchmarks/Polybench/stencils/adi -19.78%
On x86:
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog -27.14%
And of course the original ~20% gain on SPECint_2006/456.hmmer with Loop
Distribution.
In terms of compile time, there is ~5% increase on both
SingleSource/Benchmarks/Misc/oourafft and
SingleSource/Benchmarks/Linkpack/linkpack-pc. These are both very tiny
loop-intensive programs where SCEV computations dominates compile time.
The reason that time spent in SCEV increases has to do with the design
of the old pass manager. If a transform pass does not preserve an
analysis we *invalidate* the analysis even if there was *no*
modification made by the transform pass.
This means that currently we don't take advantage of LLE and LV sharing
the same analysis (LAA) and unfortunately we recompute LAA *and* SCEV
for LLE.
(There should be a way to work around this limitation in the case of
SCEV and LAA since both compute things on demand and internally cache
their result. Thus we could pretend that transform passes preserve
these analyses and manually invalidate them upon actual modification.
On the other hand the new pass manager is supposed to solve so I am not
sure if this is worthwhile.)
Reviewers: hfinkel, dberlin
Subscribers: dberlin, reames, mssimpso, aemerson, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16300
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262250 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The PS4 linker seems to handle this fine.
Hi David, it seems that indeed most ELF linkers support
__{start,stop}_SECNAME, as our proprietary linker does as well.
This follows the pattern of r250679 w.r.t. the testing.
Maggie, Phillip, Paul: I've tested this with the PS4 SDK 3.5 toolchain
prerelease and it seems to work fine.
Reviewers: davidxl
Subscribers: probinson, phillip.power, MaggieYi
Differential Revision: http://reviews.llvm.org/D17672
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262112 91177308-0d34-0410-b5e6-96231b3b80d8
merged into a loop that was subsequently unrolled (or otherwise nuked).
In this case it can't merge in the ASTs for any remaining nested loops,
it needs to re-add their instructions dircetly.
The fix is very isolated, but I've pulled the code for merging blocks
into the AST into a single place in the process. The only behavior
change is in the case which would have crashed before.
This fixes a crash reported by Mikael Holmen on the list after r261316
restored much of the loop pass pipelining and allowed us to actually do
this kind of nested transformation sequenc. I've taken that test case
and further reduced it into the somewhat twisty maze of loops in the
included test case. This does in fact trigger the bug even in this
reduced form.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262108 91177308-0d34-0410-b5e6-96231b3b80d8
The intended effect of this patch in conjunction with:
http://reviews.llvm.org/rL259392http://reviews.llvm.org/rL260145
is that customers using the AVX intrinsics in C will benefit from combines when
the store mask is constant:
void mstore_zero_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0), v);
}
void mstore_fake_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(1), v);
}
void mstore_ones_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v);
}
void mstore_one_set_elt_mask(float *f, __m128 v) {
_mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v);
}
...so none of the above will actually generate a masked store for optimized code.
Differential Revision: http://reviews.llvm.org/D17485
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262064 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for inner/outer loops IVs, which is clearly incorrect.
Reviewers: chandlerc, hfinkel
Subscribers: sanjoy, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17632
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261958 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is the first simple attempt to reduce number of coverage-
instrumented blocks.
If a basic block dominates all its successors, then its coverage
information is useless to us. Ingore such blocks if
santizer-coverage-prune-tree option is set.
Differential Revision: http://reviews.llvm.org/D17626
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261949 91177308-0d34-0410-b5e6-96231b3b80d8
The cleanupret instruction has an invariant that it's 'from' operand be
a cleanuppad. This invariant was violated when we removed a dead block
which removed a cleanuppad leaving behind a cleanupret with an undef
'from' operand.
This was solved in r261731 by staving off the removal of the dead block
to a later pass.
However, it occured to me that we do not need to do this.
Instead, we can simply avoid processing the cleanupret if it has an
undef 'from' operand because we know that it will be removed soon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261754 91177308-0d34-0410-b5e6-96231b3b80d8
This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated.
Reviewed By: hfinkel
Differential Revision: http://reviews.llvm.org/D16180
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261736 91177308-0d34-0410-b5e6-96231b3b80d8
DeleteDeadBlock was called indiscriminately, leading to cleanuprets with
undef cleanuppad references.
Instead, try to drain the BB of most of it's instructions if it is
unreachable. We can then remove the BB if it solely consists of a
terminator (and maybe some phis).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261731 91177308-0d34-0410-b5e6-96231b3b80d8
Note: The 'and' case in foldCastedBitwiseLogic() is inheriting one extra
check from the nearly identical 'or' case:
if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))
But I'm not sure how to expose that difference in a regression test.
Without that check, the 'or' path will infinite loop on:
test/Transforms/InstCombine/zext-or-icmp.ll
because the zext-or-icmp fold is attempting a reverse transform.
The refactoring should extend to the 'xor' case next to solve part of
PR26702.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261707 91177308-0d34-0410-b5e6-96231b3b80d8
It is problematic if the inlinee has a cleanupret which unwinds to
caller and we inline it into a call site which doesn't unwind.
If the funclet unwinds anywhere other than to the caller,
then we will give the funclet two unwind destinations.
This will result in a verifier failure.
Seeing as how the caller wasn't an invoke (which would locally unwind)
and that the funclet cannot unwind to caller, we must conclude that an
'unwind to caller' cleanupret is dynamically unreachable.
This fixes PR26698.
Differential Revision: http://reviews.llvm.org/D17536
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261656 91177308-0d34-0410-b5e6-96231b3b80d8
This is a straight cut and paste of the existing code and is intended to
be the first step in solving part of PR26702:
https://llvm.org/bugs/show_bug.cgi?id=26702
We should be able to reuse most of this and delete the nearly identical
existing code in visitOr(). Then, we can enhance visitXor() to use the
same code too.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261649 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When we completely unroll a loop, it's pretty easy to update DT in-place and
thus avoid rebuilding it. DT recalculation is one of the most time-consuming
tasks in loop-unroll, so avoiding it at least in case of full unroll should be
beneficial.
On some extreme (but still real-world) tests this patch improves compile time by
~2x.
Reviewers: escha, jmolloy, hfinkel, sanjoy, chandlerc
Subscribers: joker.eph, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D17473
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261595 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Since this is an IR pass it's nice to be able to write tests without
llc. This is the counterpart of the llc test under
CodeGen/PowerPC/loop-data-prefetch.ll.
Reviewers: hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17464
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261578 91177308-0d34-0410-b5e6-96231b3b80d8
The issue was that we only required LCSSA rebuilding if the immediate
parent-loop had values used outside of it. The fix is to enaable the
same logic for all outer loops, not only immediate parent.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261575 91177308-0d34-0410-b5e6-96231b3b80d8
This flag was part of a migration to a new means of handling vectors-of-points which was described in the llvm-dev thread "FYI: Relocating vector of pointers". The old code path has been off by default for a while without complaints, so time to cleanup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261569 91177308-0d34-0410-b5e6-96231b3b80d8
This change reverts "246133 [RewriteStatepointsForGC] Reduce the number of new instructions for base pointers" and a follow on bugfix 12575.
As pointed out in pr25846, this code suffers from a memory corruption bug. Since I'm (empirically) not going to get back to this any time soon, simply reverting the problematic change is the right answer.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261565 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Previously we had a notion of convergent functions but not of convergent
calls. This is insufficient to correctly analyze calls where the target
is unknown, e.g. indirect calls.
Now a call is convergent if it targets a known-convergent function, or
if it's explicitly marked as convergent. As usual, we can remove
convergent where we can prove that no convergent operations are
performed in the call.
Reviewers: chandlerc, jingyue
Subscribers: hfinkel, jhen, tra, llvm-commits
Differential Revision: http://reviews.llvm.org/D17317
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261544 91177308-0d34-0410-b5e6-96231b3b80d8
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.
Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators. (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)
There should be NFC here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261498 91177308-0d34-0410-b5e6-96231b3b80d8
Stop relying on `getNodePtrUnchecked()` being useful on invalid
iterators. This function is documented to be for internal use only, and
the pointer type will eventually have to change to remove UB from
ilist_iterator. Instead, check the iterator before it has been
invalidated.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261497 91177308-0d34-0410-b5e6-96231b3b80d8
This is inspired by PR24804 -- had this assert been there before,
isolating the root cause for PR24804 would have been far easier.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261481 91177308-0d34-0410-b5e6-96231b3b80d8
Cleanuppads may be merged together if one is the only predecessor of the
other in which case a simple transform can be performed: replace the
a cleanupret with a branch and remove an unnecessary cleanuppad.
Differential Revision: http://reviews.llvm.org/D17459
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261390 91177308-0d34-0410-b5e6-96231b3b80d8
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.
In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.
Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261346 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If we don't have the first and last access of an interleaved load group,
the first and last wide load in the loop can do an out of bounds
access. Even though we discard results from speculative loads,
this can cause problems, since it can technically generate page faults
(or worse).
We now discard interleaved load groups that don't have the first and
load in the group.
Reviewers: hfinkel, rengolin
Subscribers: rengolin, llvm-commits, mzolotukhin, anemet
Differential Revision: http://reviews.llvm.org/D17332
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261331 91177308-0d34-0410-b5e6-96231b3b80d8
routine.
We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.
The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.
With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers is down to three.
I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.
LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.
Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.
Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!
Differential Revision: http://reviews.llvm.org/D17435
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261316 91177308-0d34-0410-b5e6-96231b3b80d8
more places to prevent gratuitous re-"runs" of these passes.
The passes themselves don't do any work when run, but we keep spending
time scheduling and running these needlessly when we really don't need
to do so.
This is the first patch towards fixing the really horrible loop pass
pipeline fragmentation pointed out by Sanjoy in PR24804.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261302 91177308-0d34-0410-b5e6-96231b3b80d8
Commit r259357 was reverted because it caused PR26629. We were assuming all
roots of a vectorizable tree could be truncated to the same width, which is not
the case in general. This commit reapplies the patch along with a fix and a new
test case to ensure we don't regress because of this issue again. This should
fix PR26629.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261212 91177308-0d34-0410-b5e6-96231b3b80d8
convert one test to use this.
This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.
For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.
The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261203 91177308-0d34-0410-b5e6-96231b3b80d8
This function is used to check whether a dbg.value intrinsic has already
been inserted, but without comparing the DIExpression, it would erroneously
fire on split aggregates and only the first scalar would survive.
Found via http://reviews.llvm.org/D16867.
<rdar://problem/24456528>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261145 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.
Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17326
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261139 91177308-0d34-0410-b5e6-96231b3b80d8
When support for objc_unsafeClaimAutoreleasedReturnValue has been added to the
ARC optimizer in r258970, one case was missed which would lead the optimizer
to execute an llvm_unreachable. In this case, just handle ClaimRV in the same
way we handle RetainRV.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261134 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).
This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.
The ThinLTO pipeline integrates in the regular O2/O3 flow:
- The compile phase perform the inliner with a somehow lighter
function simplification. (TODO: tune the inliner thresholds here)
This is intendend to simplify the IR and get rid of obvious things
like linkonce_odr that will be inlined.
- The link phase will run the pipeline from the start, extended with
some specific passes that leverage the augmented knowledge we have
during LTO. Especially after the inliner is done, a sequence of
globalDCE/globalOpt is performed, followed by another run of the
"function simplification" passes. It is not clear if this part
of the pipeline will stay as is, as the split model of ThinLTO
does not allow the same benefit as FullLTO without added tricks.
The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.
Reviewers: tejohnson
Differential Revision: http://reviews.llvm.org/D17115
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261029 91177308-0d34-0410-b5e6-96231b3b80d8
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@261028 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
While shrinking types according to the required bits, we can
encounter insert/extract element instructions. This will cause us to
reach an llvm_unreachable statement.
This change adds support for truncating insert/extract element
operations, and adds a regression test.
Reviewers: jmolloy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D17078
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260893 91177308-0d34-0410-b5e6-96231b3b80d8
LICM starts with an *empty* AST, and then merges in each sub-loop. While the
add code is appropriate for sub-loop 2 and up, it's utterly unnecessary for
sub-loop 1. If the AST starts off empty, we can just clone/move the contents
of the subloop into the containing AST.
Reviewed-by: Philip Reames <listmail@philipreames.com>
Differential Revision: http://reviews.llvm.org/D16753
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260892 91177308-0d34-0410-b5e6-96231b3b80d8
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.
This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.
This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260813 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Export the CloneDebugInfoMetadata utility, which clones all debug info
associated with a function into the first module. Also use this function
in CloneModule on each function we clone (the CloneFunction entrypoint
already does this).
Without this, cloning a module will lead to DI quality regressions,
especially since r252219 reversed the Function <-> DISubprogram edge
(before we could get lucky and have this edge preserved if the
DISubprogram itself was, e.g. due to location metadata).
This was verified to fix missing debug information in julia and
a unittest to verify the new behavior is included.
Patch by Yichao Yu! Thanks!
Reviewers: loladiro, pcc
Differential Revision: http://reviews.llvm.org/D17165
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260791 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Performing this optimization duplicates the call to the convergent
function and adds new control-flow dependencies, which is a no-no.
Reviewers: jingyue
Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17128
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260730 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Calls to convergent functions can be duplicated, but only if the
duplicates are not control-flow dependent on any additional values.
Loop rotation doesn't meet the bar.
Reviewers: jingyue
Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune
Differential Revision: http://reviews.llvm.org/D17127
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260729 91177308-0d34-0410-b5e6-96231b3b80d8
The attached patch removes all of the block local code for performing X-load forwarding by reusing the code used in the non-local case.
The motivation here is to remove duplication and in the process increase our test coverage of some fairly tricky code. I have some upcoming changes I'll be proposing in this area and wanted to have the code cleaned up a bit first.
Note: The review for this mostly happened in email which didn't make it to phabricator on the 258882 commit thread.
Differential Revision: http://reviews.llvm.org/D16608
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260711 91177308-0d34-0410-b5e6-96231b3b80d8
In short, before r252926 we were comparing an unsigned (StoreSize) against an a
APInt (Stride), which is fine and well. After we were zero extending the Stride
and then converting to an unsigned, which is not the same thing. Obviously,
Stides can also be negative. This commit just restores the original behavior.
AFAICT, it's not possible to write a test case to expose the issue because
the code already has checks to make sure the StoreSize can't overflow an
unsigned (which prevents the Stride from overflowing an unsigned as well).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260706 91177308-0d34-0410-b5e6-96231b3b80d8
For some cases, InstCombine replaces the sequence of xor/sub instruction
followed by cmp instruction into a single cmp instruction.
However, this replacement may result suboptimal result especially when
the xor/sub has more than one use, as discussed in
bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465).
This patch make the replacement happen only when xor/sub has only one
use.
Differential Revision: http://reviews.llvm.org/D16915
Patch by Taewook Oh!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260695 91177308-0d34-0410-b5e6-96231b3b80d8
node set rather than walking the SCC directly.
This directly exposes the functions and has already had null entries
filtered out. We also don't need need to handle optnone as it has
already been handled in the caller -- we never try to remove convergent
when there are optnone functions in the SCC.
With this change, the code for removing convergent should work with the
new pass manager and a different SCC analysis.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260668 91177308-0d34-0410-b5e6-96231b3b80d8
with the test for a non-convergent intrinsic call.
While it is possible to use the call records to search for function
calls, we're going to do an instruction scan anyways to find the
intrinsics, we can handle both cases while scanning instructions. This
will also make the logic more amenable to the new pass manager which
doesn't use the same call graph structure.
My next patch will remove use of CallGraphNode entirely and allow this
code to work with both the old and new pass manager. Fortunately, it
should also get strictly simpler without changing functionality.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260666 91177308-0d34-0410-b5e6-96231b3b80d8
MSan adds a constructor to each translation unit that calls
__msan_init, and does nothing else. The idea is to run __msan_init
before any instrumented code. This results in multiple constructors
and multiple .init_array entries in the final binary, one per
translation unit. This is absolutely unnecessary; one would be
enough.
This change moves the constructors to a comdat group in order to drop
the extra ones.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260632 91177308-0d34-0410-b5e6-96231b3b80d8
Original commit message:
[InstCombine] Fold IntToPtr and PtrToInt into preceding loads.
Currently we only fold a BitCast into a Load when the BitCast is its
only user.
Do the same for any no-op cast.
Patch by Philip Pfaffe!
Differential Revision: http://reviews.llvm.org/D9152
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260612 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r260604.
I didn't intend to push this now.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260606 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel.
This pipeline is based on the proposal in D13443 for full LTO. We ]
didn't move forward on this proposal because the link was far too long
after that.
This patch refactor the "function simplification" passes that are part
of the inliner loop in a helper function (this part is NFC and can be
commited separately to simplify the diff). The ThinLTO pipeline
integrates in the regular O2/O3 flow:
- The compile phase perform the inliner with a somehow lighter
function simplification. (TODO: tune the inliner thresholds here)
This is intendend to simplify the IR and get rid of obvious things
like linkonce_odr that will be inlined.
- The link phase will run the pipeline from the start, extended with
some specific passes that leverage the augmented knowledge we have
during LTO. Especially after the inliner is done, a sequence of
globalDCE/globalOpt is performed, followed by another run of the
"function simplification" passes.
The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO but this should provide a good starting point.
Reviewers: tejohnson
Subscribers: joker.eph, llvm-commits, dexonsmith
Differential Revision: http://reviews.llvm.org/D17115
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260604 91177308-0d34-0410-b5e6-96231b3b80d8
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260603 91177308-0d34-0410-b5e6-96231b3b80d8
When optimizing a extractvalue(load), we generate a load from the
aggregate type. This load didn't have alignment set and so would
get the alignment of the type. This breaks when the type is packed
and so the alignment should be lower.
For example, loading { int, int } would give us alignment of 4, but
the original load from this type may have an alignment of 1 if packed.
Reviewed by David Majnemer
Differential revision: http://reviews.llvm.org/D17158
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260587 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When a PHI is used only to be compared with zero, it is possible to replace an
incoming value with any non-zero constant if the incoming value can be proved as
a known nonzero value. For example, in below code, we can replace the incoming value %v with
any non-zero constant based on the fact that the PHI is only used to be compared with zero
and %v is a known non-zero value:
%v = select %cond, 1, 2
%p = phi [%v, BB] ...
%c = icmp eq, %p, 0
Reviewers: mcrosier, jmolloy, sanjoy
Subscribers: hfinkel, mcrosier, majnemer, llvm-commits, haicheng, bmakam, mssimpso, gberry
Differential Revision: http://reviews.llvm.org/D16240
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260530 91177308-0d34-0410-b5e6-96231b3b80d8
Make sure we split ":" from the end of the global function id (which
is <path>:<function> for local functions) instead of the beginning to
avoid splitting at the wrong place for Windows file paths that contain
a ":".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260469 91177308-0d34-0410-b5e6-96231b3b80d8
The current function importer will walk the callgraph, importing
transitively any callee that is below the threshold. This can
lead to import very deep which is costly in compile time and not
necessarily beneficial as most of the inline would happen in
imported function and not necessarilly in user code.
The actual factor has been carefully chosen by flipping a coin ;)
Some tuning need to be done (just at the existing limiting threshold).
Reviewers: tejohnson
Differential Revision: http://reviews.llvm.org/D17082
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260466 91177308-0d34-0410-b5e6-96231b3b80d8
There is not reason to pass an array of "char *" to rebuild a set if
the client already has one.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260462 91177308-0d34-0410-b5e6-96231b3b80d8
It looks like clang has a couple of test cases which caught the fact LVI was not slightly more precise after 260439. When looking at the failures, it struck me as wasteful to be querying nullness of a constant via LVI, so instead of tweaking the clang tests, let's just stop querying constants from this source.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260451 91177308-0d34-0410-b5e6-96231b3b80d8
This restores commit r260408, along with a fix for a bot failure.
The bot failure was caused by dereferencing a unique_ptr in the same
call instruction parameter list where it was passed via std::move.
Apparently due to luck this was not exposed when I built the compiler
with clang, only with gcc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260442 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch uses the lower 64-bits of the MD5 hash of a function name as
a GUID in the function index, instead of storing function names. Any
local functions are first given a global name by prepending the original
source file name. This is the same naming scheme and GUID used by PGO in
the indexed profile format.
This change has a couple of benefits. The primary benefit is size
reduction in the combined index file, for example 483.xalancbmk's
combined index file was reduced by around 70%. It should also result in
memory savings for the index file in memory, as the in-memory map is
also indexed by the hash instead of the string.
Second, this enables integration with indirect call promotion, since the
indirect call profile targets are recorded using the same global naming
convention and hash. This will enable the function importer to easily
locate function summaries for indirect call profile targets to enable
their import and subsequent promotion.
The original source file name is recorded in the bitcode in a new
module-level record for use in the ThinLTO backend pipeline.
Reviewers: davidxl, joker.eph
Subscribers: llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D17028
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260408 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
As discussed on IRC, move the ThinLTOGlobalProcessing code out of
the linker, and into TransformUtils. The name of the class is changed
to FunctionImportGlobalProcessing.
Reviewers: joker.eph, rafael
Subscribers: joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D17081
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260395 91177308-0d34-0410-b5e6-96231b3b80d8
This patch uses one bit in profile version to differentiate Clang
instrumentation and IR level instrumentation profiles.
PGOInstrumenation generates a COMDAT variable __llvm_profile_raw_version so
that the compiler runtime can set the right profile kind.
For Maco-O platform, we generate the variable as linkonce_odr linkage as
COMDAT is not supported.
PGOInstrumenation now checks this bit to make sure it's an IR level
instrumentation profile.
The patch was submitted as r260164 but reverted due to a Darwin test breakage.
Original Differential Revision: http://reviews.llvm.org/D15540
Differential Revision: http://reviews.llvm.org/D17020
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260385 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Remove the convergent attribute on any functions which provably do not
contain or invoke any convergent functions.
After this change, we'll be able to modify clang to conservatively add
'convergent' to all functions when compiling CUDA.
Reviewers: jingyue, joker.eph
Subscribers: llvm-commits, tra, jhen, hfinkel, resistor, chandlerc, arsenm
Differential Revision: http://reviews.llvm.org/D17013
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260319 91177308-0d34-0410-b5e6-96231b3b80d8
This pass implements whole program optimization of virtual calls in cases
where we know (via bitset information) that the list of callees is fixed. This
includes the following:
- Single implementation devirtualization: if a virtual call has a single
possible callee, replace all calls with a direct call to that callee.
- Virtual constant propagation: if the virtual function's return type is an
integer <=64 bits and all possible callees are readnone, for each class and
each list of constant arguments: evaluate the function, store the return
value alongside the virtual table, and rewrite each virtual call as a load
from the virtual table.
- Uniform return value optimization: if the conditions for virtual constant
propagation hold and each function returns the same constant value, replace
each virtual call with that constant.
- Unique return value optimization for i1 return values: if the conditions
for virtual constant propagation hold and a single vtable's function
returns 0, or a single vtable's function returns 1, replace each virtual
call with a comparison of the vptr against that vtable's address.
Differential Revision: http://reviews.llvm.org/D16795
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260312 91177308-0d34-0410-b5e6-96231b3b80d8
We introduced gc.relocates of vector-of-pointer types a couple of weeks back. Somehow, I missed updating the InstCombine rule to account for this. If we hit this code path with a vector-of-pointers gc.relocate, we'd crash on a cast<PointerType>.
I also took the chance to do a bit of code style cleanup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260279 91177308-0d34-0410-b5e6-96231b3b80d8
FunctionAttrs does an "optimistic" analysis of SCCs as a unit, which
means normally it is able to disregard calls from an SCC into itself.
However, calls and invokes with operand bundles are allowed to have
memory effects not fully described by the memory effects on the call
target, so we can't be optimistic around operand-bundled calls from an
SCC into itself.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260244 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Passes that call `getAnalysisIfAvailable<T>` also need to call
`addUsedIfAvailable<T>` in `getAnalysisUsage` to indicate to the
legacy pass manager that it uses `T`. This contract was being
violated by passes that used `createLegacyPMAAResults`. This change
fixes this by exposing a helper in AliasAnalysis.h,
`addUsedAAAnalyses`, that is complementary to createLegacyPMAAResults
and does the right thing when called from `getAnalysisUsage`.
Reviewers: chandlerc
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17010
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260183 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Unrolling Analyzer is already pretty complicated, and it becomes harder and harder to exercise it with usual IR tests, as with them we can only check the final decision: whether the loop is unrolled or not. This change factors this framework out from LoopUnrollPass to analyses, which allows to use unit tests.
The change itself is supposed to be NFC, except adding a couple of tests.
I plan to add more tests as I add new functionality and find/fix bugs.
Reviewers: chandlerc, hfinkel, sanjoy
Subscribers: zzheng, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D16623
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260169 91177308-0d34-0410-b5e6-96231b3b80d8
This patch uses one bit in profile version to differentiate Clang
instrumentation and IR level instrumentation profiles.
PGOInstrumenation generates a COMDAT variable __llvm_profile_raw_version so
that the compiler runtime can set the right profile kind.
PGOInstrumenation now checks this bit to make sure it's an IR level
instrumentation profile.
Differential Revision: http://reviews.llvm.org/D15540
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260146 91177308-0d34-0410-b5e6-96231b3b80d8
sanitizer issue. The PredicatedScalarEvolution's copy constructor
wasn't copying the Generation value, and was leaving it un-initialized.
Original commit message:
[SCEV][LAA] Add no wrap SCEV predicates and use use them to improve strided pointer detection
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260112 91177308-0d34-0410-b5e6-96231b3b80d8
Change a return statement of ComputeValueKnownInPredecessors() to be the same as
the rest return statements of the function. Otherwise, it might return true with
an empty Result when the current basic block has no predecessors and trigger the
first assert of JumpThreading::ProcessThreadableEdges().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260110 91177308-0d34-0410-b5e6-96231b3b80d8
We shouldn't assert when there are no memchecks, since we
can have SCEV checks. There is already an assert covering
the case where there are no SCEV checks or memchecks.
This also changes the LAA pointer wrapping versioning test
to use the loop versioning pass (this was how I managed to
trigger the assert in the loop versioning pass).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260086 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change adds no wrap SCEV predicates with:
- support for runtime checking
- support for expression rewriting:
(sext ({x,+,y}) -> {sext(x),+,sext(y)}
(zext ({x,+,y}) -> {zext(x),+,sext(y)}
Note that we are sign extending the increment of the SCEV, even for
the zext case. This is needed to cover the fairly common case where y would
be a (small) negative integer. In order to do this, this change adds two new
flags: nusw and nssw that are applicable to AddRecExprs and permit the
transformations above.
We also change isStridedPtr in LAA to be able to make use of
these predicates. With this feature we should now always be able to
work around overflow issues in the dependence analysis.
Reviewers: mzolotukhin, sanjoy, anemet
Subscribers: mzolotukhin, sanjoy, llvm-commits, rengolin, jmolloy, hfinkel
Differential Revision: http://reviews.llvm.org/D15412
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260085 91177308-0d34-0410-b5e6-96231b3b80d8
As discussed in https://github.com/google/sanitizers/issues/398, with current
implementation of poisoning globals we can have some CHECK failures or false
positives in case of mixing instrumented and non-instrumented code due to ASan
poisons innocent globals from non-sanitized binary/library. We can use private
aliases to avoid such errors. In addition, to preserve ODR violation detection,
we introduce new __odr_asan_gen_XXX symbol for each instrumented global that
indicates if this global was already registered. To detect ODR violation in
runtime, we should only check the value of indicator and report an error if it
isn't equal to zero.
Differential Revision: http://reviews.llvm.org/D15642
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260075 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When alias analysis is uncertain about the aliasing between any two accesses,
it will return MayAlias. This uncertainty from alias analysis restricts LICM
from proceeding further. In cases where alias analysis is uncertain we might
use loop versioning as an alternative.
Loop Versioning will create a version of the loop with aggressive aliasing
assumptions in addition to the original with conservative (default) aliasing
assumptions. The version of the loop making aggressive aliasing assumptions
will have all the memory accesses marked as no-alias. These two versions of
loop will be preceded by a memory runtime check. This runtime check consists
of bound checks for all unique memory accessed in loop, and it ensures the
lack of memory aliasing. The result of the runtime check determines which of
the loop versions is executed: If the runtime check detects any memory
aliasing, then the original loop is executed. Otherwise, the version with
aggressive aliasing assumptions is used.
The pass is off by default and can be enabled with command line option
-enable-loop-versioning-licm.
Reviewers: hfinkel, anemet, chatur01, reames
Subscribers: MatzeB, grosser, joker.eph, sanjoy, javed.absar, sbaranga,
llvm-commits
Differential Revision: http://reviews.llvm.org/D9151
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259986 91177308-0d34-0410-b5e6-96231b3b80d8
In r255133 (reapplied r253126) we started to avoid redundant
recomputation of LCSSA after loop-unrolling. This patch moves one step
further in this direction - now we can avoid it for much wider range of
loops, as we start to look at IR and try to figure out if the
transformation actually breaks LCSSA phis or makes it necessary to
insert new ones.
Differential Revision: http://reviews.llvm.org/D16838
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259869 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Passing the rematerialized values map to insertRematerializationStores by
value looks to be a simple oversight; update it to pass by reference.
Reviewers: reames, sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16911
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259867 91177308-0d34-0410-b5e6-96231b3b80d8
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
The original commit triggered regressions in Polly tests. The regressions
exposed two problems which have been fixed in current version.
1. Polly will generate a new function based on the old one. To generate an
instruction for the new function, it builds SCEV for the old instruction,
applies some tranformation on the SCEV generated, then expands the transformed
SCEV and insert the expanded value into new function. Because SCEV expansion
may reuse value cached in ExprValueMap, the value in old function may be
inserted into new function, which is wrong.
In SCEVExpander::expand, there is a logic to check the cached value to
be used should dominate the insertion point. However, for the above
case, the check always passes. That is because the insertion point is
in a new function, which is unreachable from the old function. However
for unreachable node, DominatorTreeBase::dominates thinks it will be
dominated by any other node.
The fix is to simply add a check that the cached value to be used in
expansion should be in the same function as the insertion point instruction.
2. When the SCEV is of scConstant type, expanding it directly is cheaper than
reusing a normal value cached. Although in the cached value set in ExprValueMap,
there is a Constant type value, but it is not easy to find it out -- the cached
Value set is not sorted according to the potential cost. Existing reuse logic
in SCEVExpander::expand simply chooses the first legal element from the cached
value set.
The fix is that when the SCEV is of scConstant type, don't try the reuse
logic. simply expand it.
Differential Revision: http://reviews.llvm.org/D12090
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259736 91177308-0d34-0410-b5e6-96231b3b80d8
D16251)
Summary:
This is a simpler fix to the problem than the dominator approach in
http://reviews.llvm.org/D16251. It adds only values into the gather() while loop
that have been seen before.
The actual endless loop is in the constant compare gather() routine in
Utils/SimplifyCFG.cpp. The same value ret.0.off0.i is pushed back into the
queue:
%.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
Here is what happens at the IR level:
for.cond.i: ; preds = %if.end6.i,
%if.end.i54
%ix.0.i = phi i32 [ 0, %if.end.i54 ], [ %inc.i55, %if.end6.i ]
%ret.0.off0.i = phi i1 [false, %if.end.i54], [%.ret.0.off0.i, %if.end6.i] <<<
%cmp2.i = icmp ult i32 %ix.0.i, %11
br i1 %cmp2.i, label %for.body.i, label %LBJ_TmpSimpleNeedExt.exit
if.end6.i: ; preds = %for.body.i
%cmp10.i = icmp ugt i32 %conv.i, %add9.i
%.ret.0.off0.i = or i1 %ret.0.off0.i, %cmp10.i <<<
When if.end.i54 gets eliminated which removes the definition of ret.0.off0.i.
The result is the expression %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
(Note the first ‘or’ operand is now %.ret.0.off0.i, and *NOT* %ret.0.off0.i).
And
now there is use of .ret.0.off0.i before a definition which triggers the
“endless” loop in gather():
while(!DFT.empty()) {
V = DFT.pop_back_val(); // V is .ret.0.off0.i
if (Instruction *I = dyn_cast<Instruction>(V)) {
// If it is a || (or && depending on isEQ), process the operands.
if (I->getOpcode() == (isEQ ? Instruction::Or : Instruction::And)) {
DFT.push_back(I->getOperand(1)); // This is now .ret.0.off0.i also
DFT.push_back(I->getOperand(0));
continue; // “endless loop” for .ret.0.off0.i
}
Reviewers: reames, ahatanak
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16839
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259730 91177308-0d34-0410-b5e6-96231b3b80d8
Bail out if we have a PHI on an EHPad that gets a value from a
CatchSwitchInst. Because the CatchSwitchInst cannot be split, there is
no good place to stick any instructions.
This fixes PR26373.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259702 91177308-0d34-0410-b5e6-96231b3b80d8
According to git bisect, this is the root cause of a miscompile for Regex in
libLLVMSupport. I am still working on reducing a test case.
The actual bug may be elsewhere and this commit just exposed it.
Anyway, at the moment, to reproduce, follow these steps:
1. Build clang and libLTO in release mode.
2. Create a new build directory <stage2> and cd into it.
3. Use clang and libLTO from #1 to build llvm-extract in Release mode + asserts
using -O2 -flto
4. Run llvm-extract -ralias '.*bar' -S test/Other/extract-alias.ll
Result:
program doesn't contain global named '.*bar'!
Expected result:
@a0a0bar = alias void ()* @bar
@a0bar = alias void ()* @bar
declare void @bar()
Note: In step #3, if you don't use lto or asserts, the miscompile disappears.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259674 91177308-0d34-0410-b5e6-96231b3b80d8
Current SCEV expansion will expand SCEV as a sequence of operations
and doesn't utilize the value already existed. This will introduce
redundent computation which may not be cleaned up throughly by
following optimizations.
This patch introduces an ExprValueMap which is a map from SCEV to the
set of equal values with the same SCEV. When a SCEV is expanded, the
set of values is checked and reused whenever possible before generating
a sequence of operations.
Differential Revision: http://reviews.llvm.org/D12090
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259662 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
LoopVersioning is a transform utility that transform passes can use to
run-time disambiguate may-aliasing accesses. I'd like to also expose as
pass to allow it to be unit-tested.
I am planning to add support for non-aliasing annotation in
LoopVersioning and I'd like to be able to write tests directly using
this pass.
(After that feature is done, the pass could also be used to look for
optimization opportunities that are hidden behind incomplete alias
information at compile time.)
The pass drives LoopVersioning in its default way which is to fully
disambiguate may-aliasing accesses no matter how many checks are
required.
Reviewers: hfinkel, ashutosh.nema, sbaranga
Subscribers: zzheng, mssimpso, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D16612
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259610 91177308-0d34-0410-b5e6-96231b3b80d8
This miscompile came about because we tried to use a transform which was
only appropriate for xor operators when addition was present.
This fixes PR26407.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259375 91177308-0d34-0410-b5e6-96231b3b80d8
In the future, we will vectorize recurrences other than reductions. This patch
renames a few variables and updates their associated comments to enable them to
be reused for non-reduction PHI nodes.
This change was requested in the review for D16197.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259364 91177308-0d34-0410-b5e6-96231b3b80d8
The previous patch caused PR26364. The fix is to ensure that we don't enter a
cycle when iterating over use-def chains.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259357 91177308-0d34-0410-b5e6-96231b3b80d8
These sets perform linear searching in small mode so it is never a good
idea to use SmallSize/N bigger than 32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259283 91177308-0d34-0410-b5e6-96231b3b80d8
Loop transformations can sometimes fail because the loop, while in
valid rotated LCSSA form, is not in a canonical CFG form. This is
an extremely simple pass that just merges obviously redundant
blocks, which can be used to fix some known failure cases. In the
future, it may be enhanced with more cases (and have code shared with
SimplifyCFG).
This allows us to run LoopSimplifyCFG -> LoopRotate -> LoopUnroll,
so that SimplifyCFG cleans up the loop before Rotate tries to run.
Not currently used in the pass manager, since this pass doesn't do
anything unless you can hook it up in an LPM with other loop passes.
It'll be added once Chandler cleans up things to allow this.
Tested in a custom pipeline out of tree to confirm it works in
practice (in addition to the included trivial test).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259256 91177308-0d34-0410-b5e6-96231b3b80d8
We would infinite loop because we created a shufflevector that was wider than
needed and then failed to combine that with the insertelement. When subsequently
visiting the extractelement from that shuffle, we see that it's unnecessary,
delete it, and trigger another visit to the insertelement.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259236 91177308-0d34-0410-b5e6-96231b3b80d8
- Locally declare struct, and call it BaseDerivedPair
- Use a lambda to compare, instead of a singleton with uninitialized
fields
- Add a constructor to BaseDerivedPair and use SmallVector::emplace_back
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259208 91177308-0d34-0410-b5e6-96231b3b80d8
Just adding an assert which makes invariants between AnalyzeLoadsFromClobberingLoads and GetLoadValueForLoad slightly more clear.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259145 91177308-0d34-0410-b5e6-96231b3b80d8
The full diff for the test directory may be hard to read because of the
filename clash; so here's all that happened as far as the tests are
concerned:
```
cd test/Transforms/RewriteStatepointsForGC
git rm *ll
git mv deopt-bundles/* ./
rmdir deopt-bundles
find . -name '*.ll' | xargs gsed -i 's/-rs4gc-use-deopt-bundles //g'
```
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259129 91177308-0d34-0410-b5e6-96231b3b80d8
These changes are aimed at bringing PlaceSafepoints up to code with the
LLVM coding guidelines:
- Fix variable naming
- Use DenseSet instead of std::set
- Remove dead code
- Minor local code simplifications
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259112 91177308-0d34-0410-b5e6-96231b3b80d8
This change permanently clamps -spp-no-statepoints to true (the code
deletion will come later). Tests that specifically tested
PlaceSafepoint's ability to wrap calls in gc.statepoint have been moved
to RS4GC's test suite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259096 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the instruction we're hoisting out of a loop into its preheader is
guaranteed to have executed in the loop, then the metadata associated
with the instruction (e.g. !range or !dereferenceable) is valid in the
preheader. This is because once we're in the preheader, we know we're
eventually going to reach the location the metadata was valid at.
This change makes LICM smarter around this, and helps it recognize cases
like these:
```
do {
int a = *ptr; !range !0
...
} while (i++ < N);
```
to
```
int a = *ptr; !range !0
do {
...
} while (i++ < N);
```
Earlier we'd drop the `!range` metadata after hoisting the load from
`ptr`.
Reviewers: igor-laevsky
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16669
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@259053 91177308-0d34-0410-b5e6-96231b3b80d8
This is a fix for:
https://llvm.org/bugs/show_bug.cgi?id=26308
With the switch to using the TTI cost model in:
http://reviews.llvm.org/rL228826
...it became possible to hit a zero-cost cycle of instructions (gep -> phi -> gep...),
so we need a cap for the recursion in DominatesMergePoint().
A recursion depth parameter was already added for a different reason in:
http://reviews.llvm.org/rL255660
...so we can just set a limit for it.
I pulled "10" out of the air and made it an independent parameter that we can play with.
It might be higher than it needs to be given the currently low default value of
PHINodeFoldingThreshold (2). That's the starting cost value that we enter the recursion
with, and most instructions have cost set to TCC_Basic (1), so I don't think we're going
to speculate more than 2 instructions with the current parameters.
As noted in the review and the TODO comment, we can do better than just limiting recursion
depth.
Differential Revision: http://reviews.llvm.org/D16637
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258971 91177308-0d34-0410-b5e6-96231b3b80d8
ObjC ARC Optimizer.
The main implication of this is:
1. Ensuring that we treat it conservatively in terms of optimization.
2. We put the ASM marker on it so that the runtime can recognize
objc_unsafeClaimAutoreleasedReturnValue from releaseRV.
<rdar://problem/21567064>
Patch by Michael Gottesman!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258970 91177308-0d34-0410-b5e6-96231b3b80d8
This patch is the second attempt to reapply commit r258404. There was bug in
the initial patch and subsequent fix (mentioned below).
The initial patch caused an assertion because we were computing smaller type
sizes for instructions that cannot be demoted. The fix first determines the
instructions that will be demoted, and then applies the smaller type size to
only those instructions.
This should fix PR26239 and PR26307.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258929 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a revised version of D13974, and the following quoted summary are from D13974
"This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch."
D13974 was committed but failed one lnt test. The bug was that we only checked the condition from loop exit's incoming block was a loop invariant. But there could be another condition from loop header to that incoming block not being a loop invariant. This would produce miscompiled code.
This patch fixes the issue by checking if the incoming block is loop header, and if not, don't perform the rewrite. The could be further improved by recursively checking all conditions leading to loop exit block, but I'd like to check in this simple version first and improve it with future patches.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258912 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r258903 which reverted r255660. r258903 was an
accidental commit and should not have been committed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258905 91177308-0d34-0410-b5e6-96231b3b80d8
SimplifyCFG tries to turn complex branch conditions into a switch.
Some of it's logic attempts to reason about bitwise arithmetic produced
by InstCombine. InstCombine can turn things like (X == 2) || (X == 3)
into (X & 1) == 2 and so SimplifyCFG tries to detect when this occurs so
that it can produce a switch instruction.
However, the legality checking was not sufficient to determine whether
or not this had occured. Correctly check this case by requiring that
the right-hand side of the comparison be a power of two.
This fixes PR26323.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258904 91177308-0d34-0410-b5e6-96231b3b80d8
AvailableValue is the part that represents the potential rematerialization. AvailableValueInBlock is simply a pair of an AvailableValue and a BB which we might materialize it in.
This is motivated by http://reviews.llvm.org/D16608. The intent is that we'll have a single function which handles the local case which both local and non-local will use to identify available values. Once that's done, the local case can rematerialize at the use site and the non-local case can do the SSA construction as it does currently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258882 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This patch is provided in preparation for removing autoconf on 1/26. The proposal to remove autoconf on 1/26 was discussed on the llvm-dev thread here: http://lists.llvm.org/pipermail/llvm-dev/2016-January/093875.html
"I felt a great disturbance in the [build system], as if millions of [makefiles] suddenly cried out in terror and were suddenly silenced. I fear something [amazing] has happened."
- Obi Wan Kenobi
Reviewers: chandlerc, grosbach, bob.wilson, tstellarAMD, echristo, whitequark
Subscribers: chfast, simoncook, emaste, jholewinski, tberghammer, jfb, danalbert, srhines, arsenm, dschuff, jyknight, dsanders, joker.eph, llvm-commits
Differential Revision: http://reviews.llvm.org/D16471
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258861 91177308-0d34-0410-b5e6-96231b3b80d8
Previously the RedoInsts was processed at the end of the block.
However it was possible that it left behind some instructions that
were not canonicalized.
This should guarantee that any previous instruction in the basic
block is canonicalized before we process a new instruction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258830 91177308-0d34-0410-b5e6-96231b3b80d8
This is a recommit of r258620 which causes PR26293.
The original message:
Now LIR can turn following codes into memset:
typedef struct foo {
int a;
int b;
} foo_t;
void bar(foo_t *f, unsigned n) {
for (unsigned i = 0; i < n; ++i) {
f[i].a = 0;
f[i].b = 0;
}
}
void test(foo_t *f, unsigned n) {
for (unsigned i = 0; i < n; i += 2) {
f[i] = 0;
f[i+1] = 0;
}
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258777 91177308-0d34-0410-b5e6-96231b3b80d8
* __cfi_check gets a 3rd argument: ubsan handler data
* Instead of trapping on failure, call __cfi_check_fail which must be
present in the module (generated in the frontend).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258746 91177308-0d34-0410-b5e6-96231b3b80d8
We had the same code duplicated for each type of Def. We also have the entire block duplicated between the local and non-local case, but let's start with local cleanup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258740 91177308-0d34-0410-b5e6-96231b3b80d8
We were hitting an assertion because we were computing smaller type sizes for
instructions that cannot be demoted. The fix first determines the instructions
that will be demoted, and then applies the smaller type size to only those
instructions.
This should fix PR26239.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258705 91177308-0d34-0410-b5e6-96231b3b80d8
Use existing functionality provided in changeToUnreachable instead of
reinventing it in LoopSimplify.
No functionality change is intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258663 91177308-0d34-0410-b5e6-96231b3b80d8
SCCP has code identical to changeToUnreachable's behavior, switch it
over to just call changeToUnreachable.
No functionality change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258654 91177308-0d34-0410-b5e6-96231b3b80d8
InstCombine and SCCP both want to remove dead code in a very particular
way but using identical means to do so. Share the code between the two.
No functionality change is intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258653 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of RAUW with undef, replace the first non-token instruction with
unreachable.
This fixes PR26263.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258611 91177308-0d34-0410-b5e6-96231b3b80d8
PruneEH had functionality idential to removeUnwindEdge.
Consolidate around removeUnwindEdge.
No functionality change is intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258609 91177308-0d34-0410-b5e6-96231b3b80d8
The intrinsic target prefix should match the target name
as it appears in the triple.
This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatability for now.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258557 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Make sure that any new and optimized objects created during GlobalOPT copy all the attributes from the base object.
A good example of improper behavior in the current implementation is section information associated with the GlobalObject. If a section was set for it, and GlobalOpt is creating/modifying a new object based on this one (often copying the original name), without this change new object will be placed in a default section, resulting in inappropriate properties of the new variable.
The argument here is that if customer specified a section for a variable, any changes to it that compiler does should not cause it to change that section allocation.
Moreover, any other properties worth representation in copyAttributesFrom() should also be propagated.
Reviewers: jmolloy, joker-eph, joker.eph
Subscribers: slarin, joker.eph, rafael, tobiasvk, llvm-commits
Differential Revision: http://reviews.llvm.org/D16074
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258556 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This change adds a `-spp-no-statepoints` flag to PlaceSafepoints that
bypasses the code that wraps newly introduced polls and existing calls
in gc.statepoint. With `-spp-no-statepoints` enabled, PlaceSafepoints
effectively becomes a safpeoint **poll** insertion pass.
The eventual goal is to "constant fold" this option, along with
`-rs4gc-use-deopt-bundles` to `true`, once clients using gc.statepoint
are okay doing so.
Reviewers: pgavlin, reames, JosephTremoulet
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16439
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@258551 91177308-0d34-0410-b5e6-96231b3b80d8