Fixes PR18753 and PR18782.
This is necessary for LICM to preserve LCSSA correctly and efficiently.
There is still some active discussion about whether we should be using
LCSSA, but we can't just immediately stop using it and we *need* LICM to
preserve it while we are using it. We can restore the old SSAUpdater
driven code if and when there is a serious effort to remove the reliance
on LCSSA from all of the loop passes.
However, this also serves as a great example of why LCSSA is very nice
to have. This change significantly simplifies the process of sinking
instructions for LICM, and makes it quite a bit less expensive.
It wouldn't even be as complex as it is except that I had to start the
process of removing the big recursive LCSSA formation hammer in order to
switch even this much of the re-forming code to asserting that LCSSA was
preserved. I'll fully remove that next just to tidy things up until the
LCSSA debate settles one way or the other.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201148 91177308-0d34-0410-b5e6-96231b3b80d8
The addressing mode matcher checks at some point the profitability of folding an
instruction into the addressing mode. When the instruction to be folded has
several uses, it checks that the instruction can be folded in each use.
To do so, it creates a new matcher for each use and check if the instruction is
in the list of the matched instructions of this new matcher.
The new matchers may promote some instructions and this has to be undone to keep
the state of the original matcher consistent.
A test case will follow.
<rdar://problem/16020230>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201121 91177308-0d34-0410-b5e6-96231b3b80d8
The crux of the issue is that LCSSA doesn't preserve stateful alias
analyses. Before r200067, LICM didn't cause LCSSA to run in the LTO pass
manager, where LICM runs essentially without any of the other loop
passes. As a consequence the globalmodref-aa pass run before that loop
pass manager was able to survive the loop pass manager and be used by
DSE to eliminate stores in the function called from the loop body in
Adobe-C++/loop_unroll (and similar patterns in other benchmarks).
When LICM was taught to preserve LCSSA it had to require it as well.
This caused it to be run in the loop pass manager and because it did not
preserve AA, the stateful AA was lost. Most of LLVM's AA isn't stateful
and so this didn't manifest in most cases. Also, in most cases LCSSA was
already running, and so there was no interesting change.
The real kicker is that LCSSA by its definition (injecting PHI nodes
only) trivially preserves AA! All we need to do is mark it, and then
everything goes back to working as intended. It probably was blocking
some other weird cases of stateful AA but the only one I have is
a 1000-line IR test case from loop_unroll, so I don't really have a good
test case here.
Hopefully this fixes the regressions on performance that have been seen
since that revision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201104 91177308-0d34-0410-b5e6-96231b3b80d8
Before conditional store vectorization/unrolling we had only one
vectorized/unrolled basic block. After adding support for conditional store
vectorization this will not only be one block but multiple basic blocks. The
last block would have the back-edge. I updated the code to use a vector of basic
blocks instead of a single basic block and fixed the users to use the last entry
in this vector. But, I forgot to add the basic blocks to this vector!
Fixes PR18724.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201028 91177308-0d34-0410-b5e6-96231b3b80d8
The bitcast instruction during constant materialization was not placed correcly
in the presence of phi nodes. This commit fixes the insertion point to be in the
idom instead.
This fixes PR18768
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201009 91177308-0d34-0410-b5e6-96231b3b80d8
This fix first traverses the whole use list of the constant expression and
keeps track of the instructions that need to be updated. Then perform the
fixup afterwards.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@201008 91177308-0d34-0410-b5e6-96231b3b80d8
mode.
Basically the idea is to transform code like this:
%idx = add nsw i32 %a, 1
%sextidx = sext i32 %idx to i64
%gep = gep i8* %myArray, i64 %sextidx
load i8* %gep
Into:
%sexta = sext i32 %a to i64
%idx = add nsw i64 %sexta, 1
%gep = gep i8* %myArray, i64 %idx
load i8* %gep
That way the computation can be folded into the addressing mode.
This transformation is done as part of the addressing mode matcher.
If the matching fails (not profitable, addressing mode not legal, etc.), the
matcher will revert the related promotions.
<rdar://problem/15519855>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200947 91177308-0d34-0410-b5e6-96231b3b80d8
225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.
As Chandler points out, even though we don't have code in our testing
suite that uses cold attribute, there are larger applications that do
use cold attribute.
r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.
The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.
Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200898 91177308-0d34-0410-b5e6-96231b3b80d8
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200892 91177308-0d34-0410-b5e6-96231b3b80d8
Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200886 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change. Updated loops from:
for (I = scc_begin(), E = scc_end(); I != E; ++I)
to:
for (I = scc_begin(); !I.isAtEnd(); ++I)
for teh win.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200789 91177308-0d34-0410-b5e6-96231b3b80d8
Add the missing transformation strchr(p, 0) -> p + strlen(p) to SimplifyLibCalls
and remove the ToDo comment.
Reviewer: Duncan P.N. Exan Smith
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200736 91177308-0d34-0410-b5e6-96231b3b80d8
It disturbs the layout of the parameters in memory and registers,
leading to problems in the backend.
The plan for optimizing internal inalloca functions going forward is to
essentially SROA the argument memory and demote any captured arguments
(things that aren't trivially written by a load or store) to an indirect
pointer to a static alloca.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200717 91177308-0d34-0410-b5e6-96231b3b80d8
LowerExpectIntrinsic previously only understood the idiom of an expect
intrinsic followed by a comparison with zero. For llvm.expect.i1, the
comparison would be stripped by the early-cse pass.
Patch by Daniel Micay.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200664 91177308-0d34-0410-b5e6-96231b3b80d8
unrolling heuristic per default
Benchmarking on x86_64 (thanks Chandler!) and ARM has shown those options speed
up some benchmarks while not causing any interesting regressions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200621 91177308-0d34-0410-b5e6-96231b3b80d8
LCSSA when we promote to SSA registers inside of LICM.
Currently, this is actually necessary. The promotion logic in LICM uses
SSAUpdater which doesn't understand how to place LCSSA PHI nodes.
Teaching it to do so would be a very significant undertaking. It may be
worthwhile and I've left a FIXME about this in the code as well as
starting a thread on llvmdev to try to figure out the right long-term
solution.
For now, the PR needs to be fixed. Short of using the promition
SSAUpdater to place both the LCSSA PHI nodes and the promoted PHI nodes,
I don't see a cleaner or cheaper way of achieving this. Fortunately,
LCSSA is relatively lazy and sparse -- it should only update
instructions which need it. We can also skip the recursive variant when
we don't promote to SSA values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200612 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r200576. It broke 32-bit self-host builds by
vectorizing two calls to @llvm.bswap.i64, which we then fail to expand.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200602 91177308-0d34-0410-b5e6-96231b3b80d8
transform accordingly. Based on similar code from Loop vectorization.
Subsequent commits will include vectorization of function calls to
vector intrinsics and form function calls to vector library calls.
Patch by Raul Silvera! (Much delayed due to my not running dcommit)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200576 91177308-0d34-0410-b5e6-96231b3b80d8
loop vectorizer to not do so when runtime pointer checks are needed and
share code with the new (not yet enabled) load/store saturation runtime
unrolling. Also ensure that we only consider the runtime checks when the
loop hasn't already been vectorized. If it has, the runtime check cost
has already been paid.
I've fleshed out a test case to cover the scalar unrolling as well as
the vector unrolling and comment clearly why we are or aren't following
the pattern.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200530 91177308-0d34-0410-b5e6-96231b3b80d8
The entry block of a function starts with all the static allocas. The change
in r195513 splits the block before those allocas, which has the effect of
turning them into dynamic allocas. That breaks all sorts of things. Change to
split after the initial allocas, and also add a comment explaining why the
block is split.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200515 91177308-0d34-0410-b5e6-96231b3b80d8
preserve loop simplify of enclosing loops.
The problem here starts with LoopRotation which ends up cloning code out
of the latch into the new preheader it is buidling. This can create
a new edge from the preheader into the exit block of the loop which
breaks LoopSimplify form. The code tries to fix this by splitting the
critical edge between the latch and the exit block to get a new exit
block that only the latch dominates. This sadly isn't sufficient.
The exit block may be an exit block for multiple nested loops. When we
clone an edge from the latch of the inner loop to the new preheader
being built in the outer loop, we create an exiting edge from the outer
loop to this exit block. Despite breaking the LoopSimplify form for the
inner loop, this is fine for the outer loop. However, when we split the
edge from the inner loop to the exit block, we create a new block which
is in neither the inner nor outer loop as the new exit block. This is
a predecessor to the old exit block, and so the split itself takes the
outer loop out of LoopSimplify form. We need to split every edge
entering the exit block from inside a loop nested more deeply than the
exit block in order to preserve all of the loop simplify constraints.
Once we try to do that, a problem with splitting critical edges
surfaces. Previously, we tried a very brute force to update LoopSimplify
form by re-computing it for all exit blocks. We don't need to do this,
and doing this much will sometimes but not always overlap with the
LoopRotate bug fix. Instead, the code needs to specifically handle the
cases which can start to violate LoopSimplify -- they aren't that
common. We need to see if the destination of the split edge was a loop
exit block in simplified form for the loop of the source of the edge.
For this to be true, all the predecessors need to be in the exact same
loop as the source of the edge being split. If the dest block was
originally in this form, we have to split all of the deges back into
this loop to recover it. The old mechanism of doing this was
conservatively correct because at least *one* of the exiting blocks it
rewrote was the DestBB and so the DestBB's predecessors were fixed. But
this is a much more targeted way of doing it. Making it targeted is
important, because ballooning the set of edges touched prevents
LoopRotate from being able to split edges *it* needs to split to
preserve loop simplify in a coherent way -- the critical edge splitting
would sometimes find the other edges in need of splitting but not
others.
Many, *many* thanks for help from Nick reducing these test cases
mightily. And helping lots with the analysis here as this one was quite
tricky to track down.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200393 91177308-0d34-0410-b5e6-96231b3b80d8
because of the inside-out run of LoopSimplify in the LoopPassManager and
the fact that LoopSimplify couldn't be "preserved" across two
independent LoopPassManagers.
Anyways, in that case, IndVars wasn't correctly preserving an LCSSA PHI
node because it thought it was rewriting (via SCEV) the incoming value
to a loop invariant value. While it may well be invariant for the
current loop, it may be rewritten in terms of an enclosing loop's
values. This in and of itself is fine, as the LCSSA PHI node in the
enclosing loop for the inner loop value we're rewriting will have its
own LCSSA PHI node if used outside of the enclosing loop. With me so
far?
Well, the current loop and the enclosing loop may share an exiting
block and exit block, and when they do they also share LCSSA PHI nodes.
In this case, its not valid to RAUW through the LCSSA PHI node.
Expected crazy test included.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200372 91177308-0d34-0410-b5e6-96231b3b80d8
When estimating register pressure, don't count the induction variable mulitple
times. It is unlikely to be unrolled. This is currently disabled and hidden
behind a flag ("enable-ind-var-reg-heur").
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200371 91177308-0d34-0410-b5e6-96231b3b80d8
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid with the preconditions changes. In particular, it must drop
the range and tbaa metadata.
The patch implements this with an utility function to drop all metadata not
in a white list.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200322 91177308-0d34-0410-b5e6-96231b3b80d8
vectorizer, placing it behind an off-by-default flag.
It turns out that block frequency isn't what we want at all, here or
elsewhere. This has been I think a nagging feeling for several of us
working with it, but Arnold has given some really nice simple examples
where the results are so comprehensively wrong that they aren't useful.
I'm planning to email the dev list with a summary of why its not really
useful and a couple of ideas about how to better structure these types
of heuristics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200294 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.
I added tests for any change that would allow an optimization to fire on
inalloca.
Reviewers: nlewycky
Differential Revision: http://llvm-reviews.chandlerc.com/D2449
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200281 91177308-0d34-0410-b5e6-96231b3b80d8
LCSSA from it caused a crasher with the LoopUnroll pass.
This crasher is really nasty. We destroy LCSSA form in a suprising way.
When unrolling a loop into an outer loop, we not only need to restore
LCSSA form for the outer loop, but for all children of the outer loop.
This is somewhat obvious in retrospect, but hey!
While this seems pretty heavy-handed, it's not that bad. Fundamentally,
we only do this when we unroll a loop, which is already a heavyweight
operation. We're unrolling all of these hypothetical inner loops as
well, so their size and complexity is already on the critical path. This
is just adding another pass over them to re-canonicalize.
I have a test case from PR18616 that is great for reproducing this, but
pretty useless to check in as it relies on many 10s of nested empty
loops that get unrolled and deleted in just the right order. =/ What's
worse is that investigating this has exposed another source of failure
that is likely to be even harder to test. I'll try to come up with test
cases for these fixes, but I want to get the fixes into the tree first
as they're causing crashes in the wild.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200273 91177308-0d34-0410-b5e6-96231b3b80d8
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.
for (i = 0; i < 128; ++i) {
if (a[i] < 10)
a[i] += val;
}
for (i = 0; i < 128; i+=2) {
v = a[i:i+1];
v0 = (extract v, 0) + 10;
v1 = (extract v, 1) + 10;
if (v0 < 10)
a[i] = v0;
if (v1 < 10)
a[i] = v1;
}
The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.
The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled per default for
now.
This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.
I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet so this will not do good at the moment.
rdar://15892953
Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):
Performance Regressions:
Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
Applications/siod/siod 2.18%
Performance improvements:
mesa -4.42%
libquantum -4.15%
With a patch that slightly changes the register heuristics (by subtracting the
induction variable on both sides of the register pressure equation, as the
induction variable is probably not really unrolled):
Performance Regressions:
Benchmarks/Ptrdist/yacr2/yacr2 7.73%
Applications/siod/siod 1.97%
Performance Improvements:
libquantum -13.05% (we now also unroll quantum_toffoli)
mesa -4.27%
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200270 91177308-0d34-0410-b5e6-96231b3b80d8
uint32.
When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200262 91177308-0d34-0410-b5e6-96231b3b80d8
cold loops as-if they were being optimized for size.
Nothing fancy here. Simply test case included. The nice thing is that we
can now incrementally build on top of this to drive other heuristics.
All of the infrastructure work is done to get the profile information
into this layer.
The remaining work necessary to make this a fully general purpose loop
unroller for very hot loops is to make it a fully general purpose loop
unroller. Things I know of but am not going to have time to benchmark
and fix in the immediate future:
1) Don't disable the entire pass when the target is lacking vector
registers. This really doesn't make any sense any more.
2) Teach the unroller at least and the vectorizer potentially to handle
non-if-converted loops. This is trivial for the unroller but hard for
the vectorizer.
3) Compute the relative hotness of the loop and thread that down to the
various places that make cost tradeoffs (very likely only the
unroller makes sense here, and then only when dealing with loops that
are small enough for unrolling to not completely blow out the LSD).
I'm still dubious how useful hotness information will be. So far, my
experiments show that if we can get the correct logic for determining
when unrolling actually helps performance, the code size impact is
completely unimportant and we can unroll in all cases. But at least
we'll no longer burn code size on cold code.
One somewhat unrelated idea that I've had forever but not had time to
implement: mark all functions which are only reachable via the global
constructors rigging in the module as optsize. This would also decrease
the impact of any more aggressive heuristics here on code size.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200219 91177308-0d34-0410-b5e6-96231b3b80d8
to stabilize a test that really is trying to test generic behavior and
not a specific target's behavior.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200215 91177308-0d34-0410-b5e6-96231b3b80d8
object and fewer pointless variables.
Also, add a clarifying comment and a FIXME because the code which
disables *all* vectorization if we can't use implicit floating point
instructions just makes no sense at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200214 91177308-0d34-0410-b5e6-96231b3b80d8
powers of two. This is essentially always the correct thing given the
impact on alignment, scaling factors that can be used in addressing
modes, etc. Also, fix the management of the unroll vs. small loop cost
to more accurately model things with this world.
Enhance a test case to actually exercise more of the unroll machinery if
using synthetic constants rather than a specific target model. Before
this change, with the added flags this test will unroll 3 times instead
of either 2 or 4 (the two sensible answers).
While I don't expect this to make a huge difference, if there are lots
of loops sitting right on the edge of hitting the 'small unroll' factor,
they might change behavior. However, I've benchmarked moving the small
loop cost up and down in many various ways and by a huge factor (2x)
without seeing more than 0.2% code size growth. Small adjustments such
as the series that led up here have led to about 1% improvement on some
benchmarks, but it is very close to the noise floor so I mostly checked
that nothing regressed. Let me know if you see bad behavior on other
targets but I don't expect this to be a sufficiently dramatic change to
trigger anything.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200213 91177308-0d34-0410-b5e6-96231b3b80d8
with the unrolling behavior in the loop vectorizer. No functionality
changed at this point.
These are a bit hack-y, but talking with Hal, there doesn't seem to be
a cleaner way to easily experiment with different thresholds here and he
was also interested in them so I wanted to commit them. Suggestions for
improvement are very welcome here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200212 91177308-0d34-0410-b5e6-96231b3b80d8
number of vector registers rather than toggling between vector and
scalar register number based on VF. I don't have a test case as
I spotted this by inspection and on X86 it only makes a difference if
your target is lacking SSE and thus has *no* vector registers.
If someone wants to add a test case for this for ARM or somewhere else
where this is more significant, that would be awesome.
Also made the variable name a bit more sensible while I'm here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200211 91177308-0d34-0410-b5e6-96231b3b80d8
LoopVectorize pass.
The logic here doesn't make much sense. We *only* unrolled if the
unvectorized loop was a reduction loop with a single basic block *and*
small loop body. The reduction part in particular doesn't make much
sense. Instead, if we just fall through to the vectorized unroll logic
it makes more sense of unrolling if there is a vectorized reduction that
could be hacked on by the SLP vectorizer *or* if the loop is small.
This is mostly a cleanup and nothing in the test suite really exercises
this, but I did run benchmarks across this change and saw no really
significant changes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200198 91177308-0d34-0410-b5e6-96231b3b80d8
a FunctionPass. With this change the loop vectorizer no longer is a loop
pass and can readily depend on function analyses. In particular, with
this change we no longer have to form a loop pass manager to run the
loop vectorizer which simplifies the entire pass management of LLVM.
The next step here is to teach the loop vectorizer to leverage profile
information through the profile information providing analysis passes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200074 91177308-0d34-0410-b5e6-96231b3b80d8
the loops in a function, and teach LICM to work in the presance of
LCSSA.
Previously, LCSSA was a loop pass. That made passes requiring it also be
loop passes and unable to depend on function analysis passes easily. It
also caused outer loops to have a different "canonical" form from inner
loops during analysis. Instead, we go into LCSSA form and preserve it
through the loop pass manager run.
Note that this has the same problem as LoopSimplify that prevents
enabling its verification -- loop passes which run at the end of the loop
pass manager and don't preserve these are valid, but the subsequent loop
pass runs of outer loops that do preserve this pass trigger too much
verification and fail because the inner loop no longer verifies.
The other problem this exposed is that LICM was completely unable to
handle LCSSA form. It didn't preserve it and it actually would give up
on moving instructions in many cases when they were used by an LCSSA phi
node. I've taught LICM to support detecting LCSSA-form PHI nodes and to
hoist and sink around them. This may actually let LICM fire
significantly more because we put everything into LCSSA form to rotate
the loop before running LICM. =/ Now LICM should handle that fine and
preserve it correctly. The down side is that LICM has to require LCSSA
in order to preserve it. This is just a fact of life for LCSSA. It's
entirely possible we should completely remove LCSSA from the optimizer.
The test updates are essentially accomodating LCSSA phi nodes in the
output of LICM, and the fact that we now completely sink every
instruction in ashr-crash below the loop bodies prior to unrolling.
With this change, LCSSA is computed only three times in the pass
pipeline. One of them could be removed (and potentially a SCEV run and
a separate LoopPassManager entirely!) if we had a LoopPass variant of
InstCombine that ran InstCombine on the loop body but refused to combine
away LCSSA PHI nodes. Currently, this also prevents loop unrolling from
being in the same loop pass manager is rotate, LICM, and unswitch.
There is one thing that I *really* don't like -- preserving LCSSA in
LICM is quite expensive. We end up having to re-run LCSSA twice for some
loops after LICM runs because LICM can undo LCSSA both in the current
loop and the parent loop. I don't really see good solutions to this
other than to completely move away from LCSSA and using tools like
SSAUpdater instead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200067 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r200058 and adds the using directive for
ARMTargetTransformInfo to silence two g++ overload warnings.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200062 91177308-0d34-0410-b5e6-96231b3b80d8
This commit caused -Woverloaded-virtual warnings. The two new
TargetTransformInfo::getIntImmCost functions were only added to the superclass,
and to the X86 subclass. The other targets were not updated, and the
warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was
hiding the two new getIntImmCost variants.
We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost"
to the various subclasses, or turning it off, but I suspect that it's wrong to
leave the functions unimplemnted in those targets. The default implementations
return TCC_Free, which I don't think is right e.g. for ARM.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200058 91177308-0d34-0410-b5e6-96231b3b80d8
Retry commit r200022 with a fix for the build bot errors. Constant expressions
have (unlike instructions) module scope use lists and therefore may have users
in different functions. The fix is to simply ignore these out-of-function uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@200034 91177308-0d34-0410-b5e6-96231b3b80d8