Not only will this take huge amounts of compile time, but the resultant loop
nests won't be useful for optimization. This reduces LoopSimplify time on
Transforms/LoopSimplify/2006-08-11-LoopSimplifyLongTime.ll from ~32s to ~0.4s
with a debug build of llvm on a 2.7GHz G5.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29647 91177308-0d34-0410-b5e6-96231b3b80d8
blocks that target loop blocks.
Before, the code was run once per loop, and depended on the number of
predecessors each block in the loop had. Unfortunately, scanning preds can
be really slow when huge numbers of phis exist or when phis with huge numbers
of inputs exist.
Now, the code is run once per function and scans successors instead of preds,
which is far faster. In addition, the new code is simpler and goto-free,
woo.
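Roughly, the new shape of the scan, as a toy sketch (Block and
blocksTargetingLoop are made-up names on a toy CFG, not the actual LLVM code):

  #include <set>
  #include <vector>

  // Toy CFG node; the real code works on llvm BasicBlocks.
  struct Block {
    std::vector<Block*> Succs;  // successor edges only; pred lists never touched
  };

  // One pass over the whole function: a block outside the loop "targets" the
  // loop if any of its successors is a loop block.  The cost is O(CFG edges),
  // independent of how many predecessors (or giant phi nodes) any block has.
  std::vector<Block*> blocksTargetingLoop(const std::vector<Block*> &Fn,
                                          const std::set<Block*> &LoopBlocks) {
    std::vector<Block*> Result;
    for (Block *BB : Fn) {
      if (LoopBlocks.count(BB)) continue;      // only consider outside blocks
      for (Block *Succ : BB->Succs)
        if (LoopBlocks.count(Succ)) {          // an edge entering the loop
          Result.push_back(BB);
          break;
        }
    }
    return Result;
  }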
This change speeds up a nasty testcase Duraid provided me from taking hours to
taking ~72s with a debug build. The functionality this implements is already
tested in the testsuite as Transforms/CodeExtractor/2004-03-13-LoopExtractorCrash.ll.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29644 91177308-0d34-0410-b5e6-96231b3b80d8
SlowOperatingInfo, Statistics). Besides providing an example of how to
use these facilities, it also serves to debug problems with runtime linking
when dlopening a loadable module. These three support facilities exercise
different combinations of Text/Weak, Weak/Text, and Text/Text linking
between the executable and the module.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29552 91177308-0d34-0410-b5e6-96231b3b80d8
1. Change the usage of LOADABLE_MODULE so that it implies all the things
necessary to make a loadable module. This reduces the user's burden to
get a loadable module built correctly.
2. Document the usage of LOADABLE_MODULE in the MakefileGuide.
3. Adjust the makefile for lib/Transforms/Hello to use the new specification
for building loadable modules.
4. Adjust the sample project to not attempt to build a shared library for
its little library. This was just wasteful and not instructive at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29551 91177308-0d34-0410-b5e6-96231b3b80d8
1. Update an obsolete comment.
2. Make the sorting by base an explicit (though still N^2) step, so
that the code is clearer about what it is doing.
3. Partition uses so that uses inside the loop are handled before uses
outside the loop (a rough sketch of this partitioning follows this list).
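A rough sketch of the partitioning (IVUse and partitionUses are hypothetical
names; the real pass works on its own internal use records):

  #include <algorithm>
  #include <vector>

  struct IVUse {
    bool InsideLoop;  // whether the using instruction sits inside the loop
    // ... base, stride, user, etc.
  };

  // Reorder so all in-loop uses come first, preserving relative order, so
  // they get processed before the uses outside the loop.
  void partitionUses(std::vector<IVUse> &Uses) {
    std::stable_partition(Uses.begin(), Uses.end(),
                          [](const IVUse &U) { return U.InsideLoop; });
  }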
Note that none of these changes currently changes the code inserted by LSR,
but they are a stepping stone to getting there.
This code is the result of some crazy pair programming with Nate. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29493 91177308-0d34-0410-b5e6-96231b3b80d8
down approach, inspired by discussions with Tanya.
This approach is significantly faster, because it does not need dominator
frontiers and it does not insert extraneous unused PHI nodes. For example, on
252.eon, in a release-asserts build, this speeds up LCSSA (which is the slowest
pass in gccas) from 9.14s to 0.74s on my G5. This code is also slightly smaller
and significantly simpler than the old code.
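The core of the top-down idea, as a minimal sketch (Value, Use, and both
helpers are made-up toy types, not the actual pass code): a phi is created
lazily, only once a use outside the loop is actually seen, so no unused phi
is ever inserted.

  #include <vector>

  struct Use;
  struct Value { std::vector<Use*> Uses; };
  struct Use   { Value *V; bool InsideLoop; };

  // Stand-in for placing a PHINode in the loop's exit block(s); incoming
  // values and ownership are elided in this sketch.
  Value *makeExitPhi(Value *LoopValue) {
    (void)LoopValue;
    return new Value;
  }

  // Walk the uses of a loop-defined value top-down: the first use found
  // outside the loop triggers phi creation, and every outside use is then
  // rewritten to read the phi.  No dominance frontiers are needed.
  void closeValue(Value *V) {
    Value *Phi = nullptr;
    for (Use *U : V->Uses)
      if (!U->InsideLoop) {
        if (!Phi) Phi = makeExitPhi(V);  // lazily, on first outside use
        U->V = Phi;                      // the outside use now reads the phi
      }
  }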
Amusingly, in a normal Release build (which includes the
"assert(L->isLCSSAForm());" assertion), asserting that the result of LCSSA
is in LCSSA form is actually slower than the LCSSA transformation pass
itself on 252.eon. I will see if Loop::isLCSSAForm can be sped up next.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29463 91177308-0d34-0410-b5e6-96231b3b80d8
Handle this case, which doesn't require a new callgraph edge. This fixes
a crash compiling MallocBench/gs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29121 91177308-0d34-0410-b5e6-96231b3b80d8
target CG node. This allows the inliner to properly update the callgraph
when using the pruning inliner. The pruning inliner may not copy over all
call sites from a callee to a caller, so the edges corresponding to those
call sites should not be copied over either.
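A minimal sketch of that rule (CGNode, CallSite, and CloneMap are toy
stand-ins for the real callgraph and clone-map types):

  #include <map>
  #include <vector>

  struct CallSite {};
  struct CGNode { std::vector<std::pair<CallSite*, CGNode*>> CalledFns; };

  // When inlining, some of the callee's call sites are never cloned (they
  // sat in provably dead code).  Only sites that actually appear in the
  // clone map get corresponding edges added to the caller's CG node.
  void updateCallGraph(CGNode &CallerNode, const CGNode &CalleeNode,
                       const std::map<CallSite*, CallSite*> &CloneMap) {
    for (const auto &Edge : CalleeNode.CalledFns) {
      auto It = CloneMap.find(Edge.first);
      if (It == CloneMap.end())
        continue;                    // call site was pruned away; no edge
      CallerNode.CalledFns.push_back({It->second, Edge.second});
    }
  }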
This fixes PR827 and Transforms/Inline/2006-07-12-InlinePruneCGUpdate.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29120 91177308-0d34-0410-b5e6-96231b3b80d8
cases. Ideally, this issue will go away in the future as LCSSA gets smarter
about which Phi nodes it inserts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@29076 91177308-0d34-0410-b5e6-96231b3b80d8
will be profitable. This is mainly to remove some cases where excessive
unswitching would result in long compile times and/or huge generated code.
Once someone comes up with a better heuristic that avoids these cases, this
should be switched out.
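One plausible shape for such a guard (illustrative only; the size metric and
both thresholds here are invented, not the ones the pass uses):

  // Each unswitch duplicates the whole loop body, so refuse once the loop
  // is large or has already been unswitched several times.
  bool unswitchLooksProfitable(unsigned LoopSizeInInsts,
                               unsigned TimesAlreadyUnswitched) {
    const unsigned SizeLimit  = 50;  // hypothetical cap on loop body size
    const unsigned DepthLimit = 3;   // hypothetical cap on repeat unswitching
    return LoopSizeInInsts <= SizeLimit &&
           TimesAlreadyUnswitched < DepthLimit;
  }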
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28962 91177308-0d34-0410-b5e6-96231b3b80d8
Remove the Function pointer cast in these calls, converting it to
a cast of the argument:
%tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( int 10 )
%tmp60 = tail call int cast (int (ulong)* %str to int (int)*)( uint %tmp51 )
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28953 91177308-0d34-0410-b5e6-96231b3b80d8
of LCSSA. This results in several times the number of unswitchings occurring
on tests such as timberwolfmc, unix-tbl, and ldecod.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28912 91177308-0d34-0410-b5e6-96231b3b80d8
"LCSSA" phi node causes indvars to break dominance properties. This fixes
causes indvars to avoid inserting aggressive code in this case, instead
indvars should be fixed to be more aggressive in the face of lcssa phi's.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28850 91177308-0d34-0410-b5e6-96231b3b80d8
LCSSA is still the slowest pass when gccas'ing 252.eon, but now it only takes
39s instead of 289s. :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28776 91177308-0d34-0410-b5e6-96231b3b80d8
not handling PHI nodes correctly when determining if a value was live-out.
This patch reduces the number of detected live-out variables in the testcase
from 6565 to 485.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28771 91177308-0d34-0410-b5e6-96231b3b80d8
If a single exit block has multiple predecessors within the loop, it will
appear in the exit blocks list more than once. LCSSA needs to take that into
account so that it doesn't double-process that exit block.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28750 91177308-0d34-0410-b5e6-96231b3b80d8
post-increment value, should first be cast to the appropriate type (to the
type of the common expr). Otherwise, the rewrite of a use based on (common +
iv) may end up with an incorrect type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28735 91177308-0d34-0410-b5e6-96231b3b80d8
to link in the implementation. Thanks to Anton Korobeynikov for figuring out
what was going on here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28660 91177308-0d34-0410-b5e6-96231b3b80d8
code (while cloning) it often gets the branch/switch instructions. Since it
knows which edges of the CFG are dead, it need not clone (or even look at)
the obviously dead blocks. This should speed up the inliner substantially on
code where there are lots of inlinable calls to functions with constant
arguments. On C++ code in particular, this kicks in.
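The reachability walk, sketched on a toy CFG (Block and liveBlocks are
made-up names; the real code clones llvm BasicBlocks as it goes):

  #include <set>
  #include <vector>

  struct Block {
    std::vector<Block*> Succs;  // all successor edges in the original CFG
    int TakenSucc = -1;         // >= 0 if the terminator folded to a constant
  };

  // Visit only blocks reachable through live edges.  When a cloned branch
  // or switch folded to a constant, exactly one successor is live, so the
  // walk never reaches (or clones) the provably dead blocks at all.
  std::set<Block*> liveBlocks(Block *Entry) {
    std::set<Block*> Live;
    std::vector<Block*> Worklist{Entry};
    while (!Worklist.empty()) {
      Block *BB = Worklist.back();
      Worklist.pop_back();
      if (!Live.insert(BB).second) continue;  // already visited
      if (BB->TakenSucc >= 0)
        Worklist.push_back(BB->Succs[BB->TakenSucc]);  // only the live edge
      else
        for (Block *S : BB->Succs) Worklist.push_back(S);
    }
    return Live;
  }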
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28641 91177308-0d34-0410-b5e6-96231b3b80d8
reimplement getValueDominatingFunction to walk the DominanceTree rather than
just searching blindly.
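The walk is just a climb up the immediate-dominator links, sketched here
with toy types (DomNode and findDominatingValue are hypothetical names):

  #include <map>

  struct DomNode { DomNode *IDom = nullptr; };  // immediate-dominator link

  // Starting at the query block's tree node, walk toward the root and
  // return the first block that carries a value.  This visits O(depth)
  // nodes instead of blindly scanning every block in the function.
  template <typename V>
  V *findDominatingValue(DomNode *N, const std::map<DomNode*, V*> &ValueAt) {
    for (; N; N = N->IDom) {
      auto It = ValueAt.find(N);
      if (It != ValueAt.end())
        return It->second;
    }
    return nullptr;  // nothing on the path to the root dominates the query
  }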
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28618 91177308-0d34-0410-b5e6-96231b3b80d8
is now theoretically feature-complete. It has not, however, been thoroughly
tested, and is still considered experimental.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28529 91177308-0d34-0410-b5e6-96231b3b80d8
the iterated Dominance Frontier of the loop-closure Phi's. This is the
second phase of the LCSSA pass. The third phase (coming soon) will be to
update all uses of loop variables to use the loop-closure Phi's instead.
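The iterated frontier is the standard worklist computation, sketched here
with toy types (Block and iteratedDF are made-up names; the real pass draws
on the dominance frontier analysis):

  #include <map>
  #include <set>
  #include <vector>

  struct Block {};
  using DomFrontier = std::map<Block*, std::set<Block*>>;

  // Seed with the blocks holding the loop-closure Phi's, then repeatedly
  // fold in each discovered block's dominance frontier until it closes.
  // The result is the set of blocks that need additional Phi's.
  std::set<Block*> iteratedDF(const DomFrontier &DF,
                              const std::vector<Block*> &Seeds) {
    std::set<Block*> IDF;
    std::vector<Block*> Worklist(Seeds.begin(), Seeds.end());
    while (!Worklist.empty()) {
      Block *B = Worklist.back();
      Worklist.pop_back();
      auto It = DF.find(B);
      if (It == DF.end()) continue;
      for (Block *F : It->second)
        if (IDF.insert(F).second)   // newly discovered frontier block
          Worklist.push_back(F);
    }
    return IDF;
  }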
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28524 91177308-0d34-0410-b5e6-96231b3b80d8
makes it so that it constant folds instructions on the fly. This is good
for several reasons:
0. Many instructions are constant foldable after inlining, particularly if
inlining a call with constant arguments.
1. Without this, the inliner has to allocate memory for all of the instructions
that can be constant folded, then a subsequent pass has to delete them. This
gets the job done without this extra work.
2. This makes the inliner *pass* a bit more aggressive: in particular, it
partially solves a phase order issue where the inliner would inline lots
of code that folds away to nothing, but think that the resultant function
is big because of this code that will be gone. Now the code never exists.
This is the first part of a 2-step process. The second part will be smart
enough to see when this implicit constant folding propagates a constant into
a branch or switch instruction, making CFG edges dead.
This implements Transforms/Inline/inline_constprop.ll
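In sketch form (toy types; tryConstantFold and cloneInstruction are
hypothetical names, not the actual cloning interface):

  #include <map>

  struct Value {};
  struct Constant    : Value {};
  struct Instruction : Value {};

  // Stand-in for the folder: returns a constant when every already-mapped
  // operand is a constant, null otherwise.
  Constant *tryConstantFold(Instruction *, std::map<Value*, Value*> &) {
    return nullptr;
  }

  // While cloning the callee, fold first: if the instruction folds, record
  // the constant in the operand map and never materialize a clone.  The
  // folded instruction costs no memory and needs no later cleanup pass.
  void cloneInstruction(Instruction *I, std::map<Value*, Value*> &Map) {
    if (Constant *C = tryConstantFold(I, Map)) {
      Map[I] = C;   // uses of I in the callee now see the constant
      return;       // nothing is inserted into the caller at all
    }
    // ... otherwise clone I, remap its operands, and insert the clone.
  }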
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28521 91177308-0d34-0410-b5e6-96231b3b80d8
the program. This exposes more opportunities for the instcombiner, and implements
vec_shuffle.ll:test6
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28487 91177308-0d34-0410-b5e6-96231b3b80d8
When doing the initial pass of constant folding, if we get a constantexpr,
simplify the constant expr just as we would if it were folded in the
normal loop.
This fixes the missed-optimization regression in
Transforms/InstCombine/getelementptr.ll last night.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28224 91177308-0d34-0410-b5e6-96231b3b80d8
1. Implement InstCombine/deadcode.ll by not adding instructions in unreachable
blocks (due to constants in conditional branches/switches) to the worklist.
This causes them to be deleted before instcombine starts up, leading to
better optimization.
2. In the prepass over instructions, do trivial constprop/dce as we go. This
has the effect of improving the effectiveness of #1. In addition, it
*significantly* speeds up instcombine on test cases with large amounts of
constant folding code (for example, that produced by code specialization
or partial evaluation). In one example, it speeds up instcombine from
0.0589s to 0.0224s with a release build (a 2.6x speedup).
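A sketch of the prepass in point 2 (toy types; the two flags stand in for
real constant-folding and dead-instruction checks):

  #include <vector>

  struct Inst  { bool FoldsToConstant, IsTriviallyDead; };
  struct Block {
    bool Reachable;             // false once a constant branch cuts it off
    std::vector<Inst*> Insts;
  };

  // Only instructions in reachable blocks are queued, and anything that
  // trivially folds or is already dead is dropped on the spot rather than
  // being handed to the (far more expensive) main combining loop.
  std::vector<Inst*> buildWorklist(const std::vector<Block*> &Fn) {
    std::vector<Inst*> Worklist;
    for (const Block *BB : Fn) {
      if (!BB->Reachable) continue;  // dead code never hits the worklist
      for (Inst *I : BB->Insts) {
        if (I->FoldsToConstant || I->IsTriviallyDead)
          continue;                  // trivial constprop/dce as we go
        Worklist.push_back(I);
      }
    }
    return Worklist;
  }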
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28215 91177308-0d34-0410-b5e6-96231b3b80d8
Make the "fold (and (cast A), (cast B)) -> (cast (and A, B))" transformation
only apply when both casts really will cause code to be generated. If one or
both doesn't, then this xform doesn't remove a cast.
This fixes Transforms/InstCombine/2006-05-06-Infloop.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@28141 91177308-0d34-0410-b5e6-96231b3b80d8
nondeterminism being bad) could cause some trivial missed optimizations (dead
phi nodes being left around for later passes to clean up).
With this, llvm-gcc4 now bootstraps and correctly compares. I don't know
why I never tried to do it before... :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27984 91177308-0d34-0410-b5e6-96231b3b80d8
Make the insert/extract elt -> shuffle code more aggressive.
This fixes CodeGen/PowerPC/vec_shuffle.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27728 91177308-0d34-0410-b5e6-96231b3b80d8
aggressive in some cases where LLVMGCC 4 is inserting casts for no reason.
This implements InstCombine/cast.ll:test27/28.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27620 91177308-0d34-0410-b5e6-96231b3b80d8
are visible to analysis as intrinsics. That is, make sure someone doesn't pass
free around by address in some struct (as happens in, say, 176.gcc).
This doesn't get rid of any indirect calls; it just ensures that calls to free
and malloc are always direct.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27560 91177308-0d34-0410-b5e6-96231b3b80d8
%tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
%tmp = cast <4 x int> %tmp to <4 x float> ; <<4 x float>> [#uses=1]
into:
%tmp = cast <4 x uint> %tmp to <4 x float> ; <<4 x float>> [#uses=1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27355 91177308-0d34-0410-b5e6-96231b3b80d8
%tmp = cast <4 x uint>* %testData to <4 x int>* ; <<4 x int>*> [#uses=1]
%tmp = load <4 x int>* %tmp ; <<4 x int>> [#uses=1]
to this:
%tmp = load <4 x uint>* %testData ; <<4 x uint>> [#uses=1]
%tmp = cast <4 x uint> %tmp to <4 x int> ; <<4 x int>> [#uses=1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@27353 91177308-0d34-0410-b5e6-96231b3b80d8
- Added more debugging info.
- Allow reuse of IV of negative stride. e.g. -4 stride == 2 * iv of -2 stride.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26841 91177308-0d34-0410-b5e6-96231b3b80d8
stride. For a set of uses of the IV of a stride which is a multiple
of another stride, do not insert a new IV expression. Rather, reuse the
previous IV and rewrite the uses as uses of the IV expression multiplied by
the factor.
e.g., given
x = 0; ...; x++
y = 0; ...; y += 4
a use of y can be rewritten as a use of 4*x on x86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26803 91177308-0d34-0410-b5e6-96231b3b80d8
the pointer is known to come from either a global variable, an alloca, or a
malloc. This allows us to compile this:
P = malloc(28);
memset(P, 0, 28);
into explicit stores on PPC instead of a memset call.
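The lowering itself is simple once the length is a known small constant (a
sketch under that assumption; Store and lowerMemset are made up, and the real
code emits target stores directly):

  #include <cstdint>
  #include <vector>

  struct Store { uint64_t Offset; unsigned Bytes; uint64_t Value; };

  // Splat the fill byte across a word and emit word-sized stores, plus
  // byte stores for any tail.  memset(P, 0, 28) with 4-byte words becomes
  // seven zero stores.
  std::vector<Store> lowerMemset(uint64_t Len, uint8_t Byte,
                                 unsigned WordSize = 4) {
    uint64_t Pattern = 0;
    for (unsigned i = 0; i != WordSize; ++i)
      Pattern = (Pattern << 8) | Byte;
    std::vector<Store> Stores;
    uint64_t Off = 0;
    for (; Off + WordSize <= Len; Off += WordSize)
      Stores.push_back({Off, WordSize, Pattern});
    for (; Off != Len; ++Off)          // leftover tail bytes
      Stores.push_back({Off, 1, Byte});
    return Stores;
  }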
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26577 91177308-0d34-0410-b5e6-96231b3b80d8
Transforms/InstCombine/vec_narrow.ll. This adds support for narrowing
extract_element(insertelement) as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26538 91177308-0d34-0410-b5e6-96231b3b80d8
Make this code more powerful by using ComputeMaskedBits instead of looking
for an AND operand. This lets us fold this:
int %test23(int %a) {
%tmp.1 = and int %a, 1
%tmp.2 = seteq int %tmp.1, 0
%tmp.3 = cast bool %tmp.2 to int ;; xor tmp1, 1
ret int %tmp.3
}
into: xor (and a, 1), 1
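For reference, the AND rule that drives this (a minimal sketch; KnownBits
here is a toy 32-bit version of what ComputeMaskedBits tracks):

  #include <cstdint>

  struct KnownBits { uint32_t Zero = 0, One = 0; };  // bits known 0 / known 1

  // A result bit of an AND is known zero if it is known zero in either
  // operand, and known one only if it is known one in both.  For
  // %tmp.1 = and int %a, 1, every bit but bit 0 is known zero, which is
  // what lets the compare+cast collapse into the single xor above.
  KnownBits knownAnd(KnownBits L, KnownBits R) {
    return {L.Zero | R.Zero, L.One & R.One};
  }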
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26396 91177308-0d34-0410-b5e6-96231b3b80d8
caused SPASS to fail building last night.
We can't trivially unswitch a loop if the exit block has phi nodes in it,
because we don't know which predecessor to use.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26320 91177308-0d34-0410-b5e6-96231b3b80d8
and ComputeMaskedBits to match the new improved versions in instcombine.
Tested against all of multisource/benchmarks on ppc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26238 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a testcase that Nate reduced from SPASS.
Also included are a couple of minor code changes that don't affect the
generated code at all.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@26235 91177308-0d34-0410-b5e6-96231b3b80d8