"minimal" SSA form (in other words, it doesn't insert dead PHIs). This
speeds up the mem2reg pass very significantly because it doesn't have to
do a lot of frivolous work in many common cases.
In the 252.eon function I have been playing with, this no longer inserts
the 120 trivially dead PHI nodes that it used to (in the process of
promoting 356 alloca instructions overall). This speeds up the mem2reg
pass from 1.2459s to 0.1284s. More significantly, the DCE pass used to take
2.4138s to remove the 120 dead PHI nodes that mem2reg constructed; now it
takes 0.0134s (which is the time to scan the function and decide that there
is nothing dead). So overall, on this one function, we save a total of
3.5179s (3.6597s down to 0.1418s), which is a 25.8x speedup! :)
This change is tested by the Mem2Reg/2003-10-05-DeadPHIInsertion.ll test,
which now passes.
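
For illustration only, here is a minimal sketch of the idea in Python rather
than the actual pass code. It assumes the candidate PHI blocks have already
been computed elsewhere (for example from dominance frontiers) and keeps only
those where the promoted variable is live on entry, which is roughly what the
literature calls "pruned" SSA. The names live_in_blocks, prune_phi_blocks,
PREDS, DEFS and CANDIDATES are hypothetical and do not appear in the pass.

```python
def live_in_blocks(preds, use_blocks, def_blocks):
    """Return the blocks where the variable is live on entry.

    preds:      dict mapping a block to the list of its predecessors
    use_blocks: blocks that read the variable before any write in that block
    def_blocks: blocks that write the variable
    """
    live = set()
    worklist = list(use_blocks)
    while worklist:
        block = worklist.pop()
        if block in live:
            continue
        live.add(block)
        for pred in preds.get(block, ()):
            # The value only flows through a predecessor that does not
            # overwrite it; a defining predecessor stops the backward walk.
            if pred not in def_blocks:
                worklist.append(pred)
    return live


def prune_phi_blocks(candidate_blocks, preds, use_blocks, def_blocks):
    """Keep only the candidate PHI blocks where the variable is live-in."""
    live = live_in_blocks(preds, use_blocks, def_blocks)
    return {block for block in candidate_blocks if block in live}


# Diamond CFG: entry -> a, b -> merge -> exit. The variable is stored in
# both a and b, so merge is the only candidate block for a PHI node.
PREDS = {"a": ["entry"], "b": ["entry"], "merge": ["a", "b"], "exit": ["merge"]}
DEFS = {"a", "b"}
CANDIDATES = {"merge"}

# The variable is loaded in exit: the PHI in merge is needed.
needed = prune_phi_blocks(CANDIDATES, PREDS, {"exit"}, DEFS)
# The variable is never loaded after the stores: the PHI would be dead.
dead = prune_phi_blocks(CANDIDATES, PREDS, set(), DEFS)
```

In the second case no PHI is created at all, so there is nothing left for a
later DCE pass to remove.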
llvm-svn: 8884
basic block. This is amazingly common in code generated by the C/C++ front-ends.
This change makes it not have to insert ANY phi nodes, whereas before it would insert
a ton of dead ones which DCE would have to clean up.
Thus, this fix improves compile-time performance of these trivial allocas in two ways:
1. It doesn't have to do the walking and book-keeping for renaming.
2. It does not insert dead phi nodes for them, which would subsequently
have to be cleaned up.
On my favorite testcase from 252.eon, this special case handles 305 out of
356 promoted allocas in the function. It speeds up the mem2reg pass from 7.5256s
to 1.2505s. It inserts 677 fewer dead PHI nodes, which speeds up a subsequent
-dce pass from 18.7524s to 2.4806s.
There are still 120 trivially dead PHI nodes being inserted for variables used
in multiple basic blocks, but they are not handled by this patch.
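
As a rough sketch of the single-block special case described above (in
Python, not the actual pass code): when every load and store of an alloca
sits in one basic block, a single linear scan can forward the most recently
stored value to each load, with no PHI nodes and no renaming walk. The
tuple-based instruction encoding, the UNDEF placeholder for a load with no
preceding store, and the name promote_single_block are all assumptions made
for this example.

```python
UNDEF = "undef"  # assumed placeholder for a load with no earlier store


def promote_single_block(instructions, alloca):
    """Promote an alloca whose loads and stores all live in one block.

    Assumed encoding:
      ("store", alloca, value)      write value to the alloca
      ("load", dest, alloca)        read the alloca into dest
      (op, dest, operand, ...)      any other instruction
    Returns (replacements, remaining): a map from each load result to the
    value it reads, and the block with the loads and stores removed and
    operands rewritten accordingly.
    """
    current = UNDEF
    replacements = {}
    remaining = []
    for inst in instructions:
        op = inst[0]
        if op == "store" and inst[1] == alloca:
            # Remember the stored value, resolving it if it is itself the
            # result of a load that has already been replaced.
            current = replacements.get(inst[2], inst[2])
        elif op == "load" and inst[2] == alloca:
            replacements[inst[1]] = current
        else:
            operands = tuple(replacements.get(o, o) for o in inst[2:])
            remaining.append((op, inst[1]) + operands)
    return replacements, remaining


block = [
    ("store", "x", "%a"),
    ("load", "%t1", "x"),
    ("add", "%t2", "%t1", "%b"),
    ("store", "x", "%t2"),
    ("load", "%t3", "x"),
]
replacements, remaining = promote_single_block(block, "x")
```

Here replacements maps %t1 to %a and %t3 to %t2, and the only instruction
left in the block is the add, now reading %a directly.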
llvm-svn: 8881
*** Revamp the code which handled unreachable code in the function. Now the
code is much more efficient for high-degree basic blocks, such as those
that occur in the 252.eon SPEC benchmark.
For the interested, the time to promote a SINGLE alloca in the
_ZN7mrScene4ReadERSi function used to be > 3.5s. Now it is < .075s. The
function has a LOT of allocas in it, so it appeared to be infinite looping;
this should make it much nicer. :)
llvm-svn: 8863
work-list of value definitions. This allows elimination of the explicit
'iterative' step of the algorithm, and also reuses temporary memory better.
llvm-svn: 8861
* Do not insert a new entry into NewPhiNodes during the rename pass if there are no PHIs in a block.
* Do not compute WriteSets in parallel
llvm-svn: 8858
* Eliminate the KillList instance variable; instead, just delete loads and
stores as they are "renamed", and delete allocas when they are done
* Make the 'visited' set an instance variable to avoid passing it on the stack.
llvm-svn: 8857
constants as necessary due to type resolution. With this change, the
following SPEC benchmarks now link: 176.gcc, 177.mesa, 252.eon,
253.perlbmk, & 300.twolf. IOW, all SPEC INT and FP benchmarks now link.
llvm-svn: 8853
machinery. This dramatically simplifies how things work, removes irritating
little corner cases, and overall improves speed and reliability.
Highlights of this change are:
1. The exponential algorithm built into the code is now gone. For example,
the time to disassemble one bytecode file from the mesa benchmark went
from taking 12.5s to taking 0.16s.
2. The linker bugs should be dramatically reduced. The one remaining bug
has to do with constant handling, which I actually introduced in the
"union-find" checkins.
3. The code is much easier to follow, as a result of fewer special cases.
It's probably also smaller. yaay.
llvm-svn: 8842
This makes use of the new PATypeHolders to keep types from being deleted
prematurely, instead of the weird "self reference" garbage. This is easier
to understand and more efficient as well.
llvm-svn: 8834