a special case hack for X86, make the hack more general: if an incoming argument
register is not used in any block other than the entry block, don't copy it to
a vreg. This helps us compile code like this:
%struct.foo = type { int, int, [0 x ubyte] }
int %test(%struct.foo* %X) {
%tmp1 = getelementptr %struct.foo* %X, int 0, uint 2, int 100
%tmp = load ubyte* %tmp1 ; <ubyte> [#uses=1]
%tmp2 = cast ubyte %tmp to int ; <int> [#uses=1]
ret int %tmp2
}
to:
_test:
lbz r3, 108(r3)
blr
instead of:
_test:
lbz r2, 108(r3)
or r3, r2, r2
blr
The (dead) copy emitted to copy r3 into a vreg for extra-block uses was
increasing the live range of r3 past the load, preventing the coallescing.
This implements CodeGen/PowerPC/reg-coallesce-simple.ll
llvm-svn: 24115
generating results in vregs that will need them. In the case of something
like this: CopyToReg((add X, Y), reg1024), we no longer emit code like
this:
reg1025 = add X, Y
reg1024 = reg 1025
Instead, we emit:
reg1024 = add X, Y
Whoa! :)
llvm-svn: 24111
When inserting code for an addrec expression with a non-unit stride, be
more careful where we insert the multiply. In particular, insert the multiply
in the outermost loop we can, instead of the requested insertion point.
This allows LSR to notice the mul in the right loop, reducing it when it gets
to it. This allows it to reduce the multiply, where before it missed it.
This happens quite a bit in the test suite, for example, eliminating 2
multiplies in art, 3 in ammp, 4 in apsi, reducing from 1050 multiplies to
910 muls in galgel (!), from 877 to 859 in applu, and 36 to 30 in bzip2.
This speeds up galgel from 16.45s to 16.01s, applu from 14.21 to 13.94s and
fourinarow from 66.67s to 63.48s.
This implements Transforms/LoopStrengthReduce/nested-reduce.ll
llvm-svn: 24102
DAG instruction selector, which should be destroyed one day (in the pattern
isel also) since ia64 can pack any constant in the instruction stream
llvm-svn: 24094
use -enable-ia64-dag-isel to turn this on
TODO: delete lowering stuff from the pattern isel
: get operations on predicate bits working
: get other bits of pseudocode going
: use sampo's mulh/mull-using divide-by-constant magic
: *so* many patterns ("extr", "tbit" and "dep" will be fun :)
: add FP
: add a JIT!
: get it working 100%
in short: this'll be happier in a couple of weeks, but it's here now so
the tester can make me feel guilty sooner.
OTHER: there are a couple of fixes to the pattern isel, in particular
making the linker happy with big blobs of fun like pypy.
llvm-svn: 24058