that need them. This is very useful on CISCy targets like the X86 because it
reduces the total spill pressure, and makes better use of it's (large)
instruction set. Though the X86 backend doesn't know how to rewrite many
instructions yet, this already makes a substantial difference on 176.gcc for
example:
Before:
Time:
8.0099 ( 31.2%) 0.0100 ( 12.5%) 8.0199 ( 31.2%) 7.7186 ( 30.0%) Local Register Allocator
Code quality:
734559 asm-printer - Number of machine instrs printed
111395 ra-local - Number of registers reloaded
79902 ra-local - Number of registers spilled
231554 x86-peephole - Number of peephole optimization performed
After:
Time:
7.8700 ( 30.6%) 0.0099 ( 19.9%) 7.8800 ( 30.6%) 7.7892 ( 30.2%) Local Register Allocator
Code quality:
733083 asm-printer - Number of machine instrs printed
2379 ra-local - Number of reloads fused into instructions
109046 ra-local - Number of registers reloaded
79881 ra-local - Number of registers spilled
230658 x86-peephole - Number of peephole optimization performed
So by fusing 2300 instructions, we reduced the static number of instructions
by 1500, and reduces the number of peepholes (and thus the work) by about 900.
This also clearly reduces the number of reload/spill instructions that are
emitted.
llvm-svn: 11542
MRegisterInfo::getNumRegs() instead of
MRegisterInfo::FirstVirtualRegister.
Also use MRegisterInfo::is{Physical,Virtual}Register where
appropriate.
llvm-svn: 11477
Rename SetMachineOperandConst's formal parameters to match other methods here.
Mark some methods as being used only by the SPARC back-end.
Fix a missing-paren bug in OutputValue().
llvm-svn: 11363
MachineBasicBlock. Also change opcode to a short and numImplicitRefs
to an unsigned char so that overall MachineInstr's size stays the
same.
llvm-svn: 11357
ilist of MachineInstr objects. This allows constant time removal and
insertion of MachineInstr instances from anywhere in each
MachineBasicBlock. It also allows for constant time splicing of
MachineInstrs into or out of MachineBasicBlocks.
llvm-svn: 11340
instead of randomly groping about inside its outEdges array.
Make SchedGraph::addDummyEdges() use getNumOutEdges() instead of
outEdges.size().
Get rid of ifdefed-out code in SchedGraph::buildGraph().
llvm-svn: 11238
the Virt2PhysRegMap std::map with an std::vector. This speeds up the
register allocator another (almost) 40%, from .72->.45s in a release build
of LLC on 253.perlbmk.
llvm-svn: 11219
from physical registers, and they are always dense, it makes sense to not have
a ton of RBtree overhead. This change speeds up regalloclocal about ~30% on
253.perlbmk, from .35s -> .27s in the JIT (in LLC, it goes from .74 -> .55).
Now live variable analysis is the slowest codegen pass. Of course it doesn't
help that we have to run it twice, because regalloclocal doesn't update it,
but even if it did it would be the slowest pass (now it's just the 2x slowest
pass :(
llvm-svn: 11215
slots each. As a concequence they get numbered as 0, 2, 4 and so
on. The first slot is used for operand uses and the second for
defs. Here's an example:
0: A = ...
2: B = ...
4: C = A + B ;; last use of A
The live intervals should look like:
A = [1, 5)
B = [3, x)
C = [5, y)
llvm-svn: 11141