store r12 -> [ss#2]
R3 = load [ss#1]
use R3
R3 = load [ss#2]
R4 = load [ss#1]
and turn it into this code:
store R12 -> [ss#2]
R3 = load [ss#1]
use R3
R3 = R12
R4 = R3 <- oops!
The problem was that promoting R3 = load[ss#2] to a copy missed the fact that
the instruction invalidated R3 at that point.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23638 91177308-0d34-0410-b5e6-96231b3b80d8
with the dag combiner. This speeds up espresso by 8%, reaching performance
parity with the dag-combiner-disabled llc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23636 91177308-0d34-0410-b5e6-96231b3b80d8
dead node elim and dag combiner passes where the root is potentially updated.
This fixes a fixme in the dag combiner.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23634 91177308-0d34-0410-b5e6-96231b3b80d8
that testcase still does not pass with the dag combiner. This is because
not all forms of br* are folded yet.
Also, when we combine a node into another one, delete the node immediately
instead of waiting for the node to potentially come up in the future.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23632 91177308-0d34-0410-b5e6-96231b3b80d8
Since calls return more than one value, don't bail if one of their uses
happens to be a node that's not an MVT::Other when following the chain
from CALLSEQ_START to CALLSEQ_END.
Once we've found a CALLSEQ_START, we can just return; there's no need to
tail-recurse further up the graph.
Most importantly, just because something only has one use doesn't mean we
should use it's one use to follow from start to end. This faulty logic
caused us to follow a chain of one-use FP operations back to a much earlier
call, putting a cycle in the graph from a later start to an earlier end.
This is a better fix that reverting to the workaround committed earlier
today.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23620 91177308-0d34-0410-b5e6-96231b3b80d8
Neither of us have yet figured out why this code is necessary, but stuff
breaks if its not there. Still tracking this down...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23617 91177308-0d34-0410-b5e6-96231b3b80d8
large basic blocks because it was purely recursive. This switches it to an
iterative/recursive hybrid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23596 91177308-0d34-0410-b5e6-96231b3b80d8
For instructions that define multiple results, use the right regclass
to define the result, not always the rc of result #0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23580 91177308-0d34-0410-b5e6-96231b3b80d8
2. Added node groups to handle flagged nodes.
3. Started weaning simple scheduling off existing emitter.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23566 91177308-0d34-0410-b5e6-96231b3b80d8
Though I have done extensive testing, it is possible that this will break
things in configs I can't test. Please let me know if this causes a problem
and I'll fix it ASAP.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23504 91177308-0d34-0410-b5e6-96231b3b80d8
for testing and will require target machine info to do a proper scheduling.
The simple scheduler can be turned on using -sched=simple (defaults
to -sched=none)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23455 91177308-0d34-0410-b5e6-96231b3b80d8
This happens all the time on PPC for bool values, e.g. eliminating a xori
in inverted-bool-compares.ll.
This should be added to the dag combiner as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23403 91177308-0d34-0410-b5e6-96231b3b80d8
when possible, avoiding the load (and avoiding the copy if the value is already
in the right register).
This patch came about when I noticed code like the following being generated:
store R17 -> [SS1]
...blah...
R4 = load [SS1]
This was causing an LSU reject on the G5. This problem was due to the register
allocator folding spill code into a reg-reg copy (producing the load), which
prevented the spiller from being able to rewrite the load into a copy, despite
the fact that the value was already available in a register. In the case
above, we now rip out the R4 load and replace it with a R4 = R17 copy.
This speeds up several programs on X86 (which spills a lot :) ), e.g.
smg2k from 22.39->20.60s, povray from 12.93->12.66s, 168.wupwise from
68.54->53.83s (!), 197.parser from 7.33->6.62s (!), etc. This may have a larger
impact in some cases on the G5 (by avoiding LSU rejects), though it probably
won't trigger as often (less spilling in general).
Targets that implement folding of loads/stores into copies should implement
the isLoadFromStackSlot hook to get this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23388 91177308-0d34-0410-b5e6-96231b3b80d8
select (x < y), 1, 0 -> (x < y) incorrectly: the setcc returns i1 but the
select returned i32. Add the zero extend as needed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23301 91177308-0d34-0410-b5e6-96231b3b80d8
are, simplify logic, and cause things to not be nested as deeply. This also
uses MRI->areAliases instead of an explicit loop.
No functionality change, just code cleanup.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23296 91177308-0d34-0410-b5e6-96231b3b80d8
only add a reload live range once for the instruction. This is one step
towards fixing a regalloc pessimization that Nate notice, but is later undone
by the spiller (so no code is changed).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23293 91177308-0d34-0410-b5e6-96231b3b80d8
This restores all of stanford to being identical with and without the dag
combiner with the add folding turned off in sd.cpp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23258 91177308-0d34-0410-b5e6-96231b3b80d8
we were losing a node, causing an assertion to fail. Now we eagerly delete
discovered CSE's, and provide an optional vector to keep track of these
discovered equivalences.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23255 91177308-0d34-0410-b5e6-96231b3b80d8
I have run so far when run before Legalize. It still needs to pick up the
SetCC folds, and nodes that use SetCC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23243 91177308-0d34-0410-b5e6-96231b3b80d8
i64 values on targets that need that expanded to 32-bit registers. This fixes
PowerPC/2005-09-02-LegalizeDuplicatesCalls.ll and speeds up 189.lucas from
taking 122.72s to 81.96s on my desktop.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23228 91177308-0d34-0410-b5e6-96231b3b80d8
from the binary ops map, even if they had multiple results. This latent bug
caused a few failures with the dag isel last night.
To prevent stuff like this from happening in the future, add some really
strict checking to make sure that the CSE maps always match up with reality!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23221 91177308-0d34-0410-b5e6-96231b3b80d8
instead of ZERO_EXTEND to eliminate extraneous extensions. This eliminates
dead zero extensions on formal arguments and other cases on PPC, implementing
the newly tightened up test/Regression/CodeGen/PowerPC/small-arguments.ll test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23205 91177308-0d34-0410-b5e6-96231b3b80d8
over to DAGCombiner.cpp
1. Don't assume that SetCC returns i1 when folding (xor (setcc) constant)
2. Don't duplicate code in folding AND with AssertZext that is handled by
MaskedValueIsZero
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23196 91177308-0d34-0410-b5e6-96231b3b80d8
be mostly functional. It currently has all folds from SelectionDAG.cpp
that do not involve a condition code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23184 91177308-0d34-0410-b5e6-96231b3b80d8
them. This allows for elminination of redundant extends in the entry
blocks of functions on PowerPC.
Add support for i32 x i32 -> i64 multiplies, by recognizing when the inputs
to ISD::MUL in ExpandOp are actually just extended i32 values and not real
i64 values. this allows us to codegen
int mulhs(int a, int b) { return ((long long)a * b) >> 32; }
as:
_mulhs:
mulhw r3, r4, r3
blr
instead of:
_mulhs:
mulhwu r2, r4, r3
srawi r5, r3, 31
mullw r5, r4, r5
add r2, r2, r5
srawi r4, r4, 31
mullw r3, r4, r3
add r3, r2, r3
blr
with a similar improvement on x86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23147 91177308-0d34-0410-b5e6-96231b3b80d8
token chains first. For this C function:
int test() {
int i;
for (i = 0; i < 100000; ++i)
foo();
}
Instead of emitting this (condition before call)
.LBB_test_1: ; no_exit
addi r30, r30, 1
lis r2, 1
ori r2, r2, 34464
cmpw cr2, r30, r2
bl L_foo$stub
bne cr2, .LBB_test_1 ; no_exit
Emit this:
.LBB_test_1: ; no_exit
bl L_foo$stub
addi r30, r30, 1
lis r2, 1
ori r2, r2, 34464
cmpw cr0, r30, r2
bne cr0, .LBB_test_1 ; no_exit
Which makes it so we don't have to save/restore cr2 in the prolog/epilog of
the function.
This also makes the code much more similar to what the pattern isel produces.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23135 91177308-0d34-0410-b5e6-96231b3b80d8
putting it into the constant pool. This allows the isel machinery to
create constants that it will end up deciding are not needed, without them
ending up in the resultant function constant pool.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23081 91177308-0d34-0410-b5e6-96231b3b80d8
select. Also teach it that the bit count instructions can only set the low bits
of the result, depending on the size of the input.
This allows us to compile this:
int %eq0(int %a) {
%tmp.1 = seteq int %a, 0 ; <bool> [#uses=1]
%tmp.2 = cast bool %tmp.1 to int ; <int> [#uses=1]
ret int %tmp.2
}
To this:
_eq0:
cntlzw r2, r3
srwi r3, r2, 5
blr
instead of this:
_eq0:
cntlzw r2, r3
rlwinm r3, r2, 27, 31, 31
blr
when setcc is marked illegal on ppc (which restores parity to non-illegal
setcc). Thanks to Nate for pointing this out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@23013 91177308-0d34-0410-b5e6-96231b3b80d8
Use this information to avoid doing expensive interval intersections for
registers that could not possible be interesting. This speeds up linscan
on ia64 compiling kc++ in release mode from taking 7.82s to 4.8s(!), total
itanium llc time on this program is 27.3s now. This marginally speeds up
PPC and X86, but they appear to be limited by other parts of linscan, not
this code.
On this program, on itanium, live intervals now takes 41% of llc time.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22986 91177308-0d34-0410-b5e6-96231b3b80d8
number of regs (e.g. most riscs), many functions won't need to use callee
clobbered registers. Do a speculative check to see if we can get a free
register without processing the fixed list (which has all of these). This
saves a lot of time on machines with lots of callee clobbered regs (e.g.
ppc and itanium, also x86).
This reduces ppc llc compile time from 184s -> 172s on kc++. This is probably
worth FAR FAR more on itanium though.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22972 91177308-0d34-0410-b5e6-96231b3b80d8
we spill out of the fast path. The scan of active_ and the calls to
updateSpillWeights don't need to happen unless a spill occurs. This reduces
debug llc time of kc++ with ppc from 187.3s to 183.2s.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22971 91177308-0d34-0410-b5e6-96231b3b80d8
the old condition to a one bit value. The incoming value must have been
promoted, and the top bits are undefined. This causes us to generate:
_test:
rlwinm r2, r3, 0, 31, 31
li r3, 17
cmpwi cr0, r2, 0
bne .LBB_test_2 ;
.LBB_test_1: ;
li r3, 1
.LBB_test_2: ;
blr
instead of:
_test:
rlwinm r2, r3, 0, 31, 31
li r2, 17
cmpwi cr0, r3, 0
bne .LBB_test_2 ;
.LBB_test_1: ;
li r2, 1
.LBB_test_2: ;
or r3, r2, r2
blr
for:
int %test(bool %c) {
%retval = select bool %c, int 17, int 1
ret int %retval
}
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22947 91177308-0d34-0410-b5e6-96231b3b80d8
This gets us this for the previous testcase:
_test:
lis r2, 0
ori r3, r2, 65535
blr
Note that we actually write to r3 (the return reg) correctly now :)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22933 91177308-0d34-0410-b5e6-96231b3b80d8
temporary registers for things that define a register. This allows dag->dag
isel to compile this:
int %test() { ret int 65535 }
into:
_test:
lis r2, 0
ori r2, r2, 65535
blr
Next up, getting CopyFromReg to work, allowing arguments and cross-bb values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@22932 91177308-0d34-0410-b5e6-96231b3b80d8