This pass precomputes CFG block frequency information that can be used by the
register allocator to find optimal spill code placement.
Given an interference pattern, placeSpills() will compute which basic blocks
should have the current variable enter or exit in a register, and which blocks
prefer the stack.
The algorithm is ready to consume block frequencies from profiling data, but for
now it gets by with the static estimates used for spill weights.
This is a work in progress and still not hooked up to RegAllocGreedy.
llvm-svn: 122938
up freebsd bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501
llvm-svn: 122936
The analysis will be needed by both the greedy register allocator and the
X86FloatingPoint pass. It only needs to be computed once when the CFG doesn't
change.
This pass is very fast, usually showing up as 0.0% wall time.
llvm-svn: 122832
This allows us to compile:
void test(char *s, int a) {
__builtin_memset(s, a, 15);
}
into 1 mul + 3 stores instead of 3 muls + 3 stores.
llvm-svn: 122710
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that doesn't support the multiply or it is slow (p4) if someone cares
enough.
Example code:
void test(char *s, int a) {
__builtin_memset(s, a, 4);
}
before:
_test: ## @test
movzbl 8(%esp), %eax
movl %eax, %ecx
shll $8, %ecx
orl %eax, %ecx
movl %ecx, %eax
shll $16, %eax
orl %ecx, %eax
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
after:
_test: ## @test
movzbl 8(%esp), %eax
imull $16843009, %eax, %eax ## imm = 0x1010101
movl 4(%esp), %ecx
movl %eax, 4(%ecx)
movl %eax, (%ecx)
ret
llvm-svn: 122707
when running without the verifier, and I have not yet checked them to see if
the new results are still correct. There are more verifier failures, but they
all seem to be additional occurrences of verifier failures that occur with the
existing PHIElimination pass. There are a few obvious issues with the code:
1) It doesn't properly update the register equivalence classes during copy
insertion, and instead recomputes them before merging live intervals and
renaming registers. I wanted to keep this first patch simple for debugging
purposes, but it shouldn't be very hard to do this.
2) It doesn't mix the renaming and live interval merging with the copy insertion
process, which leads to a lot of virtual register churn. Virtual registers and
live intervals are created, only to later be merged into others. The code should
be smarter and only create a new virtual register if there is no existing
register in the same congruence class.
3) In one place the code uses a DenseMap per basic block, which is unnecessary
heap allocation. There should be an inline storage version of DenseMap.
I did a quick compile-time test of running llc on 403.gcc with and without
StrongPHIElimination. It is slightly slower with StrongPHIElimination, because
the small decrease in the coalescer runtime can't beat the increase in phi
elimination runtime. Perhaps fixing the above performance issues will narrow
the gap.
I also haven't yet run any tests of the quality of the generated code.
llvm-svn: 122582
valno verification. The "Different value live out of predecessor" check is
incorrect in the case of phi-def valnos, so just skip that check for phi-def
valnos and instead check that all of the valnos for predecessors have phi-kill.
Fixes PR8863.
llvm-svn: 122581
DAG scheduling during isel. Most new functionality is currently
guarded by -enable-sched-cycles and -enable-sched-hazard.
Added InstrItineraryData::IssueWidth field, currently derived from
ARM itineraries, but could be initialized differently on other targets.
Added ScheduleHazardRecognizer::MaxLookAhead to indicate whether it is
active, and if so how many cycles of state it holds.
Added SchedulingPriorityQueue::HasReadyFilter to allowing gating entry
into the scheduler's available queue.
ScoreboardHazardRecognizer now accesses the ScheduleDAG in order to
get information about it's SUnits, provides RecedeCycle for bottom-up
scheduling, correctly computes scoreboard depth, tracks IssueCount, and
considers potential stall cycles when checking for hazards.
ScheduleDAGRRList now models machine cycles and hazards (under
flags). It tracks MinAvailableCycle, drives the hazard recognizer and
priority queue's ready filter, manages a new PendingQueue, properly
accounts for stall cycles, etc.
llvm-svn: 122541
In the bottom-up selection DAG scheduling, handle two-address
instructions that read/write unspillable registers. Treat
the entire chain of two-address nodes as a single live range.
llvm-svn: 122472
loads properly. We miscompiled the testcase into:
_test: ## @test
movl $128, (%rdi)
movzbl 1(%rdi), %eax
ret
Now we get a proper:
_test: ## @test
movl $128, (%rdi)
movsbl (%rdi), %eax
movzbl %ah, %eax
ret
This fixes PR8757.
llvm-svn: 122392