If a loop is not rotated (for example when optimizing for size), the latch is not the backedge. If we promote an expression to post-inc form, we not only increase register pressure and add a COPY for that IV expression but for all IVs!
Motivating testcase:
void f(float *a, float *b, float *c, int n) {
while (n-- > 0)
*c++ = *a++ + *b++;
}
It's imperative that the pointer increments be located in the latch block and not the header block; if not, we cannot use post-increment loads and stores and we have to keep both the post-inc and pre-inc values around until the end of the latch which bloats register usage.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278658 91177308-0d34-0410-b5e6-96231b3b80d8
LowerTargetConstantPool is not properly setting the TargetFlag to indicate
desired relocation. Coding error, the offset parameter was omitted, so the
TargetFlag was used as the offset, and the TargetFlag defaulted to zero.
This only affects -fpic compilation, and only those items created in a
Constant Pool, for example a vector of constants. Halide ran into this issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278614 91177308-0d34-0410-b5e6-96231b3b80d8
From the point of view of register assignment, byval parameters are
ignored: a byval parameter is not going to be assigned to a register,
and it will not affect the assignments of subsequent parameters.
When matching registers with parameters in the bit tracker, make sure
to skip byval parameters before advancing the registers.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278375 91177308-0d34-0410-b5e6-96231b3b80d8
This change makes it possible for tail-duplication and tail-merging to
be disjoint. By being less aggressive when merging during layout, there are no
overlapping cases between tail-duplication and tail-merging, provided the
thresholds are disjoint.
There is a remaining TODO to benchmark the succ_size() test for non-layout tail
merging.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278265 91177308-0d34-0410-b5e6-96231b3b80d8
Floating point instructions use general purpose registers, so the few
instructions that can put floating point immediates into registers are,
in fact, integer instruction. Use them explicitly instead of having
pseudo-instructions specifically for dealing with floating point values.
Simplify the constant loading instructions (from sdata) to have only two:
one for 32-bit values and one for 64-bit values: CONST32 and CONST64.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278244 91177308-0d34-0410-b5e6-96231b3b80d8
When the same base address is used to load two different data types, LSR
would assume a memory type of "void". This type is not sized and has no
alignment information. Checking for it causes a crash.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277601 91177308-0d34-0410-b5e6-96231b3b80d8
Identify patterns where the address is aligned to an 8-byte boundary,
but both the base address and the constant offset are both proper
multiples of 4. In such cases, extract Base+4 into a separate instruc-
tion, and use S2_storerd_io, instead of using S4_storerd_rr.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277497 91177308-0d34-0410-b5e6-96231b3b80d8
Scavenging slots were only reserved when pseudo-instruction expansion in
frame lowering created new virtual registers. It is possible to still
need a scavenging slot even if no virtual registers were created, in cases
where the stack is large enough to overflow instruction offsets.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277355 91177308-0d34-0410-b5e6-96231b3b80d8
The DAG combiner will try to merge consecutive stores into a bigger
store, unless the resulting store is not fast. Misaligned vector stores
are allowed on Hexagon, but are not fast. Add a testcase to make sure
this type of merging does not occur.
Patch by Pranav Bhandarkar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277182 91177308-0d34-0410-b5e6-96231b3b80d8
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.
Patch by Pranav Bhandarkar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277178 91177308-0d34-0410-b5e6-96231b3b80d8
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementaion of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277169 91177308-0d34-0410-b5e6-96231b3b80d8
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single of double) we can
generate vpacko or vpacke instruction respectively.
E.g.
%42 = shufflevector <32 x i16> %37, <32 x i16> %41,
<32 x i32> <i32 1, i32 3, ..., i32 63>
is %42.h = vpacko(%41.w, %37.w)
Patch by Pranav Bhandarkar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277168 91177308-0d34-0410-b5e6-96231b3b80d8
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.
Patch by Tobias Edler von Koch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277151 91177308-0d34-0410-b5e6-96231b3b80d8
Normally, CFI instructions should be inserted after allocframe, but
if allocframe is in the same packet with a call, the CFI instructions
should be inserted before that packet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277020 91177308-0d34-0410-b5e6-96231b3b80d8
Before adding a new preheader block, check if there is a candidate block
where the loop setup could be placed speculatively. This will be off by
default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276919 91177308-0d34-0410-b5e6-96231b3b80d8
- Generate vector post-increment stores more aggressively.
- Predicate post-increment and vector stores in early if-conversion.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276800 91177308-0d34-0410-b5e6-96231b3b80d8
Consider this case:
vreg1 = A2_zxth vreg0 (1)
...
vreg2 = A2_zxth vreg1 (2)
Redundant instruction elimination could delete the instruction (1)
because the user (2) only cares about the low 16 bits. Then it could
delete (2) because the input is already zero-extended. The problem
is that the properties allowing each individual instruction to be
deleted depend on the existence of the other instruction, so either
one can be deleted, but not both.
The existing check for this situation in RIE was insufficient. The
fix is to update all dependent cells when an instruction is removed
(replaced via COPY) in RIE.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276792 91177308-0d34-0410-b5e6-96231b3b80d8
Change the bit simplifier to generate REG_SEQUENCE instructions in
addition to COPY, which will handle cases of word insert/extract.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276787 91177308-0d34-0410-b5e6-96231b3b80d8
Schedule a load and its use in the same packet in MISched. Previously,
isResourceAvailable was returning false for dependences in the same
packet, which prevented MISched from packetizing a load and its use in
the same packet for v60.
Patch by Ikhlas Ajbar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275804 91177308-0d34-0410-b5e6-96231b3b80d8
The machine scheduler needs to account for available resources
more accurately in order to avoid scheduling an instruction that
forces a new packet to be created.
This occurs in two ways: First, an instruction without an available
resource may have a large priority due to other metrics and be
scheduled when there are other instructions with available resources.
Second, an instruction with a non-zero latency may become available
prematurely. In both these cases, we attempt change the priority
in order to allow a better instruction to be scheduled.
Patch by Brendon Cahoon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275793 91177308-0d34-0410-b5e6-96231b3b80d8
An instruction may have multiple predecessors that are candidates
for using .cur. However, only one of them can use .cur in the
packet. When this case occurs, we need to make sure that only
one of the dependences gets a 0 latency value.
Patch by Brendon Cahoon.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275790 91177308-0d34-0410-b5e6-96231b3b80d8
- Treat bitwise OR with a frame index as an ADD wherever possible, fold it
into addressing mode.
- Extend patterns for memops to allow memops with frame indexes as address
operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275569 91177308-0d34-0410-b5e6-96231b3b80d8
On Hexagon is it legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275458 91177308-0d34-0410-b5e6-96231b3b80d8
Transform: (store ch addr (add x (add (shl y c) e)))
to: (store ch addr (add x (shl (add y d) c))),
where e = (shl d c) for some integer d.
The purpose of this is to enable generation of loads/stores with
shifted addressing mode, i.e. mem(x+y<<#c). For that, the shift
value c must be 0, 1 or 2.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273466 91177308-0d34-0410-b5e6-96231b3b80d8
The aggressive anti-dependency breaker can rename the restored callee-
saved registers. To prevent this, mark these registers are live on all
paths to the return/tail-call instructions, and add implicit use operands
for them to these instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270898 91177308-0d34-0410-b5e6-96231b3b80d8
When looking for an available spill slot, the register scavenger would stop
after finding the first one with no register assigned to it. That slot may
have size and alignment that do not meet the requirements of the register
that is to be spilled. Instead, find an available slot that is the closest
in size and alignment to one that is needed to spill a register from RC.
Differential Revision: http://reviews.llvm.org/D20295
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269969 91177308-0d34-0410-b5e6-96231b3b80d8