Summary:
When performing cmp for EQ/NE and the operand is sign extended, we can
avoid the truncation if the number of bits to be tested is no less than
the original bit width.
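The equivalence this relies on can be checked by brute force. The sketch below is not LLVM code: it assumes a comparison against zero and a 32-bit register, and the helper names are purely illustrative.

```python
def sign_extend(value, from_bits, to_bits=32):
    """Sign-extend the low `from_bits` of value into a `to_bits`-wide register image."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


def truncation_is_redundant(orig_bits, test_bits, to_bits=32):
    """Return True if comparing the whole register with zero always gives the
    same EQ/NE answer as comparing only its low `test_bits` bits with zero."""
    for raw in range(1 << orig_bits):
        reg = sign_extend(raw, orig_bits, to_bits)
        truncated = reg & ((1 << test_bits) - 1)
        if (truncated == 0) != (reg == 0):
            return False
    return True


# An i8 value sign-extended to 32 bits, tested on 16 bits: the mask is redundant.
wide_enough = truncation_is_redundant(orig_bits=8, test_bits=16)
# Testing fewer bits than the original width loses information.
too_narrow = truncation_is_redundant(orig_bits=8, test_bits=4)
```

With at least as many tested bits as original bits, every non-zero value keeps a non-zero bit in the tested range, so the truncation can be dropped.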
Reviewers: eli.friedman
Subscribers: eli.friedman, aemerson, nemanjai, t.p.northover, llvm-commits
Differential Revision: https://reviews.llvm.org/D22933
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277252 91177308-0d34-0410-b5e6-96231b3b80d8
These come in two variants for now: G_INTRINSIC and G_INTRINSIC_W_SIDE_EFFECTS.
We may decide to split the latter up with finer-grained restrictions later, if
necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277224 91177308-0d34-0410-b5e6-96231b3b80d8
Up until now, we only had code to match PSADBW patterns that look like what
comes out of the loop vectorizer - a partial reduction inside the loop body
that gets fed into a horizontal operation in a different basic block.
This adds support for straight-line patterns, like those generated by the
SLP vectorizer.
Differential Revision: https://reviews.llvm.org/D22889
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277219 91177308-0d34-0410-b5e6-96231b3b80d8
Support for lowering to VBROADCASTF128 etc. in D22460 was not correctly ensuring that the only users of the 128-bit vector load were the insertions of the vector into the lower/upper subvectors.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277214 91177308-0d34-0410-b5e6-96231b3b80d8
This will be used during GlobalISel, where we need a more robust and readable
way to write tests than a simple immediate ID.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277209 91177308-0d34-0410-b5e6-96231b3b80d8
Patch by Sunita Marathe
Third try, now following fixes to MSan to handle memcpy in such a way that this commit won't break the MSan buildbots. (Thanks, Evegenii!)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277189 91177308-0d34-0410-b5e6-96231b3b80d8
The following pattern was being laid out poorly:
      A
     / \
    B   C
   / \ / \
  D   E   ? (Doesn't matter)
Where A->B is far more likely than A->C, and prob(B->D) = prob(B->E)
The current algorithm gives:
A,B,C,E (D goes on worklist)
It does this even if C has a frequency count of 0. This patch
adjusts the layout calculation so that if freq(B->E) >> freq(C->E)
then we go ahead and lay out E rather than C. Fallthrough half the time
is better than fallthrough never, or fallthrough very rarely. The
resulting layout is:
A,B,E, (C and D are in a worklist)
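To see why the second layout is preferable, one can compare how much execution frequency falls through between adjacent blocks. This is only a toy model, not the placement algorithm itself: the edge frequencies are made up, and `fallthrough_frequency` is a hypothetical helper.

```python
def fallthrough_frequency(layout, edge_freq):
    """Sum the frequencies of CFG edges that connect consecutive blocks in the layout."""
    return sum(edge_freq.get((a, b), 0) for a, b in zip(layout, layout[1:]))


# Hypothetical frequencies: A->B is hot, A->C is never taken,
# and B splits evenly between D and E.
EDGE_FREQ = {
    ("A", "B"): 100,
    ("A", "C"): 0,
    ("B", "D"): 50,
    ("B", "E"): 50,
    ("C", "E"): 0,
}

old_layout = ["A", "B", "C", "E"]  # layout before the patch
new_layout = ["A", "B", "E"]       # layout after the patch (C and D deferred)

old_score = fallthrough_frequency(old_layout, EDGE_FREQ)
new_score = fallthrough_frequency(new_layout, EDGE_FREQ)
```

Under these assumed numbers, placing E after B gains the B->E fallthrough, while placing C after B gains nothing because there is no B->C edge and C->E is cold.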
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277187 91177308-0d34-0410-b5e6-96231b3b80d8
Add branch weights to a few tests that aren't testing layout to make them less
sensitive to changes in the layout algorithm.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277186 91177308-0d34-0410-b5e6-96231b3b80d8
Just the basic equivalent to DAG's condbr for now, we'll get to things like
br_cc when we start doing more legalization.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277184 91177308-0d34-0410-b5e6-96231b3b80d8
The DAG combiner will try to merge consecutive stores into a bigger
store, unless the resulting store is not fast. Misaligned vector stores
are allowed on Hexagon, but are not fast. Add a testcase to make sure
this type of merging does not occur.
Patch by Pranav Bhandarkar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277182 91177308-0d34-0410-b5e6-96231b3b80d8
The DAG combiner tries to merge stores to adjacent vector wide memory
locations by creating stores which are integral multiples of the vector
width. Discourage this by informing it that this is slow. This should
not affect legalization passes, because all of them ignore the "Fast"
argument.
Patch by Pranav Bhandarkar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277178 91177308-0d34-0410-b5e6-96231b3b80d8
Mostly straightforward as we ignore addressing modes and just
use the base + unsigned immediate offset (always 0) variants.
This currently fails to select extloads because we have yet to
agree on a representation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277171 91177308-0d34-0410-b5e6-96231b3b80d8
Software pipelining is an optimization for improving ILP by
overlapping loop iterations. Swing Modulo Scheduling (SMS) is
an implementation of software pipelining that attempts to
reduce register pressure and generate efficient pipelines with
a low compile-time cost.
This implementation of SMS is a target-independent back-end pass.
When enabled, the pass should run just prior to the register
allocation pass, while the machine IR is in SSA form. If the pass
is successful, then the original loop is replaced by the optimized
loop. The optimized loop contains one or more prolog blocks, the
pipelined kernel, and one or more epilog blocks.
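The prolog/kernel/epilog shape can be illustrated with a hand-pipelined loop. This is a conceptual sketch only: it assumes a two-stage loop body (a "load" and a "compute"), and it does not model the SMS scheduling heuristics or machine IR.

```python
def reference_loop(values):
    """Original loop: each iteration runs stage 0 (load) then stage 1 (compute)."""
    out = []
    for v in values:
        loaded = v              # stage 0
        out.append(loaded * 2)  # stage 1
    return out


def pipelined_loop(values):
    """Same computation with stage 1 of iteration i-1 overlapped with stage 0 of iteration i."""
    out = []
    if not values:
        return out
    # Prolog: run stage 0 of the first iteration.
    loaded = values[0]
    # Kernel: steady state, one stage from each of two neighbouring iterations.
    for i in range(1, len(values)):
        next_loaded = values[i]   # stage 0 of iteration i
        out.append(loaded * 2)    # stage 1 of iteration i - 1
        loaded = next_loaded
    # Epilog: drain stage 1 of the last iteration.
    out.append(loaded * 2)
    return out
```

In the kernel, the two stages belong to different iterations and have no dependence on each other, which is what exposes the extra ILP.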
This pass is enabled for Hexagon only. To enable for other targets,
a couple of target specific hooks must be implemented, and the
pass needs to be called from the target's TargetMachine
implementation.
Differential Review: http://reviews.llvm.org/D16829
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277169 91177308-0d34-0410-b5e6-96231b3b80d8
If the mask of a vector shuffle has alternating odd or even numbers
starting with 1 or 0 respectively up to the largest possible index
for the given type in the given HVX mode (single or double) we can
generate a vpacko or vpacke instruction respectively.
E.g.
%42 = shufflevector <32 x i16> %37, <32 x i16> %41,
<32 x i32> <i32 1, i32 3, ..., i32 63>
is %42.h = vpacko(%41.w, %37.w)
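The mask condition described above can be written as a small predicate. This is a sketch of the check as stated in this message, not the actual Hexagon lowering code; `classify_pack_mask` is a hypothetical name, and it assumes the two shuffle inputs have as many elements as the mask.

```python
def classify_pack_mask(mask):
    """Return "vpacko" for the mask 1, 3, ..., 2n-1, "vpacke" for 0, 2, ..., 2n-2,
    and None otherwise, where n is the number of mask elements."""
    n = len(mask)
    if n == 0:
        return None
    if list(mask) == list(range(1, 2 * n, 2)):
        return "vpacko"
    if list(mask) == list(range(0, 2 * n, 2)):
        return "vpacke"
    return None


# The <32 x i32> mask from the example: i32 1, i32 3, ..., i32 63.
odd_mask = list(range(1, 64, 2))
even_mask = list(range(0, 64, 2))
```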
Patch by Pranav Bhandarkar.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277168 91177308-0d34-0410-b5e6-96231b3b80d8
Rebalances address calculation trees and applies Hexagon-specific
optimizations to the trees to improve instruction selection.
Patch by Tobias Edler von Koch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277151 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.
This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.
The previous commit had an uninitialized variable that caused the incoming
argument region to have undefined size. This has been fixed.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277136 91177308-0d34-0410-b5e6-96231b3b80d8
We currently default to using either generic shuffles or MASK+PACKUS/PACKSS to truncate all integer vectors. For vector comparisons, we know that the result will be either all-ones or all-zeros bits in every element, which can be efficiently truncated by directly using PACKSS to repeatedly halve the size of each element.
Due to the limited input values (-1 or 0) we don't need to account for vector element size, so for simplicity we just use the PACKSS(vXi16,vXi16) implementation in all cases. Additionally for AVX2 PACKSS of 256bit data we must perform a PERMQ shuffle to reorder the data into the correct order. I did investigate performing a single shuffle after all the PACKSS calls but the need to cross 128bit lanes makes this difficult to achieve efficiently.
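The reason signed saturation is safe here can be shown with a scalar model of one PACKSS step. The sketch assumes a single 128-bit lane of i16 elements packed to i8 and ignores the AVX2 lane reordering; the function names are illustrative.

```python
def saturate_to_i8(x):
    """Clamp a signed value to the signed 8-bit range."""
    return max(-128, min(127, x))


def packss_i16_to_i8(lo, hi):
    """Model of PACKSS: saturate the i16 elements of `lo`, then `hi`, to i8."""
    return [saturate_to_i8(x) for x in lo + hi]


def truncate_to_i8(x):
    """Plain truncation to 8 bits, reinterpreted as signed."""
    x &= 0xFF
    return x - 256 if x & 0x80 else x


# Comparison results are always -1 (all ones) or 0 (all zeros).
cmp_lo = [-1, 0, -1, 0, 0, 0, -1, -1]
cmp_hi = [0, 0, 0, 0, -1, -1, -1, -1]
packed = packss_i16_to_i8(cmp_lo, cmp_hi)
truncated = [truncate_to_i8(x) for x in cmp_lo + cmp_hi]
```

For -1 and 0 saturation never clamps, so `packed` matches `truncated`; for arbitrary values (for example 300) the two would differ, which is why the shortcut is limited to comparison results.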
We avoid performing this on AVX512 as it should have better alternative truncation instructions.
Differential Revision: https://reviews.llvm.org/D22814
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277132 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The choice of MOV/MOVT instructions for struct_byval predicates was
conditional only on Thumb2, resulting in an ARM MOV/MOVT instruction
being incorrectly emitted in Thumb1 mode. This is especially apparent
with v8-m.base targets. This patch ensures that Thumb instructions are
emitted in both Thumb modes.
Reviewers: rengolin, t.p.northover
Subscribers: llvm-commits, aemerson, rengolin
Differential Revision: https://reviews.llvm.org/D22865
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277128 91177308-0d34-0410-b5e6-96231b3b80d8
I'm not convinced the patterns for the rm_Int were correct anyway. It had a tied source that shouldn't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277098 91177308-0d34-0410-b5e6-96231b3b80d8
Normally, CFI instructions should be inserted after allocframe, but
if allocframe is in the same packet with a call, the CFI instructions
should be inserted before that packet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277020 91177308-0d34-0410-b5e6-96231b3b80d8
Since r276158, we require generic instructions to have a sized type.
G_BR doesn't; relax the restriction.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277006 91177308-0d34-0410-b5e6-96231b3b80d8
LLT() has a particular meaning: it's one invalid type. But we really
want selected instructions to have no type whatsoever.
Also verify that types don't linger after ISel, and enable the verifier
on the AArch64 select test.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277001 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Implements fastLowerArguments() to avoid the need to fall back on
SelectionDAG for 0-4 argument functions that don't do tricky things like
passing double in a pair of i32's.
This allows us to move all except one test to -fast-isel-abort=3. The
remaining one has function prototypes of the form 'i32 (i32, double, double)'
which requires floats to be passed in GPR's.
Reviewers: sdardis
Subscribers: dsanders, llvm-commits, sdardis
Differential Revision: https://reviews.llvm.org/D22680
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276982 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
SI_ELSE is lowered into two parts:
s_or_saveexec_b64 dst, src (at the start of the basic block)
s_xor_b64 exec, exec, dst (at the end of the basic block)
The idea is that dst contains the exec mask of the preceding IF block. It can
happen that SIWholeQuadMode decides to switch from WQM to Exact mode inside
the basic block that contains SI_ELSE, in which case it introduces an instruction
s_and_b64 exec, exec, s[...]
which masks out bits that can correspond to both the IF and the ELSE paths.
So the resulting sequence must be:
s_or_saveexec_b64 dst, src
s_and_b64 exec, exec, s[...] <-- added by SIWholeQuadMode
s_and_b64 dst, dst, exec <-- added by SILowerControlFlow
s_xor_b64 exec, exec, dst
Whether to add the additional s_and_b64 dst, dst, exec is currently determined
via the ExecModified tracking. With this change, it is instead determined by
an additional flag on SI_ELSE which is set by SIWholeQuadMode.
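The need for the extra AND can be demonstrated by simulating the exec mask with plain integers. This sketch assumes that s_or_saveexec_b64 copies the old exec into dst and then ORs src into exec; the 4-lane masks and the `lower_else` helper are made up for illustration.

```python
def lower_else(exec_mask, src, live_mask=None, fix_dst=False):
    """Simulate the SI_ELSE sequence and return the exec mask used for the ELSE path.

    exec_mask: lanes that ran the IF path; src: lanes that should run the ELSE path.
    live_mask: optional mask applied by the WQM -> Exact switch.
    fix_dst:   whether the extra "s_and_b64 dst, dst, exec" is emitted.
    """
    # s_or_saveexec_b64 dst, src (assumed: dst = exec; exec = exec | src)
    dst = exec_mask
    exec_mask = exec_mask | src
    if live_mask is not None:
        # s_and_b64 exec, exec, s[...]   (added by SIWholeQuadMode)
        exec_mask &= live_mask
        if fix_dst:
            # s_and_b64 dst, dst, exec   (added by SILowerControlFlow)
            dst &= exec_mask
    # s_xor_b64 exec, exec, dst
    return exec_mask ^ dst


IF_LANES = 0b0011    # lanes that took the IF path
ELSE_LANES = 0b1100  # lanes that must take the ELSE path
LIVE = 0b0110        # lanes kept after switching to Exact mode

plain = lower_else(IF_LANES, ELSE_LANES)
broken = lower_else(IF_LANES, ELSE_LANES, live_mask=LIVE, fix_dst=False)
fixed = lower_else(IF_LANES, ELSE_LANES, live_mask=LIVE, fix_dst=True)
```

Without the extra AND, the final XOR re-enables an IF lane that the Exact-mode mask had already switched off; with it, only the live ELSE lanes remain.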
Finally: It also occurred to me that an alternative approach for the long run
is for SILowerControlFlow to unconditionally emit
s_or_saveexec_b64 dst, src
...
s_and_b64 dst, dst, exec
s_xor_b64 exec, exec, dst
and have a pass that detects and cleans up the "redundant AND with exec"
pattern where possible. This could be useful anyway, because we also add
instructions
s_and_b64 vcc, exec, vcc
before s_cbranch_scc (in moveToALU), and those are often redundant. I have
some pending changes to how KILL is lowered that could also benefit from
such a cleanup pass.
In any case, this current patch could help in the short term with the whole
ExecModified business.
Reviewers: tstellarAMD, arsenm
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D22846
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276972 91177308-0d34-0410-b5e6-96231b3b80d8
Before adding a new preheader block, check if there is a candidate block
where the loop setup could be placed speculatively. This will be off by
default.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276919 91177308-0d34-0410-b5e6-96231b3b80d8