As with x86 and AArch64, certain situations can arise where we need to spill
CPSR in the middle of a calculation. These should be avoided where possible
(MRS/MSR is rather expensive), which ARM is actually better at than the other
two since it tries to Glue defs to uses, but as a last ditch effort, copying is
better than crashing.
rdar://problem/18011155
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218789 91177308-0d34-0410-b5e6-96231b3b80d8
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
Note: I accidentally committed a bogus older version of this patch previously.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218787 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel
Test Plan:
fptrunc.ll
checked also with 4 internal mips build bot flavors mip32r1/miprs32r2 and at -O0 and -O2
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: rfuhler
Differential Revision: http://reviews.llvm.org/D5553
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218785 91177308-0d34-0410-b5e6-96231b3b80d8
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.
Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.
By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.
The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)
This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.
What this patch doesn't do:
This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.
http://reviews.llvm.org/D4919
rdar://problem/17994491
Thanks to dblaikie and dexonsmith for reviewing this patch!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218778 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions
when targeting ARMv8, but they are actually present on any target with
FP-ARMv8. Note that FP-ARMv8 is called FPv5 when is is part of an
M-profile core, but they have the same instructions so we model them
both as FPARMv8 in the ARM backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218763 91177308-0d34-0410-b5e6-96231b3b80d8
that keep cropping up in the regression test suite.
This also addresses one of the issues raised on the mailing list with
failing to form 'movsd' in as many cases as we realistically should.
There will be corresponding patches forthcoming for v4f32 at least. This
was a lot of fuss for a relatively small gain, but all the fuss was on
my end trying different ways of holding the pieces of the x86 fragment
patterns *just right*. Now that it works, the code is reasonably simple.
In the new test cases I'm adding here, v2i64 sticks out as just plain
horrible. I've not come up with any great ideas here other than that it
would be nice to recognize when we're *going* to take a domain crossing
hit and cross earlier to get the decent instructions. At least with AVX
it is slightly less silly....
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218756 91177308-0d34-0410-b5e6-96231b3b80d8
Nothing was relying on this and there are potentially some edge cases
that it would not be correct under. Removing it seems better than trying
to "fix" it as nothing was relying on it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218755 91177308-0d34-0410-b5e6-96231b3b80d8
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
S<op0>_<op1>_<CRn>_<CRm>_<op2>
The encoding space permitted for implementation-defined system registers
is:
op0 op1 CRn CRm op2
11 xxx 1x11 xxxx xxx
The full encoding space can now be accessed:
op0 op1 CRn CRm op2
xx xxx xxxx xxxx xxx
This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218753 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218751 91177308-0d34-0410-b5e6-96231b3b80d8
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218747 91177308-0d34-0410-b5e6-96231b3b80d8
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.
This is somewhat magical I'm afraid but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.
Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
what-so-ever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218734 91177308-0d34-0410-b5e6-96231b3b80d8
the same speed as pshufd but we can fold loads into the pmovzx
instructions.
This fixes some regressions that came up in the regression test suite
for the new vector shuffle lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218733 91177308-0d34-0410-b5e6-96231b3b80d8
VPBROADCAST.
This has the somewhat expected pervasive impact. I don't know why
I forgot about this. Everything seems good with lots of significant
improvements in the tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218724 91177308-0d34-0410-b5e6-96231b3b80d8
It was hacky to use an opcode as a switch because it won't always match
(rsqrte != sqrte), and it looks like we'll need to add more special casing
per arch than I had hoped for. Eg, x86 will prefer a different NR estimate
implementation. ARM will want to use it's 'step' instructions. There also
don't appear to be any new estimate instructions in any arch in a long,
long time. Altivec vloge and vexpte may have been the first and last in
that field...
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218698 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This version fixed an issue with the TBZ/TBNZ instructions that were
generated in FastISel. The issue was that the 64bit version of TBZ (TBZX)
automagically sets the upper bit of the immediate field that is used to specify
the bit we want to test. To test for any of the lower 32bits we have to first
extract the subregister and use the 32bit version of the TBZ instruction (TBZW).
Original commit message:
Teach selectBranch to fold bit test and branch into a single instruction (TBZ or
TBNZ).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218693 91177308-0d34-0410-b5e6-96231b3b80d8
No tests for omod since nothing uses it yet, but
this should get rid of the remaining annoying trailing
zeros after some instructions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218692 91177308-0d34-0410-b5e6-96231b3b80d8
Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1.
Now cmp intrinsics lower in the following way:
(i8 (int_x86_avx512_mask_pcmpeq_q_128
(v2i64 %a), (v2i64 %b), (i8 %mask))) ->
(i8 (bitcast
(v8i1 (insert_subvector undef,
(v2i1 (and (PCMPEQM %a, %b),
(extract_subvector
(v8i1 (bitcast %mask)), 0))), 0))))
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218669 91177308-0d34-0410-b5e6-96231b3b80d8
a flawed direction and causing miscompiles. Read on for details.
Fundamentally, the premise of this patch series was to map
VECTOR_SHUFFLE DAG nodes into VSELECT DAG nodes for all blends because
we are going to *have* to lower to VSELECT nodes for some blends to
trigger the instruction selection patterns of variable blend
instructions. This doesn't actually work out so well.
In order to match performance with the existing VECTOR_SHUFFLE
lowering code, we would need to re-slice the blend in order to fit it
into either the integer or floating point blends available on the ISA.
When coming from VECTOR_SHUFFLE (or other vNi1 style VSELECT sources)
this works well because the X86 backend ensures that these types of
operands to VSELECT get sign extended into '-1' and '0' for true and
false, allowing us to re-slice the bits in whatever granularity without
changing semantics.
However, if the VSELECT condition comes from some other source, for
example code lowering vector comparisons, it will likely only have the
required bit set -- the high bit. We can't blindly slice up this style
of VSELECT. Reid found some code using Halide that triggers this and I'm
hopeful to eventually get a test case, but I don't need it to understand
why this is A Bad Idea.
There is another aspect that makes this approach flawed. When in
VECTOR_SHUFFLE form, we have very distilled information that represents
the *constant* blend mask. Converting back to a VSELECT form actually
can lose this information, and so I think now that it is better to treat
this as VECTOR_SHUFFLE until the very last moment and only use VSELECT
nodes for instruction selection purposes.
My plan is to:
1) Clean up and formalize the target pre-legalization DAG combine that
converts a VSELECT with a constant condition operand into
a VECTOR_SHUFFLE.
2) Remove any fancy lowering from VSELECT during *legalization* relying
entirely on the DAG combine to catch cases where we can match to an
immediate-controlled blend instruction.
One additional step that I'm not planning on but would be interested in
others' opinions on: we could add an X86ISD::VSELECT or X86ISD::BLENDV
which encodes a fully legalized VSELECT node. Then it would be easy to
write isel patterns only in terms of this to ensure VECTOR_SHUFFLE
legalization only ever forms the fully legalized construct and we can't
cycle between it and VSELECT combining.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218658 91177308-0d34-0410-b5e6-96231b3b80d8
The sign-/zero-extension of the loaded value can be performed by the memory
instruction for free. If the result of the load has only one use and the use is
a sign-/zero-extend, then we emit the proper load instruction. The extend is
only a register copy and will be optimized away later on.
Other instructions that consume the sign-/zero-extended value are also made
aware of this fact, so they don't fold the extend too.
This fixes rdar://problem/18495928.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218653 91177308-0d34-0410-b5e6-96231b3b80d8
Factor out the code that determines the implicit scale factor of memory
operations for a given value type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218652 91177308-0d34-0410-b5e6-96231b3b80d8
No functionality change.
Makes the code more compact (see the FMA part).
This needs a new type attribute MemOpFrag in X86VectorVTInfo. For now I only
defined this in the simple cases. See the commment before the attribute.
Diff of X86.td.expanded before and after is empty except for the appearance of
the new attribute.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218637 91177308-0d34-0410-b5e6-96231b3b80d8
map, this makes sure that we can compile the same code for two different
ABIs (hard and soft float) in the same module.
Update one testcase accordingly (and fix some confusing naming) and
add a new testcase as well with the ordering swapped which would
highlight the problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218632 91177308-0d34-0410-b5e6-96231b3b80d8
Primarily refines all of the instructions with accurate latency
and micro-op information. Refinements largely focus on the NEON
instructions.
Additionally, a few advanced features are modeled, including
forwarding for MAC instructions and hazards for floating point SQRT
and DIV.
Lastly, the issue-width is reduced to three so that the scheduler
will better accommodate the narrower decode and dispatch width.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218627 91177308-0d34-0410-b5e6-96231b3b80d8
These turn into fadds, so combine them into the target
mad node.
fadd (fadd (a, a), b) -> mad 2.0, a, b
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218608 91177308-0d34-0410-b5e6-96231b3b80d8
This patch improves the target-specific cost model to better handle signed
division by a power of two. The immediate result is that this enables the SLP
vectorizer to do a better job.
http://reviews.llvm.org/D5469
PR20714
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218607 91177308-0d34-0410-b5e6-96231b3b80d8
nodes, and rely exclusively on its logic. This removes a ton of
duplication from the blend lowering and centralizes it in one place.
One downside is that it requires a bunch of hacks to make this work with
the current legalization framework. We have to manually speculate one
aspect of legalizing VSELECT nodes to get everything to work nicely
because the existing legalization framework isn't *actually* bottom-up.
The other grossness is that we somewhat duplicate the analysis of
constant blends. I'm on the fence here. If reviewers thing this would
look better with VSELECT when it has constant operands dumping over tho
VECTOR_SHUFFLE, we could go that way. But it would be a substantial
change because currently all of the actual blend instructions are
matched via patterns in the TD files based around VSELECT nodes (despite
them not being perfect fits for that). Suggestions welcome, but at least
this removes the rampant duplication in the backend.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218600 91177308-0d34-0410-b5e6-96231b3b80d8
X86 target-specific DAG combining that tried to convert VSELECT nodes
into VECTOR_SHUFFLE nodes that it "knew" would lower into
immediate-controlled blend nodes.
Turns out, we have perfectly good lowering of all these VSELECT nodes,
and indeed that lowering already knows how to handle lowering through
BLENDI to immediate-controlled blend nodes. The code just wasn't getting
used much because this thing forced the world to go through the vector
shuffle lowering. Yuck.
This also exposes that I was too aggressive in avoiding domain crossing
in v218588 with that lowering -- when the other option is to expand into
two 128-bit vectors, it is worth domain crossing. Restore that behavior
now that we have nice tests covering it.
The test updates here fall into two camps. One is where previously we
ended up with an unsigned encoding of the blend operand and now we get
a signed encoding. In most of those places there were elaborate comments
explaining exactly what these operands really mean. Rather than that,
just switch these tests to use the nicely decoded comments that make it
obvious that the final shuffle matches.
The other updates are just removing pointless domain crossing by
blending integers with PBLENDW rather than BLENDPS.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218589 91177308-0d34-0410-b5e6-96231b3b80d8
crossing and generally work more like the blend emission code in the new
vector shuffle lowering.
My goal is to have the new vector shuffle lowering just produce VSELECT
nodes that are either matched here to BLENDI or are legal and matched in
the .td files to specific blend instructions. That seems much cleaner as
there are other ways to produce a VSELECT anyways. =]
No *observable* functionality changed yet, mostly because this code
appears to be near-dead. The behavior of this lowering routine did
change though. This code being mostly dead and untestable will change
with my next commit which will also point some new tests at it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218588 91177308-0d34-0410-b5e6-96231b3b80d8
AVX-512.
There is no interesting logic yet. Everything ends up eventually
delegating to the generic code to split the vector and shuffle the
halves. Interestingly, that logic does a significantly better job of
lowering all of these types than the generic vector expansion code does.
Mostly, it lets most of the cases fall back to nice AVX2 code rather
than all the way back to SSE code paths.
Step 2 of basic AVX-512 support in the new vector shuffle lowering. Next
up will be to incrementally add direct support for the basic instruction
set to each type (adding tests first).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218585 91177308-0d34-0410-b5e6-96231b3b80d8
assertion, making the name generic, and improving the documentation.
Step 1 in adding very primitive support for AVX-512. No functionality
changed yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218584 91177308-0d34-0410-b5e6-96231b3b80d8
vectors.
Someone will need to build the AVX512 lowering, which should follow
AVX1 and AVX2 *very* closely for AVX512F and AVX512BW resp. I've added
a dummy test which is a port of the v8f32 and v8i32 tests from AVX and
AVX2 to v8f64 and v8i64 tests for AVX512F and AVX512BW. Hopefully this
is enough information for someone to implement proper lowering here. If
not, I'll be happy to help, but right now the AVX-512 support isn't
a priority for me.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218583 91177308-0d34-0410-b5e6-96231b3b80d8
lowerings.
This was hopelessly broken. First, the x86 backend wants '-1' to be the
element value representing true in a boolean vector, and second the
operand order for VSELECT is backwards from the actual x86 instructions.
To make matters worse, the backend is just using '-1' as the true value
to get the high bit to be set. It doesn't actually symbolically map the
'-1' to anything. But on x86 this isn't quite how it works: there *only*
the high bit is relevant. As a consequence weird non-'-1' values like
0x80 actually "work" once you flip the operands to be backwards.
Anyways, thanks to Hal for helping me sort out what these *should* be.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218582 91177308-0d34-0410-b5e6-96231b3b80d8
new vector shuffle target DAG combines -- it helps to actually test for
the value you want rather than just using an integer in a boolean
context.
Have I mentioned that I loathe implicit conversions recently? :: sigh ::
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218576 91177308-0d34-0410-b5e6-96231b3b80d8
of widening masks.
We can't widen a zeroing mask unless both elements that would be merged
are either zeroed or undef. This is the only way to widen a mask if it
has a zeroed element.
Also clean up the code here by ordering the checks in a more logical way
and by using the symoblic values for undef and zero. I'm actually torn
on using the symbolic values because the existing code is littered with
the assumption that -1 is undef, and moreover that entries '< 0' are the
special entries. While that works with the values given to these
constants, using the symbolic constants actually makes it a bit more
opaque why this is the case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218575 91177308-0d34-0410-b5e6-96231b3b80d8
I spotted this by inspection when debugging something else, so I have no
test case what-so-ever, and am not even sure it is possible to
realistically trigger the bug. But this is what was intended here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218565 91177308-0d34-0410-b5e6-96231b3b80d8
and in the target shuffle combining when trying to widen vector
elements.
Previously only one of these was correct, and we didn't correctly
propagate zeroing target shuffle masks (which have a different sentinel
value from undef in non- target shuffle masks now). This isn't just
a missed optimization, this caused us to drop zeroing shuffles on the
floor and miscompile code. The added test case is one example of that.
There are other fixes to the test suite as a consequence of this as well
as restoring the undef elements in some of the masks that were lost when
I brought sanity to the actual *value* of the undef and zero sentinels.
I've also just cleaned up some of the PSHUFD and PSHUFLW and PSHUFHW
combining code, but that code really needs to go. It was a nice initial
attempt, but it isn't very principled and the recursive shuffle combiner
is much more powerful.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218562 91177308-0d34-0410-b5e6-96231b3b80d8
to significantly more sane sentinels. Notably, everywhere else in the
backend's representation of shuffles uses '-1' to represent undef. The
target shuffle masks really shouldn't diverge from that, especially as
in a few places they are manipulated by shared code.
This causes us to lose some undef lanes in various test masks. I want to
get these back, but technically it isn't invalid and there are a *lot*
of bugs here so I want to try to establish a saner baseline for fixing
some of the bugs by aligning the specific senitnel values used.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218561 91177308-0d34-0410-b5e6-96231b3b80d8
This is purely refactoring. No functional changes intended. PowerPC is the only target
that is currently using this interface.
The ultimate goal is to allow targets other than PowerPC (certainly X86 and Aarch64) to turn this:
z = y / sqrt(x)
into:
z = y * rsqrte(x)
And:
z = y / x
into:
z = y * rcpe(x)
using whatever HW magic they can use. See http://llvm.org/bugs/show_bug.cgi?id=20900 .
There is one hook in TargetLowering to get the target-specific opcode for an estimate instruction
along with the number of refinement steps needed to make the estimate usable.
Differential Revision: http://reviews.llvm.org/D5484
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218553 91177308-0d34-0410-b5e6-96231b3b80d8
that managed to elude all of my fuzz testing historically. =/
Something changed to allow this code path to actually be exercised and
it was doing bad things. It is especially heavily exercised by the
patterns that emerge when doing AVX shuffles that end up lowered through
the 128-bit code path.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218540 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of moving the first SGPR that is different than the first,
legalize the operand that requires the fewest moves if one
SGPR is used for multiple operands.
This saves extra moves and is also required for some instructions
which require that the same operand be used for multiple operands.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218532 91177308-0d34-0410-b5e6-96231b3b80d8
Disable the SGPR usage restriction parts of the DAG legalizeOperands.
It now should only be doing immediate folding until it can be replaced
later. The real legalization work is now done by the other
SIInstrInfo::legalizeOperands
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218531 91177308-0d34-0410-b5e6-96231b3b80d8
The base implementation of commuteInstruction is used
in some cases, but it turns out this has been broken for a
long time since modifiers were inserted between the real operands.
The base implementation of commuteInstruction also fails on immediates,
which also needs to be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218530 91177308-0d34-0410-b5e6-96231b3b80d8
e.g. v_cndmask_b32 requires the condition operand be an SGPR.
If one of the source operands were an SGPR, that would be considered
the one SGPR use and the condition operand would be illegally moved.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218529 91177308-0d34-0410-b5e6-96231b3b80d8
This needs a test, but I'm not sure if it is currently possible and
I originally hit it due to a bug. Right now the only global address
operands have no reason to be VALU instructions, although it
theoretically could be a problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218528 91177308-0d34-0410-b5e6-96231b3b80d8
No test since the current SIISelLowering::legalizeOperands
effectively hides this, and the general uses seem to only fire
on SALU instructions which don't have modifiers between
the operands.
When trying to use legalizeOperands immediately after
instruction selection, it now sees a lot more patterns
it did not see before which break on this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218527 91177308-0d34-0410-b5e6-96231b3b80d8
No tests hit this, and I don't see any way a GlobalAddress
node would survive beyond lowering on SI. It it would, the
move should probably be inserted by selection.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218526 91177308-0d34-0410-b5e6-96231b3b80d8
layer of tie-breaking sorting, it really helps to check that you're in
a tie first. =] Otherwise the whole thing cycles infinitely. Test case
added, another one found through fuzz testing.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218523 91177308-0d34-0410-b5e6-96231b3b80d8
AVX support.
New test cases included. Note that none of the existing test cases
covered these buggy code paths. =/ Also, it is clear from this that
SHUFPS and SHUFPD are the most bug prone shuffle instructions in x86. =[
These were all detected by fuzz-testing. (I <3 fuzz testing.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218522 91177308-0d34-0410-b5e6-96231b3b80d8
This patch makes the ARM backend transform 3 operand instructions such as
'adds/subs' to the 2 operand version of the same instruction if the first
two register operands are the same.
Example: 'adds r0, r0, #1' will is transformed to 'adds r0, #1'.
Currently for some instructions such as 'adds' if you try to assemble
'adds r0, r0, #8' for thumb v6m the assembler would throw an error message
because the immediate cannot be encoded using 3 bits.
The backend should be smart enough to transform the instruction to
'adds r0, #8', which allows for larger immediate constants.
Patch by Ranjeet Singh.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218521 91177308-0d34-0410-b5e6-96231b3b80d8
The SSE rsqrt instruction (a fast reciprocal square root estimate) was
grouped in the same scheduling IIC_SSE_SQRT* class as the accurate (but very
slow) SSE sqrt instruction. For code which uses rsqrt (possibly with
newton-raphson iterations) this poor scheduling was affecting performances.
This patch splits off the rsqrt instruction from the sqrt instruction scheduling
classes and creates new IIC_SSE_RSQER* classes with latency values based on
Agner's table.
Differential Revision: http://reviews.llvm.org/D5370
Patch by Simon Pilgrim.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218517 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This will allow us to handle f128 arguments without duplicating code from
CCState::AnalyzeFormalArguments() or CCState::AnalyzeCallOperands().
No functional change.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5292
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218509 91177308-0d34-0410-b5e6-96231b3b80d8
based on the Function. This is currently used to implement
mips16 support in the mips backend via the existing module
pass resetting the subtarget.
Things to note:
a) This involved running resetTargetOptions before creating a
new subtarget so that code generation options like soft-float
could be recognized when creating the new subtarget. This is
to deal with initialization code in isel lowering that only
paid attention to the initial value.
b) Many of the existing testcases weren't using the soft-float
feature correctly. I've corrected these based on the check
values assuming that was the desired behavior.
c) The mips port now pays attention to the target-cpu and
target-features strings when generating code for a particular
function. I've removed these from one function where the
requested cpu and features didn't match the check lines in
the testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218492 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
I initially thought that pulling the Pat<> into the instruction pattern was
not possible because it was doing a transform on the index in order to convert
it from a per-element (extract_subvector) index into a per-chunk (vextract*x4)
index.
Turns out this also works inside the pattern because the vextract_extract
PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the
index in $idx goes through the same conversion.
The existing test CodeGen/X86/avx512-insert-extract.ll extended in the
previous commit provides coverage for this change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218480 91177308-0d34-0410-b5e6-96231b3b80d8
No functional change.
These are now implemented as two levels of multiclasses heavily relying on the
new X86VectorVTInfo class. The multiclass at the first level that is called
with float or int provides the 128 or 256 bit subvector extracts. The second
level provides the register and memory variants and some more Pat<>s.
I've compared the td.expanded files before and after. One change is that
ExeDomain for 64x4 is SSEPackedDouble now. I think this is correct, i.e. a
bugfix.
(BTW, this is the change that was blocked on the recent tablegen fix. The
class-instance values X86VectorVTInfo inside vextract_for_type weren't
properly evaluated.)
Part of <rdar://problem/17688758>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218478 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
I originally tried doing this specifically for X86 in the backend in D5091,
but it was rather brittle and generally running too late to be general.
Furthermore, other targets may want to implement similar optimizations.
So I reimplemented it at the IR-level, fitting it into AtomicExpandPass
as it interacts with that pass (which could not be cleanly done before
at the backend level).
This optimization relies on a new target hook, which is only used by X86
for now, as the correctness of the optimization on other targets remains
an open question. If it is found correct on other targets, it should be
trivial to enable for them.
Details of the optimization are discussed in D5091.
Test Plan: make check-all + a new test
Reviewers: jfb
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5422
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218455 91177308-0d34-0410-b5e6-96231b3b80d8
These instructions do not indicate they are extendable or the
number of bits in the extendable operand. Rename to match
architected names. Add a testcase for the intrinsics.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218453 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The N32/N64 ABI's require that structs passed in registers are laid out
such that spilling the register with 'sd' places the struct at the lowest
address. For little endian this is trivial but for big-endian it requires
that structs are shifted into the upper bits of the register.
We also require that structs passed in registers have the 'inreg'
attribute for big-endian N32/N64 to work correctly. This is because the
tablegen-erated calling convention implementation only has access to the
lowered form of struct arguments (one or more integers of up to 64-bits
each) and is unable to determine the original type.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5286
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@218451 91177308-0d34-0410-b5e6-96231b3b80d8