A bug in r216725 meant we tried to discover the type of a SETCC before
confirming the node actually was a SETCC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216734 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Instead of specifying the alignment as metadata which may be destroyed by
transformation passes, make the alignment the second argument to ldu/ldg
intrinsic calls.
Test Plan:
ldu-ldg.ll
ldu-i8.ll
ldu-reg-plus-offset.ll
Reviewers: eliben, meheff, jholewinski
Reviewed By: meheff, jholewinski
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D5093
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216731 91177308-0d34-0410-b5e6-96231b3b80d8
In an llvm-stress generated test, we were trying to create a v0iN type and
asserting when that failed. This case could probably be handled by the
function, but not without added complexity and the situation it arises in is
sufficiently odd that there's probably no benefit anyway.
Should fix PR20775.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216725 91177308-0d34-0410-b5e6-96231b3b80d8
and forget about the previously used accumulator.
Coming up with a simple testcase is not easy, as this highly depends on
what the register allocator is doing: this issue showed up while working
with the PBQP allocator, which produced a different allocation scheme.
A testcase would need to come up with chain starting in D[0-7], then
moving to D[8-15], followed by a call to a function whose regmask
clobbers the starting accumulator in D[0-7], then another use of the chain.
Fixed some formatting, added some invariant checks while there.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216721 91177308-0d34-0410-b5e6-96231b3b80d8
Added new types to Legalizer.
Fixed getSetCCResultType function
Added lowering tests.
Reviewed by Elena Demikhovsky.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216717 91177308-0d34-0410-b5e6-96231b3b80d8
This fix checks first if the instruction to be folded (e.g. sign-/zero-extend,
or shift) is in the same machine basic block as the instruction we are folding
into.
Not doing so can result in incorrect code, because the value might not be
live-out of the basic block, where the value is defined.
This fixes rdar://problem/18169495.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216700 91177308-0d34-0410-b5e6-96231b3b80d8
The AArch64 target lowering for [zs]ext of vectors is set up to handle
input simple types and expects the generic SDag path to do something reasonable
with anything that's not a simple type. The code, however, was only
checking that the result type was a simple type and assuming that
implied that the source type would also be a simple type. That's not a
valid assumption, as operations like "zext <1 x i1> %0 to <1 x i32>"
demonstrate. The fix is to simply explicitly validate the source type
as well as the result type.
PR20791
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216689 91177308-0d34-0410-b5e6-96231b3b80d8
a single early exit.
And factor the subsequent cast<> from all but one block into a single
variable.
No functionality changed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216645 91177308-0d34-0410-b5e6-96231b3b80d8
functionality changed.
Separating this into two functions wasn't helping. There was a decent
amount of boilerplate duplicated, and some subsequent refactorings here
will pull even more common code out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216644 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Introduce support::ulittleX_t::ref type to Support/Endian.h and use it in x86 JIT
to enforce correct endianness and fix unaligned accesses.
Test Plan: regression test suite
Reviewers: lhames
Subscribers: ributzka, llvm-commits
Differential Revision: http://reviews.llvm.org/D5011
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216631 91177308-0d34-0410-b5e6-96231b3b80d8
Currently instructions are folded very aggressively into the memory operation,
which can lead to the use of killed operands:
%vreg1<def> = ADDXri %vreg0<kill>, 2
%vreg2<def> = LDRBBui %vreg0, 2
... = ... %vreg1 ...
This usually happens when the result is also used by another non-memory
instruction in the same basic block, or any instruction in another basic block.
If the computed address is used by only memory operations in the same basic
block, then it is safe to fold them. This is because all memory operations will
fold the address computation and the original computation will never be emitted.
This fixes rdar://problem/18142857.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216629 91177308-0d34-0410-b5e6-96231b3b80d8
When the address comes directly from a shift instruction then the address
computation cannot be folded into the memory instruction, because the zero
register is not available as a base register. Simplify addess needs to emit the
shift instruction and use the result as base register.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216621 91177308-0d34-0410-b5e6-96231b3b80d8
Use the zero register directly when possible to avoid an unnecessary register
copy and a wasted register at -O0. This also uses integer stores to store a
positive floating-point zero. This saves us from materializing the positive zero
in a register and then storing it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216617 91177308-0d34-0410-b5e6-96231b3b80d8
Instructions like 'fxsave' and control flow instructions like 'jne'
match any operand size. The loop I added to the Intel syntax matcher
assumed that using a different size would give a different instruction.
Now it handles the case where we get the same instruction for different
memory operand sizes.
This also allows us to remove the hack we had for unsized absolute
memory operands, because we can successfully match things like 'jnz'
without reporting ambiguity. Removing this hack uncovered test case
involving 'fadd' that was ambiguous. The memory operand could have been
single or double precision.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216604 91177308-0d34-0410-b5e6-96231b3b80d8
int may not have enough bits in it, which was detected by UBSan
bootstrap (it reported left shift by a too large constant).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216579 91177308-0d34-0410-b5e6-96231b3b80d8
This teaches the AArch64 backend to deal with the operations required
to deal with the operations on v4f16 and v8f16 which are exposed by
NEON intrinsics, plus the add, sub, mul and div operations.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216555 91177308-0d34-0410-b5e6-96231b3b80d8
we stopped efficiently lowering sextload using the SSE41 instructions
for that operation.
This is a consequence of a bad predicate I used thinking of the memory
access needs. The code actually handles the cases where the predicate
doesn't apply, and handles them much better. =] Simple fix and a test
case added. Fixes PR20767.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216538 91177308-0d34-0410-b5e6-96231b3b80d8
This combine is essentially combining target-specific nodes back into target
independent nodes that it "knows" will be combined yet again by a target
independent DAG combine into a different set of target-independent nodes that
are legal (not custom though!) and thus "ok". This seems... deeply flawed. The
crux of the problem is that we don't combine un-legalized shuffles that are
introduced by legalizing other operations, and thus we don't see a very
profitable combine opportunity. So the backend just forces the input to that
combine to re-appear.
However, for this to work, the conditions detected to re-form the unlegalized
nodes must be *exactly* right. Previously, failing this would have caused poor
code (if you're lucky) or a crasher when we failed to select instructions.
After r215611 we would fall back into the legalizer. In some cases, this just
"fixed" the crasher by produces bad code. But in the test case added it caused
the legalizer and the dag combiner to iterate forever.
The fix is to make the alignment checking in the x86 side of things match the
alignment checking in the generic DAG combine exactly. This isn't really a
satisfying or principled fix, but it at least make the code work as intended.
It also highlights that it would be nice to detect the availability of under
aligned loads for a given type rather than bailing on this optimization. I've
left a FIXME to document this.
Original commit message for r215611 which covers the rest of the chang:
[SDAG] Fix a case where we would iteratively legalize a node during
combining by replacing it with something else but not re-process the
node afterward to remove it.
In a truly remarkable stroke of bad luck, this would (in the test case
attached) end up getting some other node combined into it without ever
getting re-processed. By adding it back on to the worklist, in addition
to deleting the dead nodes more quickly we also ensure that if it
*stops* being dead for any reason it makes it back through the
legalizer. Without this, the test case will end up failing during
instruction selection due to an and node with a type we don't have an
instruction pattern for.
It took many million runs of the shuffle fuzz tester to find this.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216537 91177308-0d34-0410-b5e6-96231b3b80d8
When a shift with extension or an add with shift and extension cannot be folded
into the memory operation, then the address calculation has to be materialized
separately. While doing so the code forgot to consider a possible sign-/zero-
extension. This fix folds now also the sign-/zero-extension into the add or
shift instruction which is used to materialize the address.
This fixes rdar://problem/18141718.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216511 91177308-0d34-0410-b5e6-96231b3b80d8
The existing matcher has lots of AT&T assembly dialect assumptions baked
into it. In particular, the hack for resolving the size of a memory
operand by appending the four most common suffixes doesn't work at all.
The Intel assembly dialect mnemonic table has ambiguous entries, so we
need to try matching multiple times with different operand sizes, since
that's the only way to choose different instruction variants.
This makes us more compatible with gas's implementation of Intel
assembly syntax. MSVC assumes you want byte-sized operations for the
instructions that we reject as ambiguous.
Reviewed By: grosbach
Differential Revision: http://reviews.llvm.org/D4747
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216481 91177308-0d34-0410-b5e6-96231b3b80d8
It seems on Darwin the illegal round-trip ::iterator -> MachineInstr* -> ::iterator breaks execution horribly when the iterator is not a real MachineInstr, like ::end().
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216455 91177308-0d34-0410-b5e6-96231b3b80d8
Take a StringRef instead of a "const char *".
Take a "std::error_code &" instead of a "std::string &" for error.
A create static method would be even better, but this patch is already a bit too
big.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216393 91177308-0d34-0410-b5e6-96231b3b80d8
This actually was caught by existing tests but those tests were disabled
with an XFAIL because of PR20736. While working on fixing that,
I noticed the test failure, and tracked it down to this.
We even have a really nice Clang warning that would have caught this but
it isn't enabled in LLVM! =[ I may look at enabling it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216391 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216371 91177308-0d34-0410-b5e6-96231b3b80d8