Fix for PR20648 - http://llvm.org/bugs/show_bug.cgi?id=20648
This patch checks the operands of a vselect to see if all values are constants.
If yes, bail out of any further attempts to create a blend or shuffle because
SelectionDAGLegalize knows how to turn this kind of vselect into a single load.
This already happens for machines without SSE4.1, so the added checks just send
more targets down that path.
Differential Revision: http://reviews.llvm.org/D4934
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216121 91177308-0d34-0410-b5e6-96231b3b80d8
The goal of the patch is to implement section 3.2.3 of the AMD64 ABI
correctly. The controlling sentence is, "The size of each argument gets
rounded up to eightbytes. Therefore the stack will always be eightbyte
aligned." The equivalent sentence in the i386 ABI page 37 says, "At all
times, the stack pointer should point to a word-aligned area." For both
architectures, the stack pointer is not being rounded up to the nearest
eightbyte or word between the last normal argument and the first
variadic argument.
Patch by Thomas Jablin!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216119 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=19530.
The problem is that X86ISelLowering erroneously thought the third call
was eligible for tail call elimination.
It would have been if it's return value was actually the one returned
by the calling function, but here that is not the case and
additional values are being returned.
Test Plan: Test case from the original bug report is included.
Reviewers: rafael
Reviewed By: rafael
Subscribers: rafael, llvm-commits
Differential Revision: http://reviews.llvm.org/D4968
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216117 91177308-0d34-0410-b5e6-96231b3b80d8
In PR20308 ( http://llvm.org/bugs/show_bug.cgi?id=20308 ), the critical-anti-dependency breaker
caused a miscompile because it broke a WAR hazard using a register that it thinks is available
based on info from a kill inst. Until PR18663 is solved, we shouldn't use any def/use info from
a kill because they are really just nops.
This patch adds guard checks for kills around calls to ScanInstruction() where the DefIndices
array is set. For good measure, add an assert in ScanInstruction() so we don't hit this bug again.
The test case is a reduced version of the code from the bug report.
Differential Revision: http://reviews.llvm.org/D4977
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216114 91177308-0d34-0410-b5e6-96231b3b80d8
the isRegSequence property.
This is a follow-up of r215394 and r215404, which respectively introduces the
isRegSequence property and uses it for ARM.
Thanks to the property introduced by the previous commits, this patch is able
to optimize the following sequence:
vmov d0, r2, r3
vmov d1, r0, r1
vmov r0, s0
vmov r1, s2
udiv r0, r1, r0
vmov r1, s1
vmov r2, s3
udiv r1, r2, r1
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
into:
udiv r0, r0, r2
udiv r1, r1, r3
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vmov r0, r1, d16
bx lr
This patch refactors how the copy optimizations are done in the peephole
optimizer. Prior to this patch, we had one copy-related optimization that
replaced a copy or bitcast by a generic, more suitable (in terms of register
file), copy.
With this patch, the peephole optimizer features two copy-related optimizations:
1. One for rewriting generic copies to generic copies:
PeepholeOptimizer::optimizeCoalescableCopy.
2. One for replacing non-generic copies with generic copies:
PeepholeOptimizer::optimizeUncoalescableCopy.
The goals of these two optimizations are slightly different: one rewrite the
operand of the instruction (#1), the other kills off the non-generic instruction
and replace it by a (sequence of) generic instruction(s).
Both optimizations rely on the ValueTracker introduced in r212100.
The ValueTracker has been refactored to use the information from the
TargetInstrInfo for non-generic instruction. As part of the refactoring, we
switched the tracking from the index of the definition to the actual register
(virtual or physical). This one change is to provide better consistency with
register related APIs and to ease the use of the TargetInstrInfo.
Moreover, this patch introduces a new helper class CopyRewriter used to ease the
rewriting of generic copies (i.e., #1).
Finally, this patch adds a dead code elimination pass right after the peephole
optimizer to get rid of dead code that may appear after rewriting.
This is related to <rdar://problem/12702965>.
Review: http://reviews.llvm.org/D4874
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216088 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a bug I introduced in a previous commit (r216033). Sign-/Zero-
extension from i1 cannot be folded into the ADDS/SUBS instructions. Instead both
operands have to be sign-/zero-extended with separate instructions.
Related to <rdar://problem/17913111>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216073 91177308-0d34-0410-b5e6-96231b3b80d8
legalization stage. With those two optimizations, fewer signed/zero extension
instructions can be inserted, and then we can expose more opportunities to
Machine CSE pass in back-end.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216066 91177308-0d34-0410-b5e6-96231b3b80d8
LLVM generates illegal `rbit r0, #352` instruction for rbit intrinsic.
According to ARM ARM, rbit only takes register as argument, not immediate.
The correct instruction should be rbit <Rd>, <Rm>.
The bug was originally introduced in r211057.
Differential Revision: http://reviews.llvm.org/D4980
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216064 91177308-0d34-0410-b5e6-96231b3b80d8
We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216045 91177308-0d34-0410-b5e6-96231b3b80d8
Because declarations of these functions can appear in places like autoconf
checks, they have to be handled somehow, even though we do not support
vararg custom functions. We do so by printing a warning and calling the
uninstrumented function, as we do for unimplemented functions.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216042 91177308-0d34-0410-b5e6-96231b3b80d8
Use FMOVWSr/FMOVXDr instead of FMOVSr/FMOVDr, which have the proper register
class to be used with the zero register. This makes the MachineInstruction
verifier happy again.
This is related to <rdar://problem/18027157>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216040 91177308-0d34-0410-b5e6-96231b3b80d8
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands is the same.
- Both operands have more than 1 sign bit.
The subtraction cannot be a signed overflow in either case.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216037 91177308-0d34-0410-b5e6-96231b3b80d8
Factor out the ADDS/SUBS instruction emission code into helper functions and
make the helper functions more clever to support most of the different ADDS/SUBS
instructions the architecture support. This includes better immedediate support,
shift folding, and sign-/zero-extend folding.
This fixes <rdar://problem/17913111>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216033 91177308-0d34-0410-b5e6-96231b3b80d8
Implement `uselistorder` and `uselistorder_bb` assembly directives,
which allow the use-list order to be recovered when round-tripping to
assembly.
This is the bulk of PR20515.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216025 91177308-0d34-0410-b5e6-96231b3b80d8
This adds the missing test that I promised for r215753 to test the
materialization of the floating-point value +0.0.
Related to <rdar://problem/18027157>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216019 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
FastISel didn't take much advantage of the different addressing modes available
to it on AArch64. This commit allows the ComputeAddress method to recognize more
addressing modes that allows shifts and sign-/zero-extensions to be folded into
the memory operation itself.
For Example:
lsl x1, x1, #3 --> ldr x0, [x0, x1, lsl #3]
ldr x0, [x0, x1]
sxtw x1, w1
lsl x1, x1, #3 --> ldr x0, [x0, x1, sxtw #3]
ldr x0, [x0, x1]
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216013 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
In the large code model for X86 floating-point constants are placed in the
constant pool and materialized by loading from it. Since the constant pool
could be far away, a PC relative load might not work. Therefore we first
materialize the address of the constant pool with a movabsq and then load
from there the floating-point value.
Fixes <rdar://problem/17674628>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216012 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
This mostly affects the i64 value type, which always resulted in an 15byte
mobavsq instruction to materialize any constant. The custom code checks the
value of the immediate and tries to use a different and smaller mov
instruction when possible.
This fixes <rdar://problem/17420988>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216010 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. Reapply
without any modifications.
Original commit message:
This change materializes now the value "0" from the zero register.
The zero register can be folded by several instruction, so no
materialization is need at all.
Fixes <rdar://problem/17924413>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216009 91177308-0d34-0410-b5e6-96231b3b80d8
Note: This was originally reverted to track down a buildbot error. This commit
exposed a latent bug that was fixed in r215753. Therefore it is reapplied
without any modifications.
I run it through SPEC2k and SPEC2k6 for AArch64 and it didn't introduce any new
regeressions.
Original commit message:
This changes the order in which FastISel tries to materialize a constant.
Originally it would try to use a simple target-independent approach, which
can lead to the generation of inefficient code.
On X86 this would result in the use of movabsq to materialize any 64bit
integer constant - even for simple and small values such as 0 and 1. Also
some very funny floating-point materialization could be observed too.
On AArch64 it would materialize the constant 0 in a register even the
architecture has an actual "zero" register.
On ARM it would generate unnecessary mov instructions or not use mvn.
This change simply changes the order and always asks the target first if it
likes to materialize the constant. This doesn't fix all the issues
mentioned above, but it enables the targets to implement such
optimizations.
Related to <rdar://problem/17420988>.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@216006 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a few BuildMI callsites where the result register was added by
using addReg, which is per default a use and therefore an operand register.
Also use the zero register as result register when emitting a compare
instruction (SUBS with unused result register).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215997 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.
This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215994 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This directive is similar to ".set mipsX".
It is used to change the CPU target of the assembler, enabling it to accept instructions for a specific CPU.
This patch only implements the r4000 CPU (which is treated internally as generic mips3) and the generic ISAs.
Contains work done by Matheus Almeida.
Reviewers: dsanders
Reviewed By: dsanders
Differential Revision: http://reviews.llvm.org/D4884
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215978 91177308-0d34-0410-b5e6-96231b3b80d8
Call `verifyModule()` after parsing and after every transformation.
Also convert some `DEBUG(dbgs())` to `errs()` to increase visibility
into what's going on.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215951 91177308-0d34-0410-b5e6-96231b3b80d8
- add check for volatile (probably unneeded, but I agree that we should be conservative about it).
- strengthen condition from isUnordered() to isSimple(), as I don't understand well enough Unordered semantics (and it also matches the comment better this way) to be confident in the previous behaviour (thanks for catching that one, I had missed the case Monotonic/Unordered).
- separate a condition in two.
- lengthen comment about aliasing and loads
- add tests in GVN/atomic.ll
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215943 91177308-0d34-0410-b5e6-96231b3b80d8
file with -macho, the Mach-O specific object file parser option.
After some discussion I chose to do this implementation contained in the logic
of llvm-objdump’s MachODump.cpp using a second disassembler for thumb when
needed and with updates mostly contained in the MachOObjectFile class.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215931 91177308-0d34-0410-b5e6-96231b3b80d8
This allows the AArch64 backend to handle fadd, fsub, fmul and fdiv
operations on f16 (half-precision) types by promoting to f32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215891 91177308-0d34-0410-b5e6-96231b3b80d8
Externally-defined functions with weak linkage should not be
tail-called on ARM or AArch64, as the AAELF spec requires normal calls
to undefined weak functions to be replaced with a NOP or jump to the
next instruction. The behaviour of branch instructions in this
situation (as used for tail calls) is implementation-defined, so we
cannot rely on the linker replacing the tail call with a return.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215890 91177308-0d34-0410-b5e6-96231b3b80d8
The set of functions defined in the RTABI was separated for no real reason.
This brings us closer to proper utilisation of the functions defined by the
RTABI. It also sets the ground for correctly emitting function calls to AEABI
functions on all AEABI conforming platforms.
The previously existing lie on the behaviour of __ldivmod and __uldivmod is
propagated as it is beyond the scope of the change.
The changes to the test are due to the fact that we now use the divmod functions
which return both the quotient and remainder and thus we no longer need to
invoke two functions on Linux (making it closer to EABI's behaviour).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215862 91177308-0d34-0410-b5e6-96231b3b80d8
It causes a number of regressions when -fintegrated-as is enabled. This happens
because there are codegen-only instructions that incorrectly uses the first
operand as the encoding for the $fcc register. The regressions do not occur when
-via-file-asm is also given.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215847 91177308-0d34-0410-b5e6-96231b3b80d8
This was a thinko. The intent was to flip the explicit bits that need toggling
rather than all bits. This would result in incorrect behaviour (which now is
tested).
Thanks to Nico Weber for pointing this out!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215846 91177308-0d34-0410-b5e6-96231b3b80d8
While this might seem like an obvious canonicalization, there is one subtle problem with it. The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN. This means that the
new expression is strictly more defined than the original one. One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it. Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it. Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215825 91177308-0d34-0410-b5e6-96231b3b80d8
We can combne a mul with a div if one of the operands is a multiple of
the other:
%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
=>
%ret = mul nsw %a, (C1 / C2)
This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.
Consider this small example:
define i32 @f(i32 %a) {
%mul = mul nuw i32 %a, 14
%div = udiv exact i32 %mul, 7
ret i32 %div
}
which gets CodeGen'd to:
imull $14, %edi, %eax
imulq $613566757, %rax, %rcx
shrq $32, %rcx
subl %ecx, %eax
shrl %eax
addl %ecx, %eax
shrl $2, %eax
retq
We can now transform this into:
define i32 @f(i32 %a) {
%shl = shl nuw i32 %a, 1
ret i32 %shl
}
which gets CodeGen'd to:
leal (%rdi,%rdi), %eax
retq
This fixes PR20681.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@215815 91177308-0d34-0410-b5e6-96231b3b80d8