This changes the behavior of AddAlignmentAssumptions to match its
comment, i.e., prove the asserted alignment in the context of the caller,
not the callee.
Thanks to Mehdi Amini for spotting the issue here! Thanks also to Artur
Pilipenko, who also found a fix for it.
rdar://22521387
Differential Revision: http://reviews.llvm.org/D12997
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248390 91177308-0d34-0410-b5e6-96231b3b80d8
Invoking a function which returns an aggregate can sometimes be
transformed to return a scalar value. However, this means that we need
to create one or more insertvalue instructions to recreate the correct
aggregate type. We achieved this by inserting an insertvalue
instruction at the invoke's normal successor. However, this is not
feasible if the normal successor uses the invoke's return value inside a
PHI node.
Instead, split the edge between the invoke and its normal successor and
create the insertvalue instruction in the new basic block. The new
basic block's successor will be the old invoke successor, which leaves
us with well-behaved IR.
This fixes PR24906.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248387 91177308-0d34-0410-b5e6-96231b3b80d8
This patch removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path, and updates relevant tests to sign extend a subvector instead.
LLVM counterpart to D12835
Differential Revision: http://reviews.llvm.org/D13002
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248368 91177308-0d34-0410-b5e6-96231b3b80d8
Add two new ways of accessing the unsafe stack pointer:
* At a fixed offset from the thread TLS base. This is very similar to
StackProtector cookies, but we plan to extend it to other backends
(ARM in particular) soon. Bionic-side implementation here:
https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
not emutls).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248357 91177308-0d34-0410-b5e6-96231b3b80d8
We may have subregister defs which are unused but not discovered and
cleaned up prior to liveness analysis. This creates multiple connected
components in the resulting live range, which the MachineVerifier
forbids because they would unnecessarily constrain the register
allocator. Rewrite those dead definitions to define a newly created
virtual register.
Differential Revision: http://reviews.llvm.org/D13035
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248335 91177308-0d34-0410-b5e6-96231b3b80d8
Apart from checking that GlobalVariable is a constant, we should check
that it's not a weak constant, in which case we can't propagate its
value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248327 91177308-0d34-0410-b5e6-96231b3b80d8
ARM counterpart to r248291:
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248294 91177308-0d34-0410-b5e6-96231b3b80d8
In the comparison failure block of a cmpxchg expansion, the initial
ldrex/ldxr will not be followed by a matching strex/stxr.
On ARM/AArch64, this unnecessarily ties up the execution monitor,
which might have a negative performance impact on some uarchs.
Instead, release the monitor in the failure block.
The clrex instruction was designed for this: use it.
Also see ARMARM v8-A B2.10.2:
"Exclusive access instructions and Shareable memory locations".
Differential Revision: http://reviews.llvm.org/D13033
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248291 91177308-0d34-0410-b5e6-96231b3b80d8
The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions *did* raise inexact, and the LLVM lowerings followed that convention. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice.
Differential Revision: http://reviews.llvm.org/D12969
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248266 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Based on a patch by David Chisnall. I've modified the original patch as follows:
* Moved the expansion to the TargetStreamers so that the directive isn't
expanded when emitting assembly.
* Fixed an operand order bug.
* Changed the move instructions from DADDu to OR to match recent changes to GAS.
Reviewers: vkalintiris
Subscribers: llvm-commits, emaste, seanbruno, theraven
Differential Revision: http://reviews.llvm.org/D13017
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248258 91177308-0d34-0410-b5e6-96231b3b80d8
This patch generalizes the lowering of shuffles as zero extensions to allow extensions that don't start from the first element. It now recognises extensions starting anywhere in the lower 128-bits or at the start of any higher 128-bit lane.
The motivation was to reduce the number of high-cost pshufb calls, but it also improves the SSE2 case.
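To make the shape of such a shuffle concrete, here is a minimal sketch of recognising a zero-extension mask that starts at a non-zero offset. It is not the X86 lowering code: the ZERO sentinel and the function name are hypothetical, undef lanes are ignored, and the 128-bit lane restrictions described above are not checked.

```python
ZERO = -1  # hypothetical sentinel: this result lane is known to be zero


def match_zext_shuffle(mask, scale):
    """Return the starting source index if `mask` zero-extends consecutive
    source elements by a factor of `scale`, otherwise None.

    Every scale-th lane must read offset, offset + 1, offset + 2, ...
    and all lanes in between must be ZERO.
    """
    if not mask or scale < 2 or len(mask) % scale != 0 or mask[0] == ZERO:
        return None
    offset = mask[0]
    for i, m in enumerate(mask):
        if i % scale == 0:
            if m != offset + i // scale:
                return None
        elif m != ZERO:
            return None
    return offset
```

With this model, [0, ZERO, 1, ZERO, ...] is the extension from the first element, and [4, ZERO, 5, ZERO, ...] is the same extension starting at element 4.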
Differential Revision: http://reviews.llvm.org/D12561
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248250 91177308-0d34-0410-b5e6-96231b3b80d8
We know that an argmemonly function can only access memory pointed to by its pointer arguments. Rather than needing to consider all possible stores as aliasing (as we do for a readonly function), we need only consider the aliasing of the pointer arguments.
Note that this change only addresses hoisting. I'm thinking about how to address speculation safety as well, but that will be a different change.
FYI, argmemonly disallows accessing memory through non-pointer typed arguments.
Differential Revision: http://reviews.llvm.org/D12771
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248220 91177308-0d34-0410-b5e6-96231b3b80d8
It turns out that not every basic block is guaranteed to have a node within the DominatorTree. This is really hard to trigger, but the test case from the PR managed to do so. There's active discussion continuing about what documentation and/or invariants need to be cleaned up.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248216 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds support for combining patterns such as (FMUL(FADD(1.0, x), y)) and (FMUL(FSUB(x, 1.0), y)) to their FMA equivalents.
This is useful in particular for linear interpolation cases such as (FADD(FMUL(x, t), FMUL(y, FSUB(1.0, t))))
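The algebra behind these rewrites can be checked with a small sketch. The fma helper below is a hypothetical exact stand-in (a * b + c over rationals), not a hardware FMA. In real floating point the two forms can round differently, and the conditions under which the combine fires are not stated here.

```python
from fractions import Fraction


def fma(a, b, c):
    """Exact stand-in for a fused multiply-add: a * b + c."""
    return a * b + c


def mul_add_one(x, y):
    """(1 + x) * y rewritten as fma(x, y, y)."""
    return fma(x, y, y)


def mul_sub_one(x, y):
    """(x - 1) * y rewritten as fma(x, y, -y)."""
    return fma(x, y, -y)


def lerp(x, y, t):
    """x * t + y * (1 - t) rewritten as two chained FMAs.

    y * (1 - t) becomes fma(-t, y, y); adding x * t wraps it in a second fma.
    """
    return fma(x, t, fma(-t, y, y))
```

The linear interpolation case therefore needs two fused operations and no separate subtract.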
Differential Revision: http://reviews.llvm.org/D13003
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248210 91177308-0d34-0410-b5e6-96231b3b80d8
The vext pseudo-instruction takes the number of elements that need to be
extracted, not the number of bytes. Hence, use the number of elements
directly instead of scaling them with a factor.
Reviewers: Silviu Baranga, James Molloy
(not reflected in the differential revision)
Differential Revision: http://reviews.llvm.org/D12974
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248208 91177308-0d34-0410-b5e6-96231b3b80d8
We're currently losing any fast-math flags when synthesizing fcmps for
min/max reductions. In LV, make sure we copy over the scalar inst's
flags. In LoopUtils, we know we only ever match patterns with
hasUnsafeAlgebra, so apply that to any synthesized ops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248201 91177308-0d34-0410-b5e6-96231b3b80d8
The ISD::FPOW and ISD::FSINCOS opcodes default to Legal, but there
is no legal instruction for those on SystemZ. This could cause
LLVM internal errors. Fixed by setting the operation action to
Expand for those opcodes.
Also added test cases for all other LLVM IR intrinsics that should
generate a library call. (Those already work correctly since the
default operation action is fine.)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248180 91177308-0d34-0410-b5e6-96231b3b80d8
If storing multiple FP constants, some subset of the stores
would be replaced with integers due to visit order, so
MergeConsecutiveStores would only partially merge
these.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248169 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Also tightened up the test and made a trivial fix to prevent a double
newline after emitting .cpsetup directives.
Reviewers: vkalintiris
Subscribers: seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D12956
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248143 91177308-0d34-0410-b5e6-96231b3b80d8
Because -indvars widens induction variables through arithmetic,
`NeverNegative` cannot be a property of the `WidenIV` (a `WidenIV`
manages information for all transitive uses of an IV being widened,
including uses of `-1 * IV`). Instead it must live on `NarrowIVDefUse`
which manages information for a specific def-use edge in the transitive
use list of an induction variable.
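A small sketch of why the property cannot be shared across all uses, assuming the flag is what allows sign extension and zero extension to be treated as interchangeable (the text above does not spell this out). The helper names and the 8-to-16-bit widths are illustrative only.

```python
def zext(value, from_bits):
    """Zero-extend: keep only the low from_bits bits."""
    return value & ((1 << from_bits) - 1)


def sext(value, from_bits, to_bits):
    """Sign-extend the low from_bits bits to to_bits (as an unsigned pattern)."""
    low = value & ((1 << from_bits) - 1)
    if low >> (from_bits - 1):
        low -= 1 << from_bits
    return low & ((1 << to_bits) - 1)


def widening_is_sign_agnostic(value, from_bits=8, to_bits=16):
    """True when sext and zext of the narrow value give the same wide value."""
    return sext(value, from_bits, to_bits) == zext(value, from_bits)
```

For a non-negative IV such as 5 the two extensions agree, but for the transitive use -1 * IV they do not (0xFFFB versus 0x00FB), so the fact has to be tracked per def-use edge.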
This change also adds a test case that demonstrates the problem with
r248045.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248107 91177308-0d34-0410-b5e6-96231b3b80d8
Now that we have fast vector CTPOP implementations, we can use them to speed up vector CTTZ using the pattern (cttz(x) = ctpop((x & -x) - 1)).
Additionally, for AVX512CD, which provides lzcnt instructions, we can use the pattern (cttz_undef(x) = (width - 1) - ctlz(x & -x)).
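Both identities can be checked on scalars with a short sketch. This models a single 32-bit lane with Python integers. The WIDTH constant and helper names are illustrative and are not taken from the patch.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def ctpop(x):
    """Population count of a WIDTH-bit value."""
    return bin(x & MASK).count("1")


def ctlz(x):
    """Leading-zero count of a WIDTH-bit value (WIDTH for zero)."""
    return WIDTH - (x & MASK).bit_length()


def cttz_via_ctpop(x):
    """cttz(x) = ctpop((x & -x) - 1); yields WIDTH for x == 0."""
    x &= MASK
    lowest = x & -x            # isolates the lowest set bit (0 if x == 0)
    return ctpop((lowest - 1) & MASK)


def cttz_undef_via_ctlz(x):
    """cttz_undef(x) = (WIDTH - 1) - ctlz(x & -x); x must be non-zero."""
    x &= MASK
    if x == 0:
        raise ValueError("result is undefined for zero")
    return (WIDTH - 1) - ctlz(x & -x)
```

The ctpop form also handles a zero input, because (0 - 1) wraps to all ones, while the ctlz form is only used where a zero input is undefined.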
Differential Revision: http://reviews.llvm.org/D12663
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248091 91177308-0d34-0410-b5e6-96231b3b80d8
(icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a
power of two and has the sign bit set.
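The ambiguity is easy to see by enumeration. The sketch below lists every in-range shift amount that satisfies the comparison for an 8-bit constant. The bit width and the function name are chosen for illustration only.

```python
BITS = 8


def shift_amounts_giving_minus_one(c1):
    """All V in [0, BITS) with (c1 ashr V) == -1 for a signed BITS-bit c1.

    Python's >> on a negative int is an arithmetic shift, which matches ashr.
    """
    if not -(1 << (BITS - 1)) <= c1 < (1 << (BITS - 1)):
        raise ValueError("c1 does not fit in the chosen width")
    return [v for v in range(BITS) if (c1 >> v) == -1]
```

For -128 (bit pattern 0x80, a power of two) only V = 7 works, whereas -64 (0xC0) is satisfied by both 6 and 7, so the comparison cannot be folded to a single shift amount.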
This fixes PR24873.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248074 91177308-0d34-0410-b5e6-96231b3b80d8