is defined!
The users were checking the proper thing (Defined + PartialDeadDef), but the
information may have been wrong for other use cases, so fix that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267641 91177308-0d34-0410-b5e6-96231b3b80d8
Now, it is possible to know that partial definitions are dead definitions and
recognize that clobbered registers are also dead.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267621 91177308-0d34-0410-b5e6-96231b3b80d8
When a block is tail-duplicated, the PHI nodes from that block are
replaced with appropriate COPY instructions. When those PHI nodes
contained use operands with subregisters, the subregisters were
dropped from the COPY instructions, resulting in incorrect code.
Keep track of the subregister information and use this information
when remapping instructions from the duplicated block.
Differential Revision: http://reviews.llvm.org/D19337
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267583 91177308-0d34-0410-b5e6-96231b3b80d8
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344
CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.
For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on this branch almost
all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all
the times the branch was predicted correctly.
As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's
MispredictPenalty. Or we could just let targets override the default by implementing the hook with that
and other target-specific options. Note that trying to statically determine mispredict rates for
close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie,
50/50 taken/not-taken might still be 100% predictable.
Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.
Differential Revision: http://reviews.llvm.org/D19488
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267572 91177308-0d34-0410-b5e6-96231b3b80d8
visitAND, when folding and (load) forgets to check which output of
an indexed load is involved, happily folding the updated address
output on the following testcase:
target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-unknown-linux-gnu"
%typ = type { i32, i32 }
define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) {
%b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1
%1 = load i32, i32* %b, align 4
%2 = ptrtoint i32* %b to i64
%3 = and i64 %2, -35184372088833
%4 = inttoptr i64 %3 to i32*
%_msld = load i32, i32* %4, align 4
%zzz = add i32 %1, %_msld
ret i32 %zzz
}
Fix this by checking ResNo.
I've found a few more places that currently neglect to check for
indexed load, and tightened them up as well, but I don't have test
cases for them. In fact, they might not be triggerable at all,
at least with current targets. Still, better safe than sorry.
Differential Revision: http://reviews.llvm.org/D19202
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267420 91177308-0d34-0410-b5e6-96231b3b80d8
We didn't have logic to correctly handle CFGs where there was more than
one EH-pad successor (these are novel with WinEH).
There were situations where a register was live in one exceptional
successor but not another but the code as written would only consider
the first exceptional successor it found.
This resulted in split points which were insufficiently early if an
invoke was present.
This fixes PR27501.
N.B. This removes getLandingPadSuccessor.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267412 91177308-0d34-0410-b5e6-96231b3b80d8
The original patch caused crashes because it could derefence a null pointer
for SelectionDAGTargetInfo for targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267328 91177308-0d34-0410-b5e6-96231b3b80d8
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*. It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.
Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actualy DIType. The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.
This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata. Although as
of this commit they aren't serving a useful purpose, there are patchces
under review to reuse them for CodeView support.
The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267296 91177308-0d34-0410-b5e6-96231b3b80d8
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.
LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.
Differential Revision: http://reviews.llvm.org/D18367
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267223 91177308-0d34-0410-b5e6-96231b3b80d8
The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual
functions defined in other DSOs. The unnamed_addr attribute means that the
function's address is not significant, so we're allowed to substitute it
with the address of a PLT entry.
Also includes a bonus feature: addends for COFF image-relative references.
Differential Revision: http://reviews.llvm.org/D17938
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267211 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This new pass allows targets to use the hazard recognizer without having
to also run one of the schedulers. This is useful when compiling with
optimizations disabled for targets that still need noop hazards
to be handled correctly.
Reviewers: hfinkel, atrick
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D18594
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267156 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This intrinsic returns true if the current thread belongs to a live pixel
and false if it belongs to a pixel that we are executing only for derivative
computation. It will be used by Mesa to implement gl_HelperInvocation.
Note that for pixels that are killed during the shader, this implementation
also returns true, but it doesn't matter because those pixels are always
disabled in the EXEC mask.
This unearthed a corner case in the instruction verifier, which complained
about a v_cndmask 0, 1, exec, exec<imp-use> instruction. That's stupid but
correct code, so make the verifier accept it as such.
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19191
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267102 91177308-0d34-0410-b5e6-96231b3b80d8
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to combine always in loops. The targets decides whether the machine
combiner should optimize for throughput or latency.
- Supports for fmadd, f(n)msub, fmla, fmls patterns
- On by default at O3 ffast-math
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267098 91177308-0d34-0410-b5e6-96231b3b80d8
When printing the properties required by a pass, only print the
properties that are set, and not those that are clear (only properties
that are set are verified, clear properties are "don't-care").
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267070 91177308-0d34-0410-b5e6-96231b3b80d8
splitting edges.
MachineBasicBlock::SplitCriticalEdges will crash if a nullptr would have
been passed for the Pass argument. Do not allow that by turning this
argument into a reference.
The alternative would have been to make the Pass a truly optional
argument, but although this is easy to do, I was afraid users using it
like this would not be aware the livness information, dominator tree and
such would silently be broken.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267052 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce canSplitCriticalEdge, so that clients can now query whether or
not a critical edge can be split without actually needing to split it.
This may be useful when gathering information for cost models for
instance.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267046 91177308-0d34-0410-b5e6-96231b3b80d8
When custom lowered, this is not called if the store is custom
lowered. Move it to be a utility function so targets can
easily expand unaligned accesses when custom lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267029 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of holding a mask, hold two value: the start index and the
length of the mapping. This is a more compact representation, although
less powerful. That being said, arbitrary masks would not have worked
for the generic so do not allow them in the first place.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267025 91177308-0d34-0410-b5e6-96231b3b80d8
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.
The bisection is enabled using a new command line option (-opt-bisect-limit). Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit. A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.
The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check. Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute. A new function call has been added for module and SCC passes that behaves in a similar way.
Differential Revision: http://reviews.llvm.org/D19172
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267022 91177308-0d34-0410-b5e6-96231b3b80d8
This is needed to support CTTZ/CTLZ Custom correctly since LegalizeOps would be too late to do the custom lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266951 91177308-0d34-0410-b5e6-96231b3b80d8
With this change, ideally IR pass can always generate llvm.stackguard
call to get the stack guard; but for now there are still IR form stack
guard customizations around (see getIRStackGuard()). Future SSP
customization should go through LOAD_STACK_GUARD.
There is a behavior change: stack guard values are not CSEed anymore,
since we should never reuse the value in case that it has been spilled (and
corrupted). See ssp-guard-spill.ll. This also cause the change of stack
size and codegen in X86 and AArch64 test cases.
Ideally we'd like to know if the guard created in llvm.stackprotector() gets
spilled or not. If the value is spilled, discard the value and reload
stack guard; otherwise reuse the value. This can be done by teaching
register allocator to know how to rematerialize LOAD_STACK_GUARD and
force a rematerialization (which seems hard), or check for spilling in
expandPostRAPseudo. It only makes sense when the stack guard is a global
variable, which requires more instructions to load. Anyway, this seems to go out
of the scope of the current patch.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266806 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The `"patchable-function"` attribute can be used by an LLVM client to
influence LLVM's code generation in ways that makes the generated code
easily patchable at runtime (for instance, to redirect control).
Right now only one patchability scheme is supported,
`"prologue-short-redirect"`, but this can be expanded in the future.
Reviewers: joker.eph, rnk, echristo, dberris
Subscribers: joker.eph, echristo, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D19046
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266715 91177308-0d34-0410-b5e6-96231b3b80d8