Summary:
Extend AArch64RedundantCopyElimination to catch cases where the register
that is known to be zero is COPY'd in the predecessor block. Before
this change, this pass would catch cases like:
CBZW %W0, <BB#1>
BB#1:
%W0 = COPY %WZR // removed
After this change, cases like the one below are also caught:
%W0 = COPY %W1
CBZW %W1, <BB#1>
BB#1:
%W0 = COPY %WZR // removed
This change results in a 4% increase in static copies removed by this
pass when compiling the llvm test-suite. It also fixes regressions
caused by doing post-RA copy propagation (a separate change to be put up
for review shortly).
Reviewers: junbuml, mcrosier, t.p.northover, qcolombet, MatzeB
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30113
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295863 91177308-0d34-0410-b5e6-96231b3b80d8
Instead, just be conservative as these are unfrequent enough. Thanks
to Peter Collingbourne for the discussion about this on IRC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295861 91177308-0d34-0410-b5e6-96231b3b80d8
Prevent memory objects of different address spaces to be part of
the same load/store groups when analysing interleaved accesses.
This is fixing pr31900.
Reviewers: HaoLiu, mssimpso, mkuper
Reviewed By: mssimpso, mkuper
Subscribers: llvm-commits, efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D29717
This reverts r295042 (re-applies r295038) with an additional fix for the
buildbot problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295858 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: The CallTargetProfile should be added to FProfile to be consistent with other profile readers.
Reviewers: dnovillo, davidxl
Reviewed By: davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30233
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295852 91177308-0d34-0410-b5e6-96231b3b80d8
The pass tries to fix a spill of LR that turns out to be unnecessary.
So it removes the tPOP but forgets to remove tPUSH.
This causes the stack be misaligned upon returning the function.
Thus, remove the tPUSH as well in this case.
Differential Revision: https://reviews.llvm.org/D30207
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295816 91177308-0d34-0410-b5e6-96231b3b80d8
This patch adds missing sched classes for Thumb2 instructions.
This has been missing so far, and as a consequence, machine
scheduler models for individual sub-targets have tended to
be larger than they needed to be. These patches should help
write schedulers better and faster in the future
for ARM sub-targets.
Reviewer: Diana Picus
Differential Revision: https://reviews.llvm.org/D29953
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295811 91177308-0d34-0410-b5e6-96231b3b80d8
This patch introduces new X86ISD::FMAXS and X86ISD::FMINS opcodes. The legacy intrinsics now lower to this node. As do the AVX-512 masked intrinsics when the rounding mode is CUR_DIRECTION.
I've merged a copy of the tablegen multiclass avx512_fp_scalar into avx512_fp_scalar_sae. avx512_fp_scalar still needs to support CUR_DIRECTION appearing as a rounding mode for X86ISD::FADD_ROUND and others.
Differential revision: https://reviews.llvm.org/D30186
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295810 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Motivation: fix PR31181 without regression (the actual fix is still in
progress). However, the actual content of PR31181 is not relevant
here.
This change makes poison propagation more aggressive in the following
cases:
1. poision * Val == poison, for any Val. In particular, this changes
existing intentional and documented behavior in these two cases:
a. Val is 0
b. Val is 2^k * N
2. poison << Val == poison, for any Val
3. getelementptr is poison if any input is poison
I think all of these are justified (and are axiomatically true in the
new poison / undef model):
1a: we need poison * 0 to be poison to allow transforms like these:
A * (B + C) ==> A * B + A * C
If poison * 0 were 0 then the above transform could not be allowed
since e.g. we could have A = poison, B = 1, C = -1, making the LHS
poison * (1 + -1) = poison * 0 = 0
and the RHS
poison * 1 + poison * -1 = poison + poison = poison
1b: we need e.g. poison * 4 to be poison since we want to allow
A * 4 ==> A + A + A + A
If poison * 4 were a value with all of their bits poison except the
last four; then we'd not be able to do this transform since then if A
were poison the LHS would only be "partially" poison while the RHS
would be "full" poison.
2: Same reasoning as (1b), we'd like have the following kinds
transforms be legal:
A << 1 ==> A + A
Reviewers: majnemer, efriedma
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D30185
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295809 91177308-0d34-0410-b5e6-96231b3b80d8
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.
Also allow using clamp with f16, and use knowledge
of dx10_clamp.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295788 91177308-0d34-0410-b5e6-96231b3b80d8
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within.
This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling thrust
test suite and manually checking the differences between PTX
generated by llvm with and without the patch.
General algorithm:
* ComputePTXValueVTs() flattens input/output argument into a flat list
of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create vectorization plan
which returns an array of flags marking boundaries of vectorized
load/stores. Scalars are represented as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
that constructs a vector according to the plan.
Differential Revision: https://reviews.llvm.org/D30011
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295784 91177308-0d34-0410-b5e6-96231b3b80d8
For whatever reason ld64 requires that member headers (not the member
themselves) should be aligned. The only way to do that is to edit the
previous member so that it ends at an aligned boundary.
Since modifying data put in an archive is an undesirable property,
llvm-ar should only do it when it is absolutely necessary.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295765 91177308-0d34-0410-b5e6-96231b3b80d8
Address of an alias of a global with offset is incorrectly lowered as an address of the global (i.e. ignoring offset).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295762 91177308-0d34-0410-b5e6-96231b3b80d8
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use
matchSelectPattern or m_SMax and friends.
The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392https://reviews.llvm.org/rL289738
If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295758 91177308-0d34-0410-b5e6-96231b3b80d8
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.
I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.
The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295753 91177308-0d34-0410-b5e6-96231b3b80d8
Conflicting debug info for function arguments causes hard-to-debug
assertions in the DWARF backend, so the Verifier should reject it.
For performance reasons this only checks function arguments from
non-inlined debug intrinsics for now.
rdar://problem/30520286
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295749 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64. The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.
The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used. One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.
This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).
Reviewers: t.p.northover, qcolombet, MatzeB
Subscribers: aemerson, mcrosier, sebpop, llvm-commits
Differential Revision: https://reviews.llvm.org/D28813
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295746 91177308-0d34-0410-b5e6-96231b3b80d8
As i64 isn't a value type on 32-bit targets, we fail to fold the VZEXT_LOAD into VPBROADCASTQ.
Also shows that we're not decoding VPERMIV3 shuffles very well....
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295729 91177308-0d34-0410-b5e6-96231b3b80d8
This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295723 91177308-0d34-0410-b5e6-96231b3b80d8
Currently just contains one case where we combine to VZEXT_MOVL instead of VZEXT which would avoid the need for a zero vector to be generated
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295721 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is a fix for assertion failure in
`getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern.
We should not be invoking the simplification rule for
ABS(MIN(~ x,y))) or ABS(MAX(~x,y)) combinations.
Added a test case which would cause an assertion failure without the patch.
Reviewers: sanjoy, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30051
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295719 91177308-0d34-0410-b5e6-96231b3b80d8