The underlying issues surrounding codegen for 32-bit vselects have been resolved. The pessimistic costs for 64-bit vselects remain due to the bad
scalarization that is still happening there.
I tested this on A57 in T32, A32 and A64 modes. I saw no regressions, and some improvements.
From my benchmarks, I saw these improvements in A57 (T32)
spec.cpu2000.ref.177_mesa 5.95%
lnt.SingleSource/Benchmarks/Shootout/strcat 12.93%
lnt.MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 11.89%
I also measured A57 A32, A53 T32 and A9 T32 and found no performance regressions. I see much bigger wins in third-party benchmarks with this change
Differential Revision: http://reviews.llvm.org/D14743
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@253349 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r252862. This introduced test failures and I'm reverting while I investigate how this happened.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252863 91177308-0d34-0410-b5e6-96231b3b80d8
A function can be marked as norecurse if:
* The SCC to which it belongs has cardinality 1; and either
a) It does not call any non-norecurse function. This includes self-recursion; or
b) It only has one callsite and the function that callsite is within is marked norecurse.
a) is best propagated bottom-up and b) is best propagated top-down.
We build up the norecurse attributes bottom-up using the existing SCC pass, and mark functions with no obvious recursion (but not provably norecurse) to sweep later, top-down.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252862 91177308-0d34-0410-b5e6-96231b3b80d8
This is fix for PR24059.
When we are hoisting instruction above some condition it may turn out
that metadata on this instruction was control dependant on the condition.
This metadata becomes invalid and we need to drop it.
This patch should cover most obvious places of speculative execution (which
I have found by greping isSafeToSpeculativelyExecute). I think there are more
cases but at least this change covers the severe ones.
Differential Revision: http://reviews.llvm.org/D14398
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252604 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
GetUnderlyingObjects() can return "null" among its list of objects,
we don't want to deduce that two pointers can point to the same
memory in this case, so filter it out.
Reviewers: anemet
Subscribers: dexonsmith, llvm-commits
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252149 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When the dependence distance in zero then we have a loop-independent
dependence from the earlier to the later access.
No current client of LAA uses forward dependences so other than
potentially hitting the MaxDependences threshold earlier, this change
shouldn't affect anything right now.
This and the previous patch were tested together for compile-time
regression. None found in LNT/SPEC.
Reviewers: hfinkel
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13255
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251973 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Before this change, we didn't use to collect forward dependences since
none of the current clients (LV, LDist) required them.
The motivation to also collect forward dependences is a new pass
LoopLoadElimination (LLE) which discovers store-to-load forwarding
opportunities across the loop's backedge. The pass uses both lexically
forward or backward loop-carried dependences to detect these
opportunities.
The new pass also analyzes loop-independent (forward) dependences since
they can conflict with the loop-carried dependences in terms of how the
data flows through memory.
The newly added test only covers loop-carried forward dependences
because loop-independent ones are currently categorized as NoDep. The
next patch will fix this.
The two patches were tested together for compile-time regression. None
found in LNT/SPEC.
Note that with this change LAA provides all dependences rather than just
"interesting" ones. A subsequent NFC patch will remove the now trivial
isInterestingDependence and rename the APIs.
Reviewers: hfinkel
Subscribers: jmolloy, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D13254
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251972 91177308-0d34-0410-b5e6-96231b3b80d8
Have `getConstantEvolutionLoopExitValue` work correctly with multiple
entry loops.
As far as I can tell, `getConstantEvolutionLoopExitValue` never did the
right thing for multiple entry loops; and before r249712 it would
silently return an incorrect answer. r249712 changed SCEV to fail an
assert on a multiple entry loop, and this change fixes the underlying
issue.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251770 91177308-0d34-0410-b5e6-96231b3b80d8
Prevent `createNodeFromSelectLikePHI` from creating SCEV expressions
that break LCSSA.
A better fix for the same issue is to teach SCEVExpander to not break
LCSSA by inserting PHI nodes at appropriate places. That's planned for
the future.
Fixes PR25360.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251756 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When forming expressions for phi nodes having an incoming value from
outside the loop A and a value coming from the previous iteration B
we were forming an AddRec if:
- B was an AddRec
- the value A was equal to the value for B at iteration -1 (or equal
to the value of B shifted by one iteration, at iteration 0)
In this case, we were computing the expression to be the expression of
B, shifted by one iteration.
This changes generalizes the logic above by removing the restriction that
B needs to be an AddRec. For this we introduce two expression rewriters
that allow us to
- shift an expression by one iteration
- get the value of an expression at iteration 0
This allows us to get SCEV expressions for PHI nodes when these expressions
are not AddRecExprs.
Reviewers: sanjoy
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D14175
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251700 91177308-0d34-0410-b5e6-96231b3b80d8
This teaches SCEV to compute //max// backedge taken counts for loops
like
for (int i = k; i != 0; i >>>= 1)
whatever();
SCEV yet cannot represent the exact backedge count for these loops, and
this patch does not change that. This is really geared towards teaching
SCEV that loops like the above are *not* infinite.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251558 91177308-0d34-0410-b5e6-96231b3b80d8
When checking if an indirect global (a global with pointer type) is only assigned by allocation functions, first check if the global is itself initialized. If it is, it's not only assigned by allocation functions.
This fixes PR25309. Thanks to David Majnemer for reducing the test case!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251508 91177308-0d34-0410-b5e6-96231b3b80d8
Even though we may not know the value of the shifter operand, it's possible we know the shifter operand is non-zero. This can allow us to infer more known bits - for example:
%1 = load %p !range {1, 5}
%2 = shl %q, %1
We don't know %1, but we do know that it is nonzero so %2[0] is known zero, and importantly %2 is known non-zero.
Calling isKnownNonZero is nontrivially expensive so use an Optional to run it lazily and cache its result.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251294 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This uses `ScalarEvolution::getRange` and not potentially control
dependent `nsw` and `nuw` bits on the arithmetic instruction.
Reviewers: atrick, hfinkel, nlewycky
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D13613
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251048 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of bailing out when we see loads, analyze them. If we can prove that the loaded-from address must escape, then we can conclude that a load from that address must escape too and therefore cannot alias a non-addr-taken global.
When checking if a Value can alias a non-addr-taken global, if the Value is a LoadInst of a non-global, recurse instead of bailing.
If we can follow a trail of loads up to some base that is captured, we know by inference that all the loads we followed are also captured.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251017 91177308-0d34-0410-b5e6-96231b3b80d8
If the final indices of two GEPs can be proven to not be equal, and
the GEP is of a SequentialType (not a StructType), then the two GEPs
do not alias.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251016 91177308-0d34-0410-b5e6-96231b3b80d8
isKnownNonEqual(A, B) returns true if it can be determined that A != B.
At the moment it only knows two facts, that a non-wrapping add of nonzero to a value cannot be that value:
A + B != A [where B != 0, addition is nsw or nuw]
and that contradictory known bits imply two values are not equal.
This patch also hooks this up to InstSimplify; InstSimplify had a peephole for the first fact but not the second so this teaches InstSimplify a new trick too (alas no measured performance impact!)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251012 91177308-0d34-0410-b5e6-96231b3b80d8
Weak linkage and friends allow a symbol to be overriden outside the
code generator's model, so GlobalsAA shouldn't assume that anything it
can compute about such a symbol is valid.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250156 91177308-0d34-0410-b5e6-96231b3b80d8
This patch also allows the -delinearize pass to delinearize expressions that do
not have an outermost SCEVAddRec expression. The SCEV::delinearize
infrastructure allowed this since r240952, but the -delinearize pass was not
updated yet.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250018 91177308-0d34-0410-b5e6-96231b3b80d8
There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing.
As discussed in D8690.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249983 91177308-0d34-0410-b5e6-96231b3b80d8
There are several dodgy costings due to AVX1 legalizing 256-bit integer vectors that need fixing.
As discussed in D8690.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249981 91177308-0d34-0410-b5e6-96231b3b80d8
The new implementation works at least as well as the old implementation
did.
Also delete the associated preparation tests. They don't exercise
interesting corner cases of the new implementation. All the codegen
tests of the EH tables have already been ported.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249918 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of bailing out when we see an icmp, we can instead at least
say that if the upper bits of both operands are known zero, they are
not demanded. This doesn't help with signed comparisons, but it's at
least better than bailing out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249687 91177308-0d34-0410-b5e6-96231b3b80d8
Like adds and subtracts, muls ripple only to the left so we can use
the same logic.
While we're here, add a print method to DemandedBits so it can be used
with -analyze, which we'll use in the testcase.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249686 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r249528 and reapply r249431. The fix for the
fallout has been commited in r249575.
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249581 91177308-0d34-0410-b5e6-96231b3b80d8
With this patch, clang -O3 optimizes correctly providing > 1000x speedup on this artificial benchmark):
for (a=0; a<n; a++)
for (b=0; b<n; b++)
for (c=0; c<n; c++)
for (d=0; d<n; d++)
for (e=0; e<n; e++)
for (f=0; f<n; f++)
x++;
From test-suite/SingleSource/Benchmarks/Shootout/nestedloop.c
Reviewers: sanjoyd
Differential Revision: http://reviews.llvm.org/D13390
From: Mehdi Amini <mehdi.amini@apple.com>
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249431 91177308-0d34-0410-b5e6-96231b3b80d8
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,
<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)
to
<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)
This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.
Differential Revision: http://reviews.llvm.org/D12985
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248887 91177308-0d34-0410-b5e6-96231b3b80d8
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.
Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.
Differential Revision: http://reviews.llvm.org/D8690
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248878 91177308-0d34-0410-b5e6-96231b3b80d8
If a PHI starts at a non-negative constant, monotonically increases
(only adds of a constant are supported at the moment) and that add
does not wrap, then the PHI is known never to be zero.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248796 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.
Depends on D12948
Depends on D12949
The original checkin, r248608 had to be backed out due to an issue with
a ObjCXX unit test. That issue is now fixed, so re-landing.
Reviewers: atrick, reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12950
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248638 91177308-0d34-0410-b5e6-96231b3b80d8
BranchProbability now is represented by its numerator and denominator in uint32_t type. This patch changes this representation into a fixed point that is represented by the numerator in uint32_t type and a constant denominator 1<<31. This is quite similar to the representation of BlockMass in BlockFrequencyInfoImpl.h. There are several pros and cons of this change:
Pros:
1. It uses only a half space of the current one.
2. Some operations are much faster like plus, subtraction, comparison, and scaling by an integer.
Cons:
1. Constructing a probability using arbitrary numerator and denominator needs additional calculations.
2. It is a little less precise than before as we use a fixed denominator. For example, 1 - 1/3 may not be exactly identical to 1 / 3 (this will lead to many BranchProbability unit test failures). This should not matter when we only use it for branch probability. If we use it like a rational value for some precise calculations we may need another construct like ValueRatio.
One important reason for this change is that we propose to store branch probabilities instead of edge weights in MachineBasicBlock. We also want clients to use probability instead of weight when adding successors to a MBB. The current BranchProbability has more space which may be a concern.
Differential revision: http://reviews.llvm.org/D12603
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248633 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If the trip count of a specific backedge is `N`, then we know that
backedge is effectively guarded by the condition `{0,+,1} u< N`. This
change teaches SCEV to use this condition to prove things in
`isLoopBackedgeGuardedByCond`.
Depends on D12948
Depends on D12949
Reviewers: atrick, reames, majnemer, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12950
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248608 91177308-0d34-0410-b5e6-96231b3b80d8