This patch simply mirrors the attributes we give to @llvm.nvvm.reflect
to the __nvvm_reflect libdevice call. This shaves about 30% of the code
in libdevice away because of CSE opportunities. It also helps us
figure out that libdevice implementations of transcendental functions
don't have side-effects.
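A minimal sketch of the kind of attribute mirroring described above (the exact attribute set is whatever the intrinsic already carries; the names here are only illustrative):

    // Hedged sketch: give the libdevice declaration the same attributes as the
    // intrinsic so duplicate calls become CSE-able and are known side-effect free.
    #include "llvm/IR/Module.h"
    using namespace llvm;

    static void mirrorReflectAttrs(Module &M) {
      if (Function *Reflect = M.getFunction("__nvvm_reflect")) {
        Reflect->addFnAttr(Attribute::NoUnwind);
        Reflect->addFnAttr(Attribute::ReadNone); // no memory access => CSE is safe
      }
    }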
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@265060 91177308-0d34-0410-b5e6-96231b3b80d8
The rule for SMIN introduced in rL236202 doesn't work as advertised: the
check for Pred == ICmpInst::ICMP_SGT was missing.
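For illustration only (this is a generic sketch, not the code touched by this change): the predicate check matters because the same select operands describe different operations under different predicates.

    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // select(icmp sgt A, B, B, A) is smin(A, B) only when the predicate really
    // is signed-greater-than; without checking Pred the rule can fire on
    // unrelated comparisons.
    static bool looksLikeSMin(ICmpInst::Predicate Pred, Value *A, Value *B,
                              Value *TrueV, Value *FalseV) {
      return Pred == ICmpInst::ICMP_SGT && TrueV == B && FalseV == A;
    }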
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264996 91177308-0d34-0410-b5e6-96231b3b80d8
This way, once we teach MatchBinaryOp to map more things into arithmetic,
the non-wrapping add recurrence construction will understand them too.
Right now MatchBinaryOp still only understands arithmetic, so this is
solely a code-reorganization change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264994 91177308-0d34-0410-b5e6-96231b3b80d8
We already try not to truncate PHIs in computeMinimalBitwidths. LoopVectorize can't handle it and we really don't need to, because both induction and reduction PHIs are truncated by other means.
However, we weren't bailing out in all the places we should have, and we ended up returning a PHI to be truncated, which caused PR27018.
This fixes PR27018.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264852 91177308-0d34-0410-b5e6-96231b3b80d8
MatchBinaryOp abstracts out the IR instructions from the operations they
represent. While this change is NFC, we will use this factoring later
to map things like `(extractvalue 0 (sadd.with.overflow X Y))` to `(add
X Y)`.
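A hedged sketch of what such a mapping could look like (the helper name is hypothetical): the value component of sadd.with.overflow behaves exactly like a plain add.

    #include "llvm/IR/IntrinsicInst.h"
    using namespace llvm;

    // Recognize (extractvalue 0 (sadd.with.overflow X, Y)) so the caller can
    // treat it as (add X, Y).
    static bool matchAddFromOverflowIntrinsic(Value *V, Value *&X, Value *&Y) {
      auto *EV = dyn_cast<ExtractValueInst>(V);
      if (!EV || EV->getNumIndices() != 1 || EV->getIndices()[0] != 0)
        return false;
      auto *II = dyn_cast<IntrinsicInst>(EV->getAggregateOperand());
      if (!II || II->getIntrinsicID() != Intrinsic::sadd_with_overflow)
        return false;
      X = II->getArgOperand(0);
      Y = II->getArgOperand(1);
      return true;
    }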
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264747 91177308-0d34-0410-b5e6-96231b3b80d8
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-load forwarding across a release fence.
We choose to be much more conservative about stores. In theory, nothing prevents us from shifting a store from after a release fence to before it, and then eliminating the preceding (previously fenced) store. Doing this without actually moving the second store is likely also legal, but we chose to be conservative at this time.
The LangRef indicates that only atomic loads and stores are affected by fences. This patch chooses to be far more conservative than that.
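A hedged C++ illustration (not LLVM code) of the forwarding this permits:

    #include <atomic>

    int data;
    std::atomic<bool> ready{false};

    int publish() {
      data = 42;                                          // plain store
      std::atomic_thread_fence(std::memory_order_release);
      ready.store(true, std::memory_order_relaxed);
      // The fence only orders the stores above for other threads that observe
      // `ready`; within this thread the load below may simply reuse 42.
      return data;
    }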
This is the GVN companion to http://reviews.llvm.org/D11434 which applied the same logic in EarlyCSE and has been baking in tree for a while now.
Differential Revision: http://reviews.llvm.org/D11436
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264472 91177308-0d34-0410-b5e6-96231b3b80d8
This reserves an MDKind for !llvm.loop, which allows callers to avoid a
string-based lookup. I'm not sure why it was missing.
There should be no functionality change here, just a small compile-time
speedup.
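A minimal sketch of what callers can now do, assuming the reserved kind follows the naming of the other fixed kinds (MD_loop):

    #include "llvm/IR/Instruction.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    static MDNode *getLoopMD(Instruction *LatchTerminator) {
      // Previously: LatchTerminator->getMetadata("llvm.loop"), which hashes and
      // looks up the string on every query.
      return LatchTerminator->getMetadata(LLVMContext::MD_loop);
    }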
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264371 91177308-0d34-0410-b5e6-96231b3b80d8
We used to only allow SCEVAddRecExpr for pointer expressions in order to
be able to compute the bounds. However, this is also trivially possible
for loop-invariant addresses (scUnknown), since then the bounds are simply
the address itself.
Interestingly, we used to allow this for the special case where the
loop-invariant address happens to also be an SCEVAddRecExpr (in an outer
loop).
There are a couple more loops that are vectorized in SPEC after this.
My guess is that the main reason we don't see more is that, for example, a
loop-invariant load is vectorized into a splat vector with several
vector inserts, which is likely to make the vectorization look unprofitable.
That is, we don't notice that a later LICM run will move all of this out of
the loop, so the cost estimate should really be 0.
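A hedged example (not from the test suite) of the access pattern this enables; the address of *p is loop-invariant, so its runtime-check bounds are just the address itself:

    void scale(int *a, const int *p, int n) {
      for (int i = 0; i < n; ++i)
        a[i] += *p; // loop-invariant address; becomes a splat when vectorized
    }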
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264243 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This changes the SCEV-to-SCEVAddRecExpr conversion functions in
ScalarEvolution and PredicatedScalarEvolution to return a SCEVAddRecExpr*
instead of a SCEV*, which removes the need for most clients to do a
dyn_cast right after calling these functions.
We also don't add new predicates if the transformation was not successful.
This is not entirely NFC (it can theoretically remove some predicates
from LAA when we have an unknown dependence), but I couldn't find an obvious
regression test for it.
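A hedged before/after sketch of the client-side effect (the accessor name is illustrative, not necessarily the real one):

    #include "llvm/Analysis/ScalarEvolution.h"
    #include "llvm/Analysis/ScalarEvolutionExpressions.h"
    using namespace llvm;

    static bool hasAffineAddRec(PredicatedScalarEvolution &PSE, Value *Ptr) {
      // Before: callers got a const SCEV * back and had to
      // dyn_cast<SCEVAddRecExpr>(...) the result themselves.
      const SCEVAddRecExpr *AR = PSE.getAsAddRec(Ptr);
      return AR && AR->isAffine();
    }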
Reviewers: sanjoy
Subscribers: sanjoy, mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18368
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264161 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
replaceCongruentIVs can break LCSSA when trying to replace IV increments,
since it replaces all uses of one phi node with another phi node even though
the two phi nodes are not necessarily both in the loop being processed.
This causes an assert in IndVars.
To fix this, we add a check to make sure that the replacement maintains
LCSSA.
Reviewers: sanjoy
Subscribers: mzolotukhin, llvm-commits
Differential Revision: http://reviews.llvm.org/D18266
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263941 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
It can hurt performance to prefetch ahead too much. Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.
Reviewers: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17949
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263772 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
And use this TTI for Cyclone. As explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher handles strides only up to 2KB.
I am also adding tests for this and the previous change (D17943):
* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic AArch64 subtarget not prefetching either
Reviewers: hfinkel
Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17945
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263771 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This form was replaced by a form taking an instruction instead of opcode and
return type in r258391. After committing this change (and some dependent
follow-up changes), it turned out in the review thread to be controversial. The
discussion hasn't come to a conclusion yet. I'm re-adding the old form to fix
the API regression and to provide a better base for discussion, possibly on
llvm-dev.
A difference from the original function is that it can't be called with GEPs
(similar to how this was already the case for compares). In order to support
opaque pointers in the future, folding GEPs needs to be passed the source
element type, which is not possible with the current API.
Reviewers: dberlin, reames
Subscribers: dblaikie, eddyb
Differential Revision: http://reviews.llvm.org/D17901
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263501 91177308-0d34-0410-b5e6-96231b3b80d8
Check to see if all operands are constant before calling simplify on them
so that we don't perform wasted simplifications.
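Roughly the shape of the guard (a sketch, not the exact diff):

    #include "llvm/IR/Constants.h"
    #include "llvm/IR/Instruction.h"
    using namespace llvm;

    static bool allOperandsConstant(const Instruction &I) {
      for (const Value *Op : I.operands())
        if (!isa<Constant>(Op))
          return false; // skip the (pointless) simplification attempt
      return true;
    }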
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263374 91177308-0d34-0410-b5e6-96231b3b80d8
This doesn't change how many times we construct domtrees in the normal
pipeline, and it removes fragility and instability where basic-aa may
not be run in time to see domtrees because they happen to be constructed
afterward.
This isn't quite as clean as the change to memdep because there is
a mode where basic-aa specifically runs without domtrees -- in the
hacky version used by function-attrs with the legacy pass manager.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263234 91177308-0d34-0410-b5e6-96231b3b80d8
This doesn't cause us to construct dominator trees any more often in the
normal pipeline, and removes an entire mode of memdep that needed to be
reasoned about and maintained. Perhaps more importantly, it removes the
ability for the results of memdep to be different because of accidental
pass scheduling goofs or the order of evaluation of 'getResult' calls.
Essentially, 'getCachedResult', except when used across IR-unit boundaries, is
extremely dangerous. We need to work much harder to avoid it (or its
analog in the old pass manager).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263232 91177308-0d34-0410-b5e6-96231b3b80d8
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.
In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263219 91177308-0d34-0410-b5e6-96231b3b80d8
work in the face of the limitations of DLLs and templated static
variables.
This requires passes that use the AnalysisBase mixin to provide a static
variable themselves. So as to keep their APIs clean, I've made these
private and befriended the CRTP base class (which is the common
practice).
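A hedged sketch of the resulting pattern (the analysis name is made up, and the name of the static is an assumption; the mixin spelling follows the description above):

    #include "llvm/IR/Function.h"
    #include "llvm/IR/PassManager.h"
    using namespace llvm;

    // Each analysis now owns its key object and befriends the CRTP base so only
    // the base can read it.
    class MyAnalysis : public AnalysisBase<MyAnalysis> {
      friend AnalysisBase<MyAnalysis>;
      static char PassID; // the static the mixin used to supply via a template
    public:
      struct Result { bool Flag = false; };
      Result run(Function &F, AnalysisManager<Function> &AM) { return Result(); }
    };
    char MyAnalysis::PassID;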
I've added documentation to AnalysisBase for why this is necessary and
at what point we can go back to the much simpler system.
This is clearly a better pattern than the extern template as it caught
*numerous* places where the template magic hadn't been applied and
things were "just working" but would eventually have broken
mysteriously.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263216 91177308-0d34-0410-b5e6-96231b3b80d8
actually finish wiring up the old call graph.
There were bugs in the old call graph that hadn't been caught because it
wasn't being tested. It wasn't being tested because it wasn't in the
pipeline system and we didn't have a printing pass to run in tests. This
fixes all of that.
As for why I'm still keeping the old call graph alive, it's so that I can
port GlobalsAA to the new pass manager without forking it to work with
the lazy call graph. That's clearly the right eventual design, but it
seems pragmatic to defer that until it's necessary. The old call graph
works just fine for GlobalsAA.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263104 91177308-0d34-0410-b5e6-96231b3b80d8
location in the opt tool to live alongside the analysis in LLVM's
libraries.
No functionality changed here, but this will allow me to port the
printer to the new pass manager as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263101 91177308-0d34-0410-b5e6-96231b3b80d8
There is another pass by the generic name 'CallGraphPrinter' which is
actually just a call graph printer tucked away inside the opt tool. I'd
like to bring it out and make it follow the same patterns as the rest of
the CallGraph code, but doing so would end up conflicting with the name
of the DOT printing pass. So this makes the DOT printing pass name be
more precise.
No functionality changed here.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263100 91177308-0d34-0410-b5e6-96231b3b80d8
This is a fairly straightforward port to the new pass manager with one
exception. It removes a very questionable use of releaseMemory() in
the old pass to invalidate its caches between runs on a function.
I don't think this is really guaranteed to be safe. I've just used the
more direct port to the new PM to address this by nuking the results
object each time the pass runs. While this could cause some minor malloc
traffic increase, I don't expect the compile time performance hit to be
noticeable, and it makes the correctness and other aspects of the pass
much easier to reason about. In some cases, it may make things faster by
making the sets and maps smaller with better locality. Indeed, the
measurements collected by Bruno (thanks!!!) show mostly compile time
improvements.
There is sadly very limited testing at this point as there are only two
tests of memdep, and both rely on GVN. I'll be porting GVN next and that
will exercise this heavily though.
Differential Revision: http://reviews.llvm.org/D17962
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263082 91177308-0d34-0410-b5e6-96231b3b80d8
MemoryDependenceAnalysis had a hard-coded exception to the general aliasing rules for malloc and calloc. The reasoning that applied there is equally valid in BasicAA and clarifies the remaining logic in MDA.
In principle, this can expose slightly more optimization opportunities, but since essentially all of our aliasing-aware memory optimization passes go through MDA, this will likely be NFC in practice.
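A hedged illustration of the rule in question (plain C++, not from the patch): memory freshly returned by malloc cannot alias any pointer that existed before the call.

    #include <cstdlib>

    int freshAllocationDoesNotAlias(int *p) {
      int *q = (int *)std::malloc(sizeof(int)); // fresh memory: cannot alias *p
      if (!q)
        return 0;
      *q = 1;
      *p = 2;     // cannot clobber *q ...
      int v = *q; // ... so this load can be forwarded the value 1
      std::free(q);
      return v;
    }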
Differential Revision: http://reviews.llvm.org/D15912
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263075 91177308-0d34-0410-b5e6-96231b3b80d8
Extract out a generic interface from a recently landed patch and document a TODO in case compile time becomes a problem.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263062 91177308-0d34-0410-b5e6-96231b3b80d8
Building on the previous change, this generalizes
ScalarEvolution::getRangeViaFactoring to work with
{Ext(C?A:B)+k0,+,Ext(C?A:B)+k1} where Ext can be a zero extend, sign
extend or truncate operation, and k0 and k1 are constants.
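A hedged example (not from the test suite) of a recurrence with this shape, where the start and the step are both a sign-extended select of constants plus a constant:

    long sumSelected(const long *a, long n, bool c) {
      long start = (long)(c ? 2 : 7) + 1; // Ext(C ? A : B) + k0
      long step  = (long)(c ? 2 : 7) + 4; // Ext(C ? A : B) + k1
      long s = 0;
      for (long i = start; i < n; i += step)
        s += a[i];
      return s;
    }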
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262979 91177308-0d34-0410-b5e6-96231b3b80d8
This change generalizes ScalarEvolution::getRangeViaFactoring to work
with {Ext(C?A:B),+,Ext(C?A:B)} where Ext can be a zero extend, sign
extend or truncate operation.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262978 91177308-0d34-0410-b5e6-96231b3b80d8
This is much more clear and less surprising IMO. It also makes things
more consistent with the increasingly large chunk of LLVM code that
assumes true-on-success.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262826 91177308-0d34-0410-b5e6-96231b3b80d8
duplicated comments.
In several cases these had diverged, making them especially nice to
canonicalize. I checked to make sure we weren't losing important
information of course.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262825 91177308-0d34-0410-b5e6-96231b3b80d8
the new pass manager.
The port will involve substantial edits here, and would likely introduce
bad formatting if formatted in isolation, so just get all the formatting
up to snuff. I'll also go through and try to freshen the doxygen here as
well as modernizing some of the code.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262821 91177308-0d34-0410-b5e6-96231b3b80d8
The diff is relatively large since I took the chance to rearrange the code I had to touch in a more obvious way, but the key bit is merely using the !range metadata when we can't analyze the instruction further. The previous !range metadata code was essentially dead, since no binary operator or cast will have !range metadata (per the Verifier) and it was otherwise dropped on the floor.
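A hedged sketch of how such a fallback can read the metadata (not the actual diff); per the Verifier only loads, calls and invokes may carry !range, which is why the old code path was effectively dead for binary operators and casts:

    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Metadata.h"
    using namespace llvm;

    static MDNode *getRangeMDIfAny(const Value *V) {
      if (auto *I = dyn_cast<Instruction>(V))
        return I->getMetadata(LLVMContext::MD_range);
      return nullptr;
    }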
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262751 91177308-0d34-0410-b5e6-96231b3b80d8
This experiment was originally about trying to use facts implied by dominating conditions to infer more precise known bits. While the compile time was found to be acceptable on several large code bases, we never found sufficiently profitable examples to justify turning on the code by default. Given this, it's time to abandon the experiment.
Several folks have commented that they've found this useful for experimentation, but nothing has come of those experiments. Given how easy the patch is to apply, there's no reason to leave the code in tree.
For anyone interested in further investigation in this area, I recommend finding the summary email I sent on one of the original review threads. In particular, I now believe the use-list based approach is strictly worse than the dom-tree-walking approach.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262646 91177308-0d34-0410-b5e6-96231b3b80d8
Exploit ScalarEvolution::getRange's newly acquired smartness (since
r262438) by using that to infer nsw and nuw when possible.
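A hedged sketch of the kind of inference meant here (the helper is hypothetical, and L and R are assumed to have the same type): if the maxima of the operands' unsigned ranges cannot wrap when summed, the addition can be tagged nuw, and analogously for signed ranges and nsw.

    #include "llvm/Analysis/ScalarEvolution.h"
    using namespace llvm;

    static bool addCannotWrapUnsigned(ScalarEvolution &SE, const SCEV *L,
                                      const SCEV *R) {
      APInt LMax = SE.getUnsignedRange(L).getUnsignedMax();
      APInt RMax = SE.getUnsignedRange(R).getUnsignedMax();
      // LMax + RMax <= UINT_MAX  <=>  LMax <= ~RMax, so the add cannot wrap.
      return LMax.ule(~RMax);
    }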
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@262639 91177308-0d34-0410-b5e6-96231b3b80d8