Commit Graph

2423 Commits

Duncan P. N. Exon Smith
9df0e64f25 ADT: Split ilist_node_traits into alloc and callback, NFC
Many lists want to override only allocation semantics, or callbacks for
iplist.  Split these up to prevent code duplication.
- Specialize ilist_alloc_traits to change the implementations of
  deleteNode() and createNode().
- One common desire is to do nothing in deleteNode() and to disable
  createNode().  Specialize ilist_alloc_traits to inherit from
  ilist_noalloc_traits for that behaviour.
- Specialize ilist_callback_traits to use the addNodeToList(),
  removeNodeFromList(), and transferNodesFromList() callbacks.
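
For illustration, a minimal sketch of the two specializations described
above (Widget and the callback bodies are hypothetical):

    struct Widget : ilist_node<Widget> { /* ... */ };

    // Do nothing in deleteNode() and disable createNode(), by
    // inheriting that behaviour wholesale.
    template <>
    struct ilist_alloc_traits<Widget> : ilist_noalloc_traits<Widget> {};

    // Independently opt in to the iplist callbacks.
    template <> struct ilist_callback_traits<Widget> {
      void addNodeToList(Widget *W) { /* track insertion */ }
      void removeNodeFromList(Widget *W) { /* track removal */ }
      template <class Iterator>
      void transferNodesFromList(ilist_callback_traits &Old,
                                 Iterator First, Iterator Last) {
        /* track bulk transfers */
      }
    };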

As a drive-by, add some coverage to the callback-related unit tests.

llvm-svn: 280128
2016-08-30 18:40:47 +00:00
James Y Knight
67397c2b8d Replace incorrect "#ifdef DEBUG" with "#ifndef NDEBUG".
The former is simply wrong -- the code will either never be used or will
always be used, rather than being dependent upon whether it's built with
debug assertions enabled.

The macro DEBUG isn't ever set by the llvm build system. But, the macro
DEBUG(X) is defined (unconditionally) if you happen to include
llvm/Support/Debug.h.

The code in Value.h which was erroneously protected by the #ifdef DEBUG
didn't even compile -- you can't cast<> from an LLVMOpaqueValue
directly. Fortunately, it was never invoked, as Core.cpp included
Value.h before Debug.h.

The conditionalized code in AArch64CollectLOH.cpp was previously always
used, as it includes Debug.h.
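
As a hedged illustration (expensiveCheck is a stand-in, not from the
commit):

    // Wrong: the build system never defines DEBUG, so this block is
    // compiled in (or out) unconditionally, regardless of assertions.
    #ifdef DEBUG
      expensiveCheck();
    #endif

    // Right: NDEBUG is defined for release builds, so this block is
    // compiled exactly when assertions are enabled.
    #ifndef NDEBUG
      expensiveCheck();
    #endif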

llvm-svn: 280056
2016-08-30 03:16:16 +00:00
Duncan P. N. Exon Smith
4e09f9bf86 ADT: Give ilist<T>::reverse_iterator a handle to the current node
Reverse iterators to doubly-linked lists can be simpler (and cheaper)
than std::reverse_iterator.  Make it so.

In particular, change ilist<T>::reverse_iterator so that it is *never*
invalidated unless the node it references is deleted.  This matches the
guarantees of ilist<T>::iterator.

(Note: MachineBasicBlock::iterator is *not* an ilist iterator, but a
MachineInstrBundleIterator<MachineInstr>.  This commit does not change
MachineBasicBlock::reverse_iterator, but it does update
MachineBasicBlock::reverse_instr_iterator.  See note at end of commit
message for details on bundle iterators.)

Given the list (with the Sentinel showing twice for simplicity):

     [Sentinel] <-> A <-> B <-> [Sentinel]

the following is now true:
 1. begin() represents A.
 2. begin() holds the pointer for A.
 3. end() represents [Sentinel].
 4. end() holds the pointer for [Sentinel].
 5. rbegin() represents B.
 6. rbegin() holds the pointer for B.
 7. rend() represents [Sentinel].
 8. rend() holds the pointer for [Sentinel].

The changes are #6 and #8.  Here are some properties from the old
scheme (which used std::reverse_iterator):
- rbegin() held the pointer for [Sentinel] and rend() held the pointer
  for A;
- operator*() cost two dereferences instead of one;
- converting from a valid iterator to its valid reverse_iterator
  involved a confusing increment; and
- "RI++->erase()" left RI invalid.  The unintuitive replacement was
  "RI->erase(), RE = end()".

With vector-like data structures these properties are hard to avoid
(since past-the-beginning is not a valid pointer), and don't impose a
real cost (since there's still only one dereference, and all iterators
are invalidated on erase).  But with lists, this was a poor design.

Specifically, the following code (which obviously works with normal
iterators) now works with ilist::reverse_iterator as well:

    for (auto RI = L.rbegin(), RE = L.rend(); RI != RE;)
      fooThatMightRemoveArgFromList(*RI++);

Converting between iterator and reverse_iterator for the same node uses
the getReverse() function.

    reverse_iterator iterator::getReverse();
    iterator reverse_iterator::getReverse();
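
For example, a sketch of the round-trip guarantee (assuming Foo is an
ilist node type and L is a non-empty ilist<Foo>):

    ilist<Foo>::iterator I = L.begin();               // represents and holds A
    ilist<Foo>::reverse_iterator RI = I.getReverse(); // same node, reversed
    assert(&*I == &*RI);          // one dereference, same node
    assert(RI.getReverse() == I); // conversion round-trips exactly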

Why doesn't iterator <=> reverse_iterator conversion use constructors?

In order to catch and update old code, reverse_iterator does not even
have an explicit conversion from iterator.  It wouldn't be safe because
there would be no reasonable way to catch all the bugs from the changed
semantics (see the changes at call sites that are part of this patch).

Old code used this API:

    std::reverse_iterator::reverse_iterator(iterator);
    iterator std::reverse_iterator::base();

Here's how to update from old code to new (that incorporates the
semantic change), assuming I is an ilist<>::iterator and RI is an
ilist<>::reverse_iterator:

            [Old]         ==>          [New]
    reverse_iterator(I)       (--I).getReverse()
    reverse_iterator(I)         ++I.getReverse()
  --reverse_iterator(I)           I.getReverse()
    reverse_iterator(++I)         I.getReverse()
          RI.base()          (--RI).getReverse()
          RI.base()            ++RI.getReverse()
        --RI.base()              RI.getReverse()
      (++RI).base()              RI.getReverse()
  delete &*RI, RE = end()         delete &*RI++
  RI->erase(), RE = end()         RI++->erase()

=======================================
Note: bundle iterators are out of scope
=======================================

MachineBasicBlock::iterator, also known as
MachineInstrBundleIterator<MachineInstr>, is a wrapper to represent
MachineInstr bundles.  The idea is that each operator++ takes you to the
beginning of the next bundle.  Implementing a sane reverse iterator for
this is harder than for ilist.  Here are the options:
- Use std::reverse_iterator<MBB::i>.  Store a handle to the beginning of
  the next bundle.  A call to operator*() runs a loop (usually
  operator--() will be called 1 time, for unbundled instructions).
  Increment/decrement just works.  This is the status quo.
- Store a handle to the final node in the bundle.  A call to operator*()
  still runs a loop, but it iterates one time fewer (usually
  operator--() will be called 0 times, for unbundled instructions).
  Increment/decrement just works.
- Make the ilist_sentinel<MachineInstr> *always* store that it's the
  sentinel (instead of just in asserts mode).  Then the bundle iterator
  can sniff the sentinel bit in operator++().

I initially tried implementing the end() option as part of this commit,
but updating iterator/reverse_iterator conversion call sites was
error-prone.  I have a WIP series of patches that implements the final
option.

llvm-svn: 280032
2016-08-30 00:13:12 +00:00
Gor Nishanov
cc3ff9c81d [Coroutines] Part 9: Add cleanup subfunction.
Summary:
[Coroutines] Part 9: Add cleanup subfunction.

This patch completes coroutine heap allocation elision. Now, the heap elision example from docs/Coroutines.rst compiles and produces the expected result (see test/Transform/Coroutines/ex3.ll).

Intrinsic Changes:
* coro.free gets a token parameter tying it to coro.id to allow reliably discovering all coro.frees associated with a particular coroutine.
* coro.id gets an extra parameter that points back to a coroutine function. This allows checking whether a coro.id describes the enclosing function or belongs to a different function that was later inlined.

CoroSplit now creates three subfunctions:
# f$resume - resume logic
# f$destroy - cleanup logic, followed by deallocation code
# f$cleanup - just the cleanup code

CoroElide pass during devirtualization replaces coro.destroy with either f$destroy or f$cleanup depending on whether heap elision is performed or not.

Other fixes, improvements:
* Fixed a buglet in Shape::buildFrame that was not creating coro.save properly if a coroutine has more than one suspend point.

* Switched to using a variable-width suspend index field (no longer limited to 32 bits; the index field can be as small as i1 or as large as i<whatever-size_t-is>).

Reviewers: majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23844

llvm-svn: 279971
2016-08-29 14:34:12 +00:00
Changpeng Fang
351c66ab91 AMDGCN/SI: Implement readlane/readfirstlane intrinsics
Summary:
  This patch implements the readlane/readfirstlane intrinsics.
TODO: we need to define a new register class to handle the case
where the source could be a vector register or M0.

Reviewed by:
  arsenm and tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D22489

llvm-svn: 279660
2016-08-24 20:35:23 +00:00
David Blaikie
22b8e86371 DebugInfo: Add flag to CU to disable emission of inline debug info into the skeleton CU
In cases where .dwo/.dwp files are guaranteed to be available, skipping
the extra online (in the .o file) inline info can save a substantial
amount of space - see the original r221306 for more details there.

llvm-svn: 279650
2016-08-24 18:29:49 +00:00
Chandler Carruth
94942f5c00 [PM] Introduce basic update capabilities to the new PM's CGSCC pass
manager, including both plumbing and logic to handle function pass
updates.

There are three fundamentally tied changes here:
1) Plumbing *some* mechanism for updating the CGSCC pass manager as the
   CG changes while passes are running.
2) Changing the CGSCC pass manager infrastructure to have support for
   the underlying graph to mutate mid-pass run.
3) Actually updating the CG after function passes run.

I can separate them if necessary, but I think it's really useful to have
them together as the needs of #3 drove #2, and that in turn drove #1.

The plumbing technique is to extend the "run" method signature with
extra arguments. We provide the call graph that intrinsically is
available as it is the basis of the pass manager's IR units, and an
output parameter that records the results of updating the call graph
during an SCC pass's run. Note that "...UpdateResult" isn't a *great*
name here... suggestions very welcome.
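
Concretely, the extended signature looks roughly like this (a sketch
based on the description above; exact type names may differ):

    PreservedAnalyses run(LazyCallGraph::SCC &C, CGSCCAnalysisManager &AM,
                          LazyCallGraph &CG, CGSCCUpdateResult &UR);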

I tried a pretty frustrating number of different data structures and such
for the innards of the update result. Every other one failed for one
reason or another. Sometimes I just couldn't keep the layers of
complexity right in my head. The thing that really worked was to just
directly provide access to the underlying structures used to walk the
call graph so that their updates could be informed by the *particular*
nature of the change to the graph.

The technique for how to make the pass management infrastructure cope
with mutating graphs was also something that took a really, really large
number of iterations to get to a place where I was happy. Here are some
of the considerations that drove the design:

- We operate at three levels within the infrastructure: RefSCC, SCC, and
  Node. In each case, we are working bottom up and so we want to
  continue to iterate on the "lowest" node as the graph changes. Look at
  how we iterate over nodes in an SCC running function passes as those
  function passes mutate the CG. We continue to iterate on the "lowest"
  SCC, which is the one that continues to contain the function just
  processed.

- The call graph structure re-uses SCCs (and RefSCCs) during mutation
  events for the *highest* entry in the resulting new subgraph, not the
  lowest. This means that it is necessary to continually update the
  current SCC or RefSCC as it shifts. This is really surprising and
  subtle, and took a long time for me to work out. I actually tried
  changing the call graph to provide the opposite behavior, and it
  breaks *EVERYTHING*. The graph update algorithms are really deeply
  tied to this particular pattern.

- When SCCs or RefSCCs are split apart and refined and we continually
  re-pin our processing to the bottom one in the subgraph, we need to
  enqueue the newly formed SCCs and RefSCCs for subsequent processing.
  Queuing them presents a few challenges:
  1) SCCs and RefSCCs use wildly different iteration strategies at
     a high level. We end up needing to converge them on worklist
     approaches that can be extended in order to be able to handle the
     mutations.
  2) The order of the enqueuing needs to remain bottom-up post-order so
     that we don't get surprising order of visitation for things like
     the inliner.
  3) We need the worklists to have set semantics so we don't duplicate
     things endlessly. We don't need a *persistent* set though because
     we always keep processing the bottom node!!!! This is super, super
     surprising to me and took a long time to convince myself this is
     correct, but I'm pretty sure it is... Once we sink down to the
     bottom node, we can't re-split out the same node in any way, and
     the postorder of the current queue is fixed and unchanging.
  4) We need to make sure that the "current" SCC or RefSCC actually gets
     enqueued here such that we re-visit it because we continue
     processing a *new*, *bottom* SCC/RefSCC.

- We also need the ability to *skip* SCCs and RefSCCs that get merged
  into a larger component. We even need the ability to skip *nodes* from
  an SCC that are no longer part of that SCC.

This led to the design you see in the patch which uses SetVector-based
worklists. The RefSCC worklist is always empty until an update occurs
and is just used to handle those RefSCCs created by updates as the
others don't even exist yet and are formed on-demand during the
bottom-up walk. The SCC worklist is pre-populated from the RefSCC, and
we push new SCCs onto it and blacklist existing SCCs on it to get the
desired processing.
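
A rough sketch of the shape this describes (member names hypothetical):

    struct CGSCCUpdateResult {
      // Empty until an update creates new RefSCCs mid-walk.
      SmallSetVector<LazyCallGraph::RefSCC *, 4> RCWorklist;
      // Pre-populated from the current RefSCC; new SCCs get pushed here.
      SmallSetVector<LazyCallGraph::SCC *, 4> CWorklist;
      // SCCs merged away mid-walk are skipped rather than re-visited.
      SmallPtrSet<LazyCallGraph::SCC *, 4> InvalidatedSCCs;
    };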

We then *directly* update these when updating the call graph as I was
never able to find a satisfactory abstraction around the update
strategy.

Finally, we need to compute the updates for function passes. This is
mostly used as an initial customer of all the update mechanisms to drive
their design to at least cover some real set of use cases. There are
a bunch of interesting things that came out of doing this:

- It is really nice to do this a function at a time because that
  function is likely hot in the cache. This means we want even the
  function pass adaptor to support online updates to the call graph!

- To update the call graph after arbitrary function pass mutations is
  quite hard. We have to build a fairly comprehensive set of
  data structures and then process them. Fortunately, some of this code
  is related to the code for building the call graph in the first place.
  Unfortunately, very little of it makes any sense to share because the
  nature of what we're doing is so very different. I've factored out the
  one part that made sense at least.

- We need to transfer these updates into the various structures for the
  CGSCC pass manager. Once those were more sanely worked out, this
  became relatively easier. But some of those needs necessitated changes
  to the LazyCallGraph interface to make it significantly easier to
  extract the changed SCCs from an update operation.

- We also need to update the CGSCC analysis manager as the shape of the
  graph changes. When an SCC is merged away we need to clear analyses
  associated with it from the analysis manager which we didn't have
  support for in the analysis manager infrastructure. New SCCs are easy!
  But then we have the case that the original SCC has its shape changed
  but remains in the call graph. There we need to *invalidate* the
  analyses associated with it.

- We also need to invalidate analyses after we *finish* processing an
  SCC. But the analyses we need to invalidate here are *only those for
  the newly updated SCC*!!! Because we only continue processing the
  bottom SCC, if we split SCCs apart the original one gets invalidated
  once when its shape changes and is not processed further so its
  analyses will be correct. It is the bottom SCC which continues being
  processed and needs to have the "normal" invalidation done based on
  the preserved analyses set.

All of this is mostly background and context for the changes here.

Many thanks to all the reviewers who helped here. Especially Sanjoy who
caught several interesting bugs in the graph algorithms, David, Sean,
and others who all helped with feedback.

Differential Revision: http://reviews.llvm.org/D21464

llvm-svn: 279618
2016-08-24 09:37:14 +00:00
Xinliang David Li
af8f6ba1d3 Fix Windows build failure
llvm-svn: 279525
2016-08-23 16:00:54 +00:00
Xinliang David Li
df1f2a779a [Profile] refactor metadata copying/swapping code
Differential Revision: http://reviews.llvm.org/D23619

llvm-svn: 279523
2016-08-23 15:39:03 +00:00
Tim Shen
33e4d80307 [GraphTraits] Replace all NodeType usage with NodeRef
This should finish the GraphTraits migration.

Differential Revision: http://reviews.llvm.org/D23730

llvm-svn: 279475
2016-08-22 21:09:30 +00:00
Duncan P. N. Exon Smith
68b6058de5 ADT: Remove ilist_*sentinel_traits, NFC
Remove all the dead code around ilist_*sentinel_traits.  This is a
follow-up to gutting them as part of r279314 (originally r278974),
staged to prevent broken builds in sub-projects.

Uses were removed from clang in r279457 and lld in r279458.

llvm-svn: 279473
2016-08-22 20:51:00 +00:00
Pete Cooper
27f0062b01 Add comments and an assert to follow-up on r279113. NFC.
Philip commented on r279113 to ask for better comments as to
when to use the different versions of getName.  It's also possible
to assert in the simple case that we aren't an overloaded intrinsic
as those have to use the more capable version of getName.

Thanks for the comments Philip.

llvm-svn: 279466
2016-08-22 20:18:28 +00:00
Chandler Carruth
7dc472d231 [PM] Introduce an abstraction for all the analyses over a particular IR
unit for use in the PreservedAnalyses set.

This doesn't have any important functional change yet but it cleans
things up and makes the analysis substantially more efficient by
avoiding querying through the type erasure for every analysis.

I also think it makes it much easier to reason about how analyses are
preserved when walking across pass managers and across IR unit
abstractions.

Thanks to Sean and Mehdi both for the comments and suggestions.

Differential Revision: https://reviews.llvm.org/D23691

llvm-svn: 279360
2016-08-20 04:57:28 +00:00
Tim Shen
823bde34b3 [GraphTraits] Make nodes_iterator dereference to NodeType*/NodeRef
Currently nodes_iterator may dereference to a NodeType* or a NodeType&. Make them all dereference to NodeType*, which will be NodeRef later.

Differential Revision: https://reviews.llvm.org/D23704
Differential Revision: https://reviews.llvm.org/D23705

llvm-svn: 279326
2016-08-19 21:20:13 +00:00
Chandler Carruth
682a2c8bc6 [PM] Re-instate r279227 and r279228 with a fix to the way the templating
was done to hopefully appease MSVC.

As an upside, this also implements the suggestion Sanjoy made in code
review, so two for one! =]

I'll be watching the bots to see if there are still issues.

llvm-svn: 279295
2016-08-19 18:36:06 +00:00
Chandler Carruth
969549cf52 [PM] Revert r279227 and r279228 until I can find someone to help me
solve completely opaque MSVC build errors. It complains about lots of
stuff with this change without giving nearly enough information to even
attempt a fix.

llvm-svn: 279231
2016-08-19 10:51:55 +00:00
Chandler Carruth
9d65c3131d [PM] Make the new pass manager support fully generic extra arguments
to run methods, both for transform passes and analysis passes.

This also allows the analysis manager to use a different set of extra
arguments from the pass manager where useful. Consider passes over
analysis produced units of IR like SCCs of the call graph or loops.
Passes of this nature will often want to refer to the analysis result
that was used to compute their IR units (the call graph or LoopInfo).
And for transformations, they may want to communicate special update
information to the outer pass manager. With this change, it becomes
possible to have a run method for a loop pass that looks more like:

  PreservedAnalyses run(Loop &L, AnalysisManager<Loop, LoopInfo> &AM,
                        LoopInfo &LI, LoopUpdateRecord &UR);

And to query the analysis manager like:

    AM.getResult<MyLoopAnalysis>(L, LI);

This makes accessing the known-available analyses convenient and clear,
and it makes passing customized data structures around easy.

My initial use case is going to be in updating the pass manager layers
when the analysis units of IR change. But there are more use cases here
such as having a layer that lets inner passes signal whether certain
additional passes should be run because of particular simplifications
made. Two desires for this have come up in the past: triggering
additional optimization after successfully unrolling loops, and
triggering additional inlining after collapsing indirect calls to direct
calls.

Despite adding this layer of generic extensibility, the *only* changes to
existing, simple usage are for places where we forward declare the
AnalysisManager template. We really shouldn't be doing this because of
the fragility exposed here, but currently it makes coping with the
legacy PM code easier.

Differential Revision: http://reviews.llvm.org/D21462

llvm-svn: 279227
2016-08-19 09:45:16 +00:00
Chandler Carruth
ce7d8b0ddc [PM] Try to work around what appears to be an MSVC SFINAE issue with
r279217 where it fails to select the path that other compilers select.

The workaround won't be as careful to produce an error when an analysis
result is incorrect, but it seems we can rely on non-MSVC builds to catch
such errors, and MSVC doesn't seem to support the alternative
techniques.

Hoping this brings the windows bots back to life. If not, will have to
revert all of this.

llvm-svn: 279225
2016-08-19 09:26:00 +00:00
Chandler Carruth
8fa994880c [PM] NFC refactoring: remove the AnalysisManagerBase class, folding it
into the AnalysisManager class template.

Back when I first added this base class there were separate analysis
managers and some plausible reason why it would be a useful factoring of
common code between them. However, after a lot of refactoring and cleaning,
we now have *entirely* shared code. The base class was just an arbitrary
division between code in one class template and a separate class
template. It didn't add anything and forced lots of indirection through
"derived_this" for no real gain.

We can always factor a base CRTP class out with common code if there is
ever some *other* analysis manager that wants to share a subset of
logic. But for now, folding things into the primary template is
a non-trivial simplification with no downsides I see. It shortens the
code considerably, removes an unhelpful abstraction, and will make
subsequent patches *dramatically* less complex which enhance the
analysis manager infrastructure to effectively cope with invalidation.

llvm-svn: 279221
2016-08-19 08:31:47 +00:00
Chandler Carruth
9056498e66 [PM] Redesign how the new PM detects whether an analysis result provides
its own invalidate method.

Previously, the technique would assume that if a result's invalidate
method didn't exactly match the expected signature, the result didn't
have one at all. This is in fact not the case. And we had analyses in
the tree with incorrect signatures for the invalidate method that would
be erroneously invalidated in certain cases! Yikes.

Moreover a result might legitimately want to have multiple overloads for
the invalidate method, and if one changes or a new one is needed we
again really want a compiler error. For example in the tree we had not
added the overload for a *function* IR unit to the invalidate routine
for TLI. Doh.

So a new technique for the SFINAE detection here: if the result has *any*
member spelled "invalidate" we turn off the synthesis of a default
version. We don't care if it is a member function or a member variable
or how many overloads there are. Once a result has something by that
name it must provide suitable overloads for the contexts in which it is
used. This seems much more resilient and durable.

Huge props to Richard Smith who helped me figure out how on earth we
could even do this in C++. It took quite some doing. The technique is
remarkably clean however, and merely requires that the analysis results
are not *final* classes. I think that's a requirement we can live with
even if it is a bit odd.
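
A hedged reconstruction of the idiom (a modern-C++ sketch, not the
literal in-tree code):

    #include <type_traits>

    // A probe base whose only job is to also declare `invalidate`.
    struct InvalidateProbe { void invalidate(); };

    // Requires ResultT to be non-final, hence the constraint above.
    template <typename ResultT>
    struct Probe : ResultT, InvalidateProbe {};

    // Primary template: naming Probe<ResultT>::invalidate was ambiguous,
    // so ResultT declares *some* member spelled `invalidate` -- function,
    // variable, or any number of overloads.
    template <typename ResultT, typename = void>
    struct HasOwnInvalidate : std::true_type {};

    // If the name resolves unambiguously, only the probe base has it, so
    // a default invalidate can be synthesized for ResultT.
    template <typename ResultT>
    struct HasOwnInvalidate<
        ResultT, std::void_t<decltype(&Probe<ResultT>::invalidate)>>
        : std::false_type {};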

I've fixed the two bad in-tree analysis results. And this will make my
next change which changes the API for invalidate much easier to
validate as correct.

llvm-svn: 279217
2016-08-19 07:49:23 +00:00
Chandler Carruth
f68dd1e089 [PM] Rework the new PM support for building the ModuleSummaryIndex to
directly produce the index as the value type result.

This requires making the index movable which is straightforward. It
greatly simplifies things by allowing us to completely avoid the builder
API and the layers of abstraction inherent there. Instead both pass
managers can directly construct these when run by value. They still
won't be constructed truly eagerly thanks to the optional in the legacy
PM. The code that directly builds the index can also just share a direct
function.

A notable change here is that the result type of the analysis for the
new PM is no longer a reference type. This was really problematic when
making changes to how we handle result types to make our interface
requirements *much* more strict and precise. But I think this is an
overall improvement.

Differential Revision: https://reviews.llvm.org/D23701

llvm-svn: 279216
2016-08-19 07:49:19 +00:00
Wei Ding
ea4d7271dc AMDGPU : Fix QSAD and MQSAD instructions' incorrect data type.
Differential Revision: http://reviews.llvm.org/D23689

llvm-svn: 279126
2016-08-18 19:51:14 +00:00
Pete Cooper
61bc562db4 Add a version of Intrinsic::getName which is more efficient when there are no overloads.
When running 'opt -O2 verify-uselistorder-nodbg.lto.bc', there are 33m allocations.  8.2m
come from std::string allocations in Intrinsic::getName().  Turns out this method only
returns a std::string because it needs to handle overloads, but that is not the common case.

This adds an overload of getName which just returns a StringRef when there are no overloads
and so saves on the allocations.
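
A hedged sketch of the resulting pair of APIs (signatures approximate;
ID and Tys stand for an intrinsic ID and its overload types):

    // Fast path: no overloaded types, so no allocation.
    StringRef Name = Intrinsic::getName(Intrinsic::memcpy);

    // Overloaded intrinsics still use the allocating std::string form,
    // since the name is built from the overload's concrete types.
    std::string Full = Intrinsic::getName(ID, Tys); // Tys: ArrayRef<Type *>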

llvm-svn: 279113
2016-08-18 18:30:54 +00:00
Valery Pykhtin
ef1d950dec [AMDGPU] add s_incperflevel/s_decperflevel intrinsics.
Differential revision: https://reviews.llvm.org/D23666

llvm-svn: 279106
2016-08-18 18:06:20 +00:00
Tim Northover
9c83e8d655 GlobalISel: support irtranslation of icmp instructions.
llvm-svn: 278969
2016-08-17 20:25:25 +00:00
Tim Shen
41fe248b16 [GenericDomTree] Change GenericDomTree to use NodeRef in GraphTraits. NFC.
Summary:
Looking at the implementation, GenericDomTree has more specific
requirements on NodeRef, e.g. NodeRefObject->getParent() should compile,
and NodeRef should be a pointer. We can remove the pointer requirement,
but it seems to have little gain, given the limited use cases.

Also changed GraphTraits<Inverse<Inverse<T>>> to be more accurate.

Reviewers: dblaikie, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D23593

llvm-svn: 278961
2016-08-17 20:01:58 +00:00
Adrian Prantl
5d182b6cce Support the DW_AT_noreturn DWARF flag.
This is used to mark functions with the C++11 [[ noreturn ]] or C11 _Noreturn
attributes.

Patch by Victor Leschuk!

https://reviews.llvm.org/D23167

llvm-svn: 278940
2016-08-17 16:02:43 +00:00
Duncan P. N. Exon Smith
400dc78d11 Scalar: Avoid dereferencing end() in IndVarSimplify
IndVarSimplify::sinkUnusedInvariants calls
BasicBlock::getFirstInsertionPt on the ExitBlock and moves instructions
before it.  This can return end(), so it's not safe to dereference.  Add
an iterator-based overload to Instruction::moveBefore to avoid the UB.
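
A hedged sketch of the safe pattern (assuming the new iterator-taking
overload):

    BasicBlock::iterator IP = ExitBlock->getFirstInsertionPt();
    I->moveBefore(*ExitBlock, IP); // never dereferences IP, so end() is fine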

llvm-svn: 278886
2016-08-17 01:54:41 +00:00
Guy Blank
6b866e2b9c [X86] Add xgetbv/xsetbv intrinsics to non-windows platforms
Differential Revision: https://reviews.llvm.org/D21958

llvm-svn: 278782
2016-08-16 06:41:00 +00:00
Tim Shen
e69d467a94 [ADT] Change PostOrderIterator to use NodeRef. NFC.
Reviewers: dblaikie

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23522

llvm-svn: 278752
2016-08-15 21:52:54 +00:00
Wolfgang Pieb
a19b457b8d Local variables whose address is taken and passed on to a call are described
in debug info using their stack slots instead of as an indirection off a
parameter register (param reg + 0 offset). This is done by detecting
FrameIndexSDNodes in SelectionDAG and generating
FrameIndexDbgValues for them. This ultimately generates DBG_VALUEs with stack
location operands.

Differential Revision: http://reviews.llvm.org/D23283

llvm-svn: 278703
2016-08-15 18:18:26 +00:00
Craig Topper
96f75b2219 [X86] Mark some of the X86 SDNodes as commutative.
llvm-svn: 278653
2016-08-15 04:47:30 +00:00
Mehdi Amini
f448ca43f7 Revert "Revert "Invariant start/end intrinsics overloaded for address space""
This reverts commit 32fc6488e48eafc0ca1bac1bd9cbf0008224d530.

llvm-svn: 278609
2016-08-13 23:31:24 +00:00
Mehdi Amini
2c3de3e169 Revert "Invariant start/end intrinsics overloaded for address space"
This reverts commit r276447.

llvm-svn: 278608
2016-08-13 23:27:32 +00:00
Pete Cooper
c40a63ea39 Add support to PatternMatch for simple const Value cases.
PatternMatch has some paths which can operate on constant instructions,
but not all.  This adds a version of m_Value() to return const Value* and
changes ICmp matching to use auto so that it can match both constant and
mutable instructions.

Tests also included for both mutable and constant ICmpInst matching.

This will be used in a future commit to constify ValueTracking.cpp.
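
For illustration, a hedged sketch of what this enables (assuming the
PatternMatch API; isSimpleCompare is hypothetical):

    using namespace llvm::PatternMatch;

    bool isSimpleCompare(const Value *V) {
      const Value *L, *R; // const-qualified out-parameters now work
      ICmpInst::Predicate Pred;
      return match(V, m_ICmp(Pred, m_Value(L), m_Value(R)));
    }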

llvm-svn: 278570
2016-08-12 22:16:05 +00:00
Duncan P. N. Exon Smith
e2d742d768 ADT: Remove the ilist_nextprev_traits customization point
No one is using the capability to implement next and prev another way
(since lld stopped doing it in r278468).  Remove the customization point
by moving the API from ilist_nextprev_traits<T> to ilist_node_access.

The old traits class is still useful/necessary API as a target for
friends of node types that inherit privately from ilist_node.
Eventually I plan to either remove it entirely or move the template
parameters to the methods.

(Note: if there's desire to bring back customization of next/prev
pointers in the future (e.g., to pack some bits in there), I think a
traits class like this is an awkward way to accomplish it.  Instead, we
should change ilist<T> to be ilist<ilist_node<T>>, and give an extra
template parameter to ilist_node.)

llvm-svn: 278532
2016-08-12 17:32:34 +00:00
Duncan P. N. Exon Smith
2e8f72520b ADT: Share code for embedded sentinel traits, NFC
Share code for the (mostly problematic) embedded sentinel traits.
- Move the LLVM_NO_SANITIZE("object-size") attribute to
  ilist_half_embedded_sentinel_traits and ilist_embedded_sentinel_traits
  (previously it was spread throughout the duplicated code).
- Add an ilist_full_embedded_sentinel_traits which has no UB (but has
  the downside of storing the complete node).
- Replace all the custom sentinel traits in LLVM with a declaration of
  ilist_sentinel_traits that inherits from one of the embedded sentinel
  traits classes.

There are still custom sentinel traits in other LLVM subprojects.  I'll
remove those in a follow-up.

Nothing at all should be changing here, this is just rearranging code.
Note that the final goal here is to remove the sentinel traits
altogether, settling on the memory layout of
ilist_half_embedded_sentinel_traits without the UB.  This intermediate
step moves the logic into ilist.h.

llvm-svn: 278513
2016-08-12 15:00:55 +00:00
Gor Nishanov
37bbef4980 [Coroutines]: Part6b: Add coro.id intrinsic.
Summary:
1. Make the coroutine representation more robust against optimizations that may duplicate instructions, by introducing a coro.id intrinsic that returns a token which gets fed into coro.alloc and coro.begin. Because coro.id returns a token, it won't get duplicated and can be used as a reliable indicator of coroutine identity when a particular coroutine call gets inlined.
2. Move the last three arguments of coro.begin into coro.id, as they will be shared if coro.begin gets duplicated.
3. Docs, tests, and code updated to support the new intrinsic.

Reviewers: mehdi_amini, majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23412

llvm-svn: 278481
2016-08-12 05:45:49 +00:00
David Majnemer
319d420e44 Use the range variant of find/find_if instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.
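
A hedged before/after sketch (hasBlock is hypothetical):

    bool hasBlock(ArrayRef<BasicBlock *> Blocks, BasicBlock *BB) {
      // Before: std::find(Blocks.begin(), Blocks.end(), BB) != Blocks.end()
      return llvm::is_contained(Blocks, BB); // After
    }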

llvm-svn: 278469
2016-08-12 03:55:06 +00:00
Tim Shen
f3b1df95f7 [ADT] Migrate DepthFirstIterator to use NodeRef
Summary:
Notice that the data layout has changed: instead of using
std::pair<PointerIntPair<NodeType*, 1>, ChildItTy>, we now use
std::pair<NodeRef, Optional<ChildItTy>>.

An NFC but noteworthy change is operator==(): since we only compare an
iterator against end(), it's better to put an assert there so people
notice when it fails.

Reviewers: dblaikie, chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D23146

llvm-svn: 278437
2016-08-11 22:36:16 +00:00
David Majnemer
85242fb9f9 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
Piotr Padlewski
2ede910872 Don't import variadic functions
Summary:
This patch adds an IsVariadicFunction bit to the summary in order
to not import variadic functions. The inliner doesn't inline
variadic functions because they are hard to reason about.

This one small fix improves the importer by about 16% (going from
86% to 100% of imported functions that are inlined anywhere) on some
SPEC benchmarks like 'int' and others.

Reviewers: eraman, mehdi_amini, tejohnson

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23339

llvm-svn: 278432
2016-08-11 22:13:57 +00:00
Wei Ding
f99a9aad71 AMDGPU : Add intrinsic for instruction v_cvt_pk_u8_f32
Differential Revision: http://reviews.llvm.org/D23336

llvm-svn: 278403
2016-08-11 20:34:48 +00:00
Wei Ding
afed1ab282 AMDGPU : Add LLVM intrinsics for SAD related instructions.
Differential Revision: http://reviews.llvm.org/D23133

llvm-svn: 278354
2016-08-11 16:33:53 +00:00
Changpeng Fang
179532ade9 AMDGPU/SI: Implement amdgcn image intrinsics with sampler
Summary:
  This patch defines and implements the amdgcn image intrinsics with sampler.

    1. Define the vdata type to be llvm_anyfloat_ty, the address type to be llvm_anyfloat_ty,
       and the rsrc type to be llvm_anyint_ty. As a result, we expect the intrinsic name
       to have three suffixes to overload each of these three types;

    2. D128 as well as two other flags are implied by the three types; for example,
       if you use v8i32 as the resource type, then r128 is 0!

    3. Don't expose the TFE flag; other flags are exposed in the instruction order:
       unrm, glc, slc, lwe and da.

Differential Revision: http://reviews.llvm.org/D22838

Reviewed by:
  arsenm and tstellarAMD

llvm-svn: 278291
2016-08-10 21:15:30 +00:00
Gor Nishanov
04cd924e8d [Coroutines] Part 6: Elide dynamic allocation of a coroutine frame when possible
Summary:
A particular coroutine usage pattern, where a coroutine is created, manipulated and
destroyed by the same calling function, is common for coroutines implementing the
RAII idiom and is suitable for the allocation elision optimization, which avoids
dynamic allocation by storing the coroutine frame as a static `alloca` in its
caller.

coro.free and coro.alloc intrinsics are used to indicate which code needs to be suppressed
when dynamic allocation elision happens:
```
entry:
  %elide = call i8* @llvm.coro.alloc()
  %need.dyn.alloc = icmp ne i8* %elide, null
  br i1 %need.dyn.alloc, label %coro.begin, label %dyn.alloc
dyn.alloc:
  %alloc = call i8* @CustomAlloc(i32 4)
  br label %coro.begin
coro.begin:
  %phi = phi i8* [ %elide, %entry ], [ %alloc, %dyn.alloc ]
  %hdl = call i8* @llvm.coro.begin(i8* %phi, i32 0, i8* null,
                          i8* bitcast ([2 x void (%f.frame*)*]* @f.resumers to i8*))
```
and
```
  %mem = call i8* @llvm.coro.free(i8* %hdl)
  %need.dyn.free = icmp ne i8* %mem, null
  br i1 %need.dyn.free, label %dyn.free, label %if.end
dyn.free:
  call void @CustomFree(i8* %mem)
  br label %if.end
if.end:
  ...
```

If heap allocation elision is performed, we replace coro.alloc with a static alloca on the caller frame and coro.free with a null constant.

Also, if there are any tail calls referencing the coroutine frame, we need to remove the tail call attribute, since the coroutine frame now lives on the stack.

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. (https://reviews.llvm.org/D23234)
6.Add coroutine heap elision + tests.  <= we are here
7.Add the rest of the logic (split into more patches)

Reviewers: mehdi_amini, majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23245

llvm-svn: 278242
2016-08-10 16:40:39 +00:00
Sean Silva
11e71061b1 Consistently use FunctionAnalysisManager
Besides a general consistency benefit, the extra layer of indirection
allows the mechanical part of https://reviews.llvm.org/D23256 that
requires touching every transformation and analysis to be factored out
cleanly.
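
The alias in question is roughly this (a sketch; the actual typedef
lives in the pass-manager headers):

    typedef AnalysisManager<Function> FunctionAnalysisManager;

    // Passes now spell the type consistently:
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM);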

Thanks to David for the suggestion.

llvm-svn: 278077
2016-08-09 00:28:15 +00:00
Gor Nishanov
0346aed75a [Coroutines] Part 5: Add CGSCC restart trigger
Summary:
The CoroSplit pass processes the coroutine twice. First, it lets it go through
the complete IPO optimization pipeline as a single function. It then forces a
restart of the pipeline by inserting an indirect call to an empty function
"coro.devirt.trigger"; that call is devirtualized by the CoroElide pass, which
triggers the restart via CGPassManager.
(In later patches, when the CoroSplit pass sees the same coroutine the second time, it splits it up
and adds the coroutine subfunctions to the SCC to be processed by the IPO pipeline.)

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization (https://reviews.llvm.org/D23229)
5.Add CGSCC restart trigger + tests. <= we are here
6.Add coroutine heap elision + tests.
7.Add the rest of the logic (split into more patches)

Reviewers: mehdi_amini, majnemer

Subscribers: llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D23234

llvm-svn: 277936
2016-08-06 20:44:39 +00:00
Sanjoy Das
aca3962ed4 [InstCombine] Don't coerce non-integral pointers to integers
Reviewers: majnemer

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D23231

llvm-svn: 277910
2016-08-06 02:58:48 +00:00
Gor Nishanov
d04999a10d Part 4c: Coroutine Devirtualization: Devirtualize coro.resume and coro.destroy.
Summary:
This is the 4c patch of the coroutine series. The CoroElide pass now checks whether a PostSplit coro.begin
is referenced by coro.subfn.addr intrinsics. If so, it replaces the coro.subfn.addrs with the appropriate coroutine
subfunction associated with that coro.begin.

Documentation and overview is here: http://llvm.org/docs/Coroutines.html.

Upstreaming sequence (rough plan)
1.Add documentation. (https://reviews.llvm.org/D22603)
2.Add coroutine intrinsics. (https://reviews.llvm.org/D22659)
3.Add empty coroutine passes. (https://reviews.llvm.org/D22847)
4.Add coroutine devirtualization + tests.
ab) Lower coro.resume and coro.destroy (https://reviews.llvm.org/D22998)
c) Do devirtualization <= we are here
5.Add CGSCC restart trigger + tests.
6.Add coroutine heap elision + tests.
7.Add the rest of the logic (split into more patches)

Reviewers: majnemer

Subscribers: mehdi_amini, llvm-commits

Differential Revision: https://reviews.llvm.org/D23229

llvm-svn: 277908
2016-08-06 02:16:35 +00:00