Commit Graph

20378 Commits

Author SHA1 Message Date
Michael Zolotukhin
1c3340dd78 Revert r206749 till a final decision about the intrinsics is made.
llvm-svn: 207313
2014-04-26 09:56:41 +00:00
Chandler Carruth
864b47743f [LCG] Rather than removing nodes from the SCC entry set when we process
them, just skip over any DFS-numbered nodes when finding the next root
of a DFS. This allows the entry set to just be a vector as we populate
it from a uniqued source. It also removes the possibility for a linear
scan of the entry set to actually do the removal which can make things
go quadratic if we get unlucky.

llvm-svn: 207312
2014-04-26 09:45:55 +00:00
Chandler Carruth
3a16e3f5fa [LCG] Hoist the main DFS loop out of the edge removal function. This
makes working through the worklist much cleaner, and makes it possible
to avoid the 'bool-to-continue-the-outer-loop' hack. Not a huge
difference, but I think this is approaching as polished as I can make
it.

llvm-svn: 207310
2014-04-26 09:06:53 +00:00
Chandler Carruth
87d8609624 [LCG] In the incremental SCC re-formation, lift the node currently being
processed in the DFS out of the stack completely. Keep it exclusively in
a variable. Re-shuffle some code structure to make this easier. This can
have a very dramatic effect in some cases because call graphs tend to
look like a high fan-out spanning tree. As a consequence, there are
a large number of leaf nodes in the graph, and this technique causes
leaf nodes to never even go into the stack. While this only reduces the
max depth by 1, it may cause the total number of round trips through the
stack to drop by a lot.

Now, most of this isn't really relevant for the incremental version. =]
But I wanted to prototype it first here as this variant is in ways more
complex. As long as I can get the code factored well here, I'll next
make the primary walk look the same. There are several refactorings this
exposes I think.

llvm-svn: 207306
2014-04-26 03:36:42 +00:00
Chandler Carruth
0e388582e5 [LCG] Refactor the duplicated code I added in my last commit here into
a helper function. Also factor the other two places where we did the
same thing into the helper function. =] Much cleaner this way. NFC.

llvm-svn: 207300
2014-04-26 01:03:46 +00:00
Duncan P. N. Exon Smith
c54b3a7e23 Revert "blockfreq: Approximate irreducible control flow"
This reverts commit r207286.  It causes an ICE on the
cmake-llvm-x86_64-linux buildbot [1]:

    llvm/lib/Analysis/BlockFrequencyInfo.cpp: In lambda function:
    llvm/lib/Analysis/BlockFrequencyInfo.cpp:182:1: internal compiler error: in get_expr_operands, at tree-ssa-operands.c:1035

[1]: http://bb.pgr.jp/builders/cmake-llvm-x86_64-linux/builds/12093/steps/build_llvm/logs/stdio

llvm-svn: 207287
2014-04-25 23:16:58 +00:00
Duncan P. N. Exon Smith
3189616c35 blockfreq: Approximate irreducible control flow
Previously, irreducible backedges were ignored.  With this commit,
irreducible SCCs are discovered on the fly, and modelled as loops with
multiple headers.

This approximation specifies the headers of irreducible sub-SCCs as its
entry blocks and all nodes that are targets of a backedge within it
(excluding backedges within true sub-loops).  Block frequency
calculations act as if we insert a new block that intercepts all the
edges to the headers.  All backedges and entries to the irreducible SCC
point to this imaginary block.  This imaginary block has an edge (with
even probability) to each header block.

The result is now reasonable enough that I've added a number of
testcases for irreducible control flow.  I've outlined in
`BlockFrequencyInfoImpl.h` ways to improve the approximation.

<rdar://problem/14292693>

llvm-svn: 207286
2014-04-25 23:08:57 +00:00
Tom Roeder
a8ef44c3ed Add an -mattr option to the gold plugin to support subtarget features in LTO
This adds support for an -mattr option to the gold plugin and to llvm-lto. This
allows the caller to specify details of the subtarget architecture, like +aes,
or +ssse3 on x86.  Note that this requires a change to the include/llvm-c/lto.h
interface: it adds a function lto_codegen_set_attr and it increments the
version of the interface.

llvm-svn: 207279
2014-04-25 21:46:51 +00:00
Duncan P. N. Exon Smith
a63154f533 SCC: Use the reference typedef
Actually use the `reference` typedef, and remove the private
redefinition of `pointer` since it has no users.

Using `reference` exposes a problem with r207257, which specified the
wrong `value_type` to `iterator_facade_base` (fixed that too).

llvm-svn: 207270
2014-04-25 20:52:08 +00:00
Adrian Prantl
7566e72bb8 This reapplies r207235 with an additional bugfixes caught by the msan
buildbot - do not insert debug intrinsics before phi nodes.

Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207269
2014-04-25 20:49:25 +00:00
David Blaikie
7dd77bc4d4 MCAssembler: Simplify implementation of const variants of getSymbolData by calling one implementation from the other.
Code review feedback by Rafael Espindola on r207124.

llvm-svn: 207266
2014-04-25 20:19:11 +00:00
Duncan P. N. Exon Smith
0f4795de58 blockfreq: Further shift logic to LoopData
Move a lot of the loop-related logic that was sprinkled around the code
into `LoopData`.

<rdar://problem/14292693>

llvm-svn: 207258
2014-04-25 18:47:04 +00:00
Duncan P. N. Exon Smith
50ec4458de SCC: Provide operator->() through iterator_facade_base
Use the fancy new `iterator_facade_base` to add
`scc_iterator::operator->()`.  Remove other definitions where
`iterator_facade_base` does the right thing.

<rdar://problem/14292693>

llvm-svn: 207257
2014-04-25 18:43:41 +00:00
Duncan P. N. Exon Smith
e361bbbc0f SCC: Remove non-const operator*()
<rdar://problem/14292693>

llvm-svn: 207254
2014-04-25 18:26:45 +00:00
Duncan P. N. Exon Smith
a904a21581 SCC: Doxygen-ize comments, NFC
<rdar://problem/14292693>

llvm-svn: 207251
2014-04-25 18:18:46 +00:00
Adrian Prantl
319db7c542 Revert "This reapplies r207130 with an additional testcase+and a missing check for"
This reverts commit 207235 to investigate msan buildbot breakage.

llvm-svn: 207250
2014-04-25 18:18:09 +00:00
Duncan P. N. Exon Smith
41e5485e53 SCC: Un-inline long functions
These are long functions that really shouldn't be inlined.  Otherwise,
no functionality change.

<rdar://problem/14292693>

llvm-svn: 207249
2014-04-25 18:15:50 +00:00
Duncan P. N. Exon Smith
7e5c9c2a03 SCC: Remove redundant inline keywords, NFC
Functions declared in line in a class are inlined by default.  There's
no reason for the `inline` keyword.

<rdar://problem/14292693>

llvm-svn: 207248
2014-04-25 18:10:23 +00:00
Saleem Abdulrasool
ee8639eae1 ARM: remove @llvm.arm.sevl
This intrinsic is no longer needed with the new @llvm.arm.hint(i32) intrinsic
which provides a generic, extensible manner for adding hint instructions.  This
functionality can now be represented as @llvm.arm.hint(i32 5).

llvm-svn: 207246
2014-04-25 17:51:25 +00:00
Saleem Abdulrasool
710c823515 ARM: provide a new generic hint intrinsic
Introduce the llvm.arm.hint(i32) intrinsic that can be used to inject hints into
the instruction stream. This is particularly useful for generating IR from a
compiler where the user may inject an intrinsic (e.g. __yield). These are then
pattern substituted into the correct instruction which already existed.

llvm-svn: 207242
2014-04-25 17:24:24 +00:00
Adrian Prantl
7f9d1e9fd6 This reapplies r207130 with an additional testcase+and a missing check for
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207235
2014-04-25 17:01:00 +00:00
Craig Topper
c0a2a29f4e [C++] Use 'nullptr'. Transforms edition.
llvm-svn: 207196
2014-04-25 05:29:35 +00:00
Duncan P. N. Exon Smith
6117699d7e blockfreq: Only one mass distribution per node
Remove the concepts of "forward" and "general" mass distributions, which
was wrong.  The split might have made sense in an early version of the
algorithm, but it's definitely wrong now.

<rdar://problem/14292693>

llvm-svn: 207195
2014-04-25 04:38:43 +00:00
Duncan P. N. Exon Smith
20240d1036 blockfreq: Document high-level functions
<rdar://problem/14292693>

llvm-svn: 207191
2014-04-25 04:38:32 +00:00
Duncan P. N. Exon Smith
73e998dc93 blockfreq: Remove dead code
<rdar://problem/14292693>

llvm-svn: 207190
2014-04-25 04:38:30 +00:00
Duncan P. N. Exon Smith
20bb8bb185 blockfreq: Separate unwrapLoops() from finalizeMetrics()
<rdar://problem/14292693>

llvm-svn: 207185
2014-04-25 04:38:17 +00:00
Duncan P. N. Exon Smith
9f46ab15a9 blockfreq: LoopData::MemberList => NodeList
<rdar://problem/14292693>

llvm-svn: 207184
2014-04-25 04:38:15 +00:00
Duncan P. N. Exon Smith
bed52c2a81 blockfreq: Expose getPackagedNode()
Make `getPackagedNode()` a member function of
`BlockFrequencyInfoImplBase` so that it's available for templated code.

<rdar://problem/14292693>

llvm-svn: 207183
2014-04-25 04:38:12 +00:00
Duncan P. N. Exon Smith
b9744991af blockfreq: Store the header with the members
<rdar://problem/14292693>

llvm-svn: 207182
2014-04-25 04:38:09 +00:00
Duncan P. N. Exon Smith
2a2a2175c7 blockfreq: Encapsulate LoopData::Header
<rdar://problem/14292693>

llvm-svn: 207181
2014-04-25 04:38:06 +00:00
Duncan P. N. Exon Smith
d60094b215 blockfreq: Embed Loop hierarchy in LoopData
Continue refactoring to make `LoopData` first-class.  Here I'm making
the `LoopData` hierarchy explicit, instead of bouncing back and forth
with `WorkingData`.  This simplifies the logic and better matches the
`LoopInfo` design.  (Eventually, `LoopInfo` should be restructured so
that it supports this pass, and `LoopData` can be removed.)

<rdar://problem/14292693>

llvm-svn: 207180
2014-04-25 04:38:03 +00:00
Duncan P. N. Exon Smith
e670959c2a blockfreq: Use LoopData directly
Instead of passing around loop headers, pass around `LoopData` directly.

<rdar://problem/14292693>

llvm-svn: 207179
2014-04-25 04:38:01 +00:00
Duncan P. N. Exon Smith
9ade70452a blockfreq: Stop using range-based for to traverse Loops
A follow-up commit will need the actual iterators.

<rdar://problem/14292693>

llvm-svn: 207178
2014-04-25 04:37:58 +00:00
Duncan P. N. Exon Smith
0b338fa955 blockfreq: Use a std::list for Loops
As pointed out by David Blaikie in code review, a `std::list<T>` is
simpler than a `std::vector<std::unique_ptr<T>>`.  Another option is a
`std::deque<T>` (which allocates in chunks), but I'd like to leave open
the option of inserting in the middle of the sequence for handling
irreducible control flow on the fly.

<rdar://problem/14292693>

llvm-svn: 207177
2014-04-25 04:30:06 +00:00
Karthik Bhat
a6070a9b75 Allow vectorization of bit intrinsics in BB Vectorizer.
This patch adds support for vectorization of  bit intrinsics such as bswap,ctpop,ctlz,cttz.

llvm-svn: 207174
2014-04-25 03:33:48 +00:00
Adrian Prantl
0338f80f17 Revert "This reapplies r207130 with an additional testcase+and a missing check for"
Typo in testcase.

llvm-svn: 207166
2014-04-25 00:42:50 +00:00
Adrian Prantl
bf019d19e9 This reapplies r207130 with an additional testcase+and a missing check for
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207165
2014-04-25 00:38:40 +00:00
Adrian Prantl
0b669e8f79 Revert "Debug info for optimized code: Support variables that are on the stack and"
This reverts commit 207130 for buildbot breakage.

llvm-svn: 207162
2014-04-25 00:04:49 +00:00
Richard Smith
dbf1078568 Add missing include, found by modules build.
llvm-svn: 207158
2014-04-24 23:29:25 +00:00
Richard Smith
a8439916a2 Function defined in a header should be inline. Found by modules build.
llvm-svn: 207157
2014-04-24 23:14:32 +00:00
Chandler Carruth
aa2ff5e59f [ADT] Generalize pointee_iterator to smart pointers by using decltype.
Based on review feedback from Dave on the original patch.

llvm-svn: 207146
2014-04-24 21:10:35 +00:00
Reid Kleckner
f2a671c79a Remove dead inline function that doesn't compile
MSVC doesn't diagnose this, interestingly.

llvm-svn: 207144
2014-04-24 20:19:22 +00:00
Reid Kleckner
e7e2ccb9e9 Add 'musttail' marker to call instructions
This is similar to the 'tail' marker, except that it guarantees that
tail call optimization will occur.  It also comes with convervative IR
verification rules that ensure that tail call optimization is possible.

Reviewers: nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D3240

llvm-svn: 207143
2014-04-24 20:14:34 +00:00
Richard Smith
a49b5ce5a2 [modules] "Specialize" a function by actually specializing a function template
rather than by adding an overload and hoping that it's declared before the code
that calls it. (In a modules build, it isn't.)

llvm-svn: 207133
2014-04-24 18:27:29 +00:00
Adrian Prantl
807e5d8a9a Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine-intrinsics testcase and included source


rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207130
2014-04-24 17:41:45 +00:00
Andrea Di Biagio
dae3a5b91a [X86] Add support for Read Time Stamp Counter x86 builtin intrinsics.
This patch:
- Adds two new X86 builtin intrinsics ('int_x86_rdtsc' and
   'int_x86_rdtscp') as GCCBuiltin intrinsics;
- Teaches the backend how to lower the two new builtins;
- Introduces a common function to lower READCYCLECOUNTER dag nodes
  and the two new rdtsc/rdtscp intrinsics;
- Improves (and extends) the existing x86 test 'rdtsc.ll'; now test 'rdtsc.ll'
  correctly verifies that both READCYCLECOUNTER and the two new intrinsics
  work fine for both 64bit and 32bit Subtargets.

llvm-svn: 207127
2014-04-24 17:18:27 +00:00
David Blaikie
52ea9cc073 Spread some const around for non-mutating uses of MCSymbolData.
I discovered this const-hole while attempting to coalesnce the Symbol
and SymbolMap data structures. There's some pending issues with that,
but I figured this change was easy to flush early.

llvm-svn: 207124
2014-04-24 16:59:40 +00:00
Chandler Carruth
9e4513f082 [LCG] Incorporate the core trick of improvements on the naive Tarjan's
algorithm here: http://dl.acm.org/citation.cfm?id=177301.

The idea of isolating the roots has even more relevance when using the
stack not just to implement the DFS but also to implement the recursive
step. Because we use it for the recursive step, to isolate the roots we
need to maintain two stacks: one for our recursive DFS walk, and another
of the nodes that have been walked. The nice thing is that the latter
will be half the size. It also fixes a complete hack where we scanned
backwards over the stack to find the next potential-root to continue
processing. Now that is always the top of the DFS stack.

While this is a really nice improvement already (IMO) it further opens
the door for two important simplifications:

1) De-duplicating some of the code across the two different walks. I've
   actually made the duplication a bit worse in some senses with this
   patch because the two are starting to converge.
2) Dramatically simplifying the loop structures of both walks.

I wanted to do those separately as they'll be essentially *just* CFG
restructuring. This patch on the other hand actually uses different
datastructures to implement the algorithm itself.

llvm-svn: 207098
2014-04-24 11:05:20 +00:00
Chandler Carruth
cd39f4c2e6 [LCG] Switch the parent SCC tracking from a SmallSetVector to
a SmallPtrSet. Currently, there is no need for stable iteration in this
dimension, and I now thing there won't need to be going forward.

If this is ever re-introduced in any form, it needs to not be
a SetVector based solution because removal cannot be linear. There will
be many SCCs with large numbers of parents. When encountering these, the
incremental SCC update for intra-SCC edge removal was quadratic due to
linear removal (kind of).

I'm really hoping we can avoid having an ordering property here at all
though...

llvm-svn: 207091
2014-04-24 09:22:31 +00:00
Chandler Carruth
ccccef94ac [LCG] We don't actually need a set in each SCC to track the nodes. We
can use the node -> SCC mapping in the top-level graph to test this on
the rare occasions we need it.

llvm-svn: 207090
2014-04-24 08:55:36 +00:00