Commit Graph

526 Commits

Author SHA1 Message Date
David Blaikie
60e6c80905 Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Hans Wennborg
aaf76f8d38 SimplifyCFG: Range'ify some for-loops. No functional change.
llvm-svn: 222215
2014-11-18 02:37:11 +00:00
Juergen Ributzka
05cff0a244 [SimplifyCFG] Make the value type of the hole check bitmask a power-of-2.
When converting a switch to a lookup table we might have to generate a bitmaks
to encode and check for holes in the original switch statement.

The type of this mask depends on the number of switch statements, which can
result in illegal types for pretty much all architectures.

To avoid unnecessary type legalization and help FastISel this commit increases
the size of the bitmask to next power-of-2 value when necessary.

This fixes rdar://problem/18984639.

llvm-svn: 222168
2014-11-17 19:39:56 +00:00
Erik Eckstein
49292b906e Optimize switch lookup tables with linear mapping.
This is a simple optimization for switch table lookup:
It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output.
Example:

int f1(int x) {
  switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
  }
  return 0;
}

generates:

define i32 @f1(i32 %x) #0 {
entry:
  %0 = icmp ult i32 %x, 4
  br i1 %0, label %switch.lookup, label %return

switch.lookup:
  %switch.offset = add i32 %x, 10
  ret i32 %switch.offset

return:
  ret i32 0
}

llvm-svn: 222121
2014-11-17 09:13:57 +00:00
Duncan P. N. Exon Smith
8770505e4e Revert "IR: MDNode => Value"
Instead, we're going to separate metadata from the Value hierarchy.  See
PR21532.

This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.

llvm-svn: 221711
2014-11-11 21:30:22 +00:00
Duncan P. N. Exon Smith
7004fd9aac IR: MDNode => Value: Instruction::getMetadata()
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.

Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.

llvm-svn: 221024
2014-11-01 00:10:31 +00:00
Philip Reames
9ac0ad927c Preserving 'nonnull' metadata in SimplifyCFG
When we hoist two loads above an if, we can preserve the nonnull metadata.  We could also do the same for sinking them, but we appear to not handle metadata at all in that case.

Thanks to Hal for the review.

Differential Revision: http://reviews.llvm.org/D5910

llvm-svn: 220392
2014-10-22 16:37:13 +00:00
Marcello Maggioni
ba92818a1d Switch to select optimization for two-case switches
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.

llvm-svn: 219656
2014-10-14 01:58:26 +00:00
Joerg Sonnenberger
a033fe29e0 Revert r219223, it creates invalid PHI nodes.
llvm-svn: 219587
2014-10-12 17:16:04 +00:00
Arnold Schwaighofer
0d2a656327 SimplifyCFG: Don't convert phis into selects if we could remove undef behavior
instead

We used to transform this:

  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2

  bb1:
    br label %bb2

  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

into this:

  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).

The existing transformation that removes unreachable control flow in simplifycfg
is:

  /// If BB has an incoming value that will always trigger undefined behavior
  /// (eg. null pointer dereference), remove the branch leading here.
  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462
2014-10-10 01:27:02 +00:00
Marcello Maggioni
8c63dce0d7 Two case switch to select optimization
This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending if the default block is reachable or not).

The typical case this optimization wants to be able to optimize is this one:

Example:
switch (a) {
  case 10:                %0 = icmp eq i32 %a, 10
    return 10;            %1 = select i1 %0, i32 10, i32 4
  case 20:        ---->   %2 = icmp eq i32 %a, 20
    return 2;             %3 = select i1 %2, i32 2, i32 %1
  default:
    return 4;
}

It also sets the base for further optimizations that are planned and being reviewed.

llvm-svn: 219223
2014-10-07 18:16:44 +00:00
Jingyue Wu
908e4ab376 [SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711
2014-09-30 22:23:38 +00:00
Jingyue Wu
50ea4ff207 Remove dead code in SimplifyCFG
Summary: UsedByBranch is always true according to how BonusInst is defined.

Test Plan:
Passes check-all, and also verified 

if (BonusInst && !UsedByBranch) {
  ...
}

is never entered during check-all.

Reviewers: resistor, nadav, jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, eliben, meheff

Differential Revision: http://reviews.llvm.org/D5324

llvm-svn: 217824
2014-09-15 20:48:13 +00:00
Hal Finkel
f8bb9b78cf Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.

As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.

The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.

Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.

This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).

llvm-svn: 217342
2014-09-07 18:57:58 +00:00
Craig Topper
65775cc03d Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 216158
2014-08-21 05:55:13 +00:00
Craig Topper
aa7422b5a6 Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size."
Getting a weird buildbot failure that I need to investigate.

llvm-svn: 215870
2014-08-18 00:24:38 +00:00
Craig Topper
227456e133 Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 215868
2014-08-17 23:47:00 +00:00
Rafael Espindola
f8bee1313e Introduce a helper to combine instruction metadata.
Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.

Patch by Björn Steinbrink!

llvm-svn: 215723
2014-08-15 15:46:38 +00:00
Manman Ren
97601eacaa [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion.
When we have a covered lookup table, make sure we don't delete PHINodes that
are cached in PHIs.

rdar://17887153

llvm-svn: 214642
2014-08-02 23:41:54 +00:00
Rafael Espindola
d0d1fa9454 SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics.
The lifetime intrinsics need some work in order to make it clear which
optimizations are or are not valid.

For now dropping this optimization avoids a miscompilation.

Patch by Björn Steinbrink.

llvm-svn: 214336
2014-07-30 21:04:00 +00:00
Manman Ren
ab91a340c5 Feedback from Hans on r213815. No functionaility change.
llvm-svn: 213895
2014-07-24 21:13:20 +00:00
Aaron Ballman
995ef083ff Fixing an MSVC conversion warning about implicitly converting the shift results to 64-bits. No functional change intended.
llvm-svn: 213863
2014-07-24 14:24:59 +00:00
Manman Ren
6e6d75695d SimplifyCFG: fix a bug in switch to table conversion
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.

For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx

It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext

rdar://17735071

llvm-svn: 213815
2014-07-23 23:13:23 +00:00
Duncan P. N. Exon Smith
2ae51d315c Revert "[C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) iterator ranges."
This reverts commit r213474 (and r213475), which causes a miscompile on
a stage2 LTO build.  I'll reply on the list in a moment.

llvm-svn: 213562
2014-07-21 17:06:51 +00:00
Manuel Jacob
8e924ddc40 [C++11] Add predecessors(BasicBlock *) / successors(BasicBlock *) iterator ranges.
Summary: This patch introduces two new iterator ranges and updates existing code to use it.  No functional change intended.

Test Plan: All tests (make check-all) still pass.

Reviewers: dblaikie

Reviewed By: dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D4481

llvm-svn: 213474
2014-07-20 09:10:11 +00:00
Hal Finkel
661274e401 Feeding isSafeToSpeculativelyExecute its DataLayout pointer
isSafeToSpeculativelyExecute can optionally take a DataLayout pointer. In the
past, this was mainly used to make better decisions regarding divisions known
not to trap, and so was not all that important for users concerned with "cheap"
instructions. However, now it also helps look through bitcasts for
dereferencable loads, and will also be important if/when we add a
dereferencable pointer attribute.

This is some initial work to feed a DataLayout pointer through to callers of
isSafeToSpeculativelyExecute, generally where one was already available.

llvm-svn: 212720
2014-07-10 14:41:31 +00:00
Sanjay Patel
e29d90184d Fix for PR17073 ( http://llvm.org/pr17073 ), simplifycfg illegally hoists an operation in a phi node that can trap.
This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those.

The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected.

llvm-svn: 212490
2014-07-07 21:19:00 +00:00
Sanjay Patel
81fb1ce2eb fixed some typos in comments
llvm-svn: 212423
2014-07-06 23:10:24 +00:00
Marcello Maggioni
f4e8a4110d Minor stylistic fix in SimplifyCFG (test commit)
llvm-svn: 212259
2014-07-03 08:29:06 +00:00
Hans Wennborg
0b74da6314 Don't build switch tables for dllimport and TLS variables in GEPs
This is a follow-up to r211331, which failed to notice that we were
returning early from ValidLookupTableConstant for GEPs.

llvm-svn: 211753
2014-06-26 00:30:52 +00:00
Hans Wennborg
3869faa1a3 Don't build switch lookup tables for dllimport or TLS variables
We would previously put dllimport variables in switch lookup tables, which
doesn't work because the address cannot be used in a constant initializer.
This is basically the same problem that we have in PR19955.

Putting TLS variables in switch tables also desn't work, because the
address of such a variable is not constant.

Differential Revision: http://reviews.llvm.org/D4220

llvm-svn: 211331
2014-06-20 00:38:12 +00:00
Matt Arsenault
09fc34b10a Make bitcast, extractelement, and insertelement considered cheap for speculation.
This helps more branches into selects. On R600,
vectors are cheap and anything that helps
remove branches is very good.

llvm-svn: 209914
2014-05-30 18:34:43 +00:00
Jay Foad
e0eac700cb Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Louis Gerbarg
011e17086c Add ExtractValue instruction to SimplifyCFG's ComputeSpeculationCost
Since ExtractValue is not included in ComputeSpeculationCost CFGs containing
ExtractValueInsts cannot be simplified. In particular this interacts with
InstCombineCompare's tendency to insert add.with.overflow intrinsics for
certain idiomatic math operations, preventing optimization.

This patch adds ExtractValue to the ComputeSpeculationCost. Test case included

rdar://14853450

llvm-svn: 208434
2014-05-09 17:02:46 +00:00
Craig Topper
c0a2a29f4e [C++] Use 'nullptr'. Transforms edition.
llvm-svn: 207196
2014-04-25 05:29:35 +00:00
Chandler Carruth
6f9ba6a633 [Modules] Fix potential ODR violations by sinking the DEBUG_TYPE
definition below all of the header #include lines, lib/Transforms/...
edition.

This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.

Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.

llvm-svn: 206844
2014-04-22 02:55:47 +00:00
Hans Wennborg
a2aafbb7b5 Allow switch-to-lookup table for tables with holes by adding bitmask check
This allows us to generate table lookups for code such as:

  unsigned test(unsigned x) {
    switch (x) {
      case 100: return 0;
      case 101: return 1;
      case 103: return 2;
      case 105: return 3;
      case 107: return 4;
      case 109: return 5;
      case 110: return 6;
      default: return f(x);
    }
  }

Since cases 102, 104, etc. are not constants, the lookup table has holes
in those positions. We therefore guard the table lookup with a bitmask check.

Patch by Jasper Neumann!

llvm-svn: 203694
2014-03-12 18:35:40 +00:00
Benjamin Kramer
488ab03435 SimplifyCFG: Simplify the weight scaling algorithm.
No change in functionality.

llvm-svn: 203413
2014-03-09 14:42:55 +00:00
Chandler Carruth
fad39ebe19 [C++11] Add range based accessors for the Use-Def chain of a Value.
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
   detail
2) Change it to actually be a *Use* iterator rather than a *User*
   iterator.
3) Add an adaptor which is a User iterator that always looks through the
   Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
   they wanted a use_iterator (and to explicitly dig out the User when
   needed), or a user_iterator which makes the Use itself totally
   opaque.

Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.

The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.

However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]

llvm-svn: 203364
2014-03-09 03:16:01 +00:00
Chandler Carruth
436597fe00 [Modules] Move the ConstantRange class into the IR library. This is
a bit surprising, as the class is almost entirely abstracted away from
any particular IR, however it encodes the comparsion predicates which
mutate ranges as ICmp predicate codes. This is reasonable as they're
used for both instructions and constants. Thus, it belongs in the IR
library with instructions and constants.

llvm-svn: 202838
2014-03-04 12:24:34 +00:00
Chandler Carruth
248195469c [Modules] Move the NoFolder into the IR library as it creates
instructions.

llvm-svn: 202834
2014-03-04 12:05:47 +00:00
Chandler Carruth
075812f27c [Modules] Move CFG.h to the IR library as it defines graph traits over
IR types.

llvm-svn: 202827
2014-03-04 11:45:46 +00:00
Chandler Carruth
d0657fe39f [Modules] Move the LLVM IR pattern match header into the IR library, it
obviously is coupled to the IR.

llvm-svn: 202818
2014-03-04 11:08:18 +00:00
Benjamin Kramer
e4eb1b495f [C++11] Replace llvm::next and llvm::prior with std::next and std::prev.
Remove the old functions.

llvm-svn: 202636
2014-03-02 12:27:27 +00:00
Rafael Espindola
83f8550fb2 Rename many DataLayout variables from TD to DL.
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.

llvm-svn: 201827
2014-02-21 00:06:31 +00:00
Rafael Espindola
e8856107f0 Fix pr14893.
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid with the preconditions changes. In particular, it must drop
the range and tbaa metadata.

The patch implements this with an utility function to drop all metadata not
in a white list.

llvm-svn: 200322
2014-01-28 16:56:46 +00:00
Manman Ren
c3f51e8e54 PGO branch weight: keep halving the weights until they can fit into
uint32.

When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.

llvm-svn: 200262
2014-01-27 23:39:03 +00:00
Alp Toker
1c4b33e8e5 Fix known typos
Sweep the codebase for common typos. Includes some changes to visible function
names that were misspelt.

llvm-svn: 200018
2014-01-24 17:20:08 +00:00
Hans Wennborg
efa9ef0e63 Switch-to-lookup tables: set threshold to 3 cases
There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.

The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.

In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size with 8 KB.

llvm-svn: 199294
2014-01-15 05:00:27 +00:00
Hans Wennborg
f5c5f6e123 Switch-to-lookup tables: Don't require a result for the default
case when the lookup table doesn't have any holes.

This means we can build a lookup table for switches like this:

  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: exit(1);
  }

The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.

This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.

llvm-svn: 199025
2014-01-12 00:44:41 +00:00