10261 Commits

Author SHA1 Message Date
Jun Bum Lim
910c8cc532 [InlineCost] Do not take INT_MAX when Cost is negative
Summary: visitSwitchInst should not take INT_MAX when Cost is negative. Instead of INT_MAX , we also use a valid upperbound cost when overflow occurs in Cost.

Reviewers: hans, echristo, dmgreen

Reviewed By: dmgreen

Subscribers: mcrosier, javed.absar, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D34436

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306118 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-23 16:12:37 +00:00
Anna Thomas
2cfdb4aa6c [InstCombine] Recognize and simplify three way comparison idioms
Summary:
Many languages have a three way comparison idiom where comparing two values
produces not a boolean, but a tri-state value. Typical values (e.g. as used in
the lcmp/fcmp bytecodes from Java) are -1 for less than, 0 for equality, and +1
for greater than.

We actually do a great job already of converting three way comparisons into
binary comparisons when the result produced has one a single use. Unfortunately,
such values can have more than one use, and in that case, our existing
optimizations break down.

The patch adds a peephole which converts a three-way compare + test idiom into a
binary comparison on the original inputs. It focused on replacing the test on
the result of the three way compare and does nothing about removing the three
way compare itself. That's left to other optimizations (which do actually kick
in commonly.)
We currently recognize one idiom on signed integer compare. In the future, we
plan to recognize and simplify other comparison idioms on
other signed/unsigned datatypes such as floats, vectors etc.

This is a resurrection of Philip Reames' original patch:
https://reviews.llvm.org/D19452

Reviewers: majnemer, apilipenko, reames, sanjoy, mkazantsev

Reviewed by: mkazantsev

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34278

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306100 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-23 13:41:45 +00:00
Chandler Carruth
9dc2b94a11 [LoopSimplify] Factor the logic to form dedicated exits into a utility.
I want to use the same logic as LoopSimplify to form dedicated exits in
another pass (SimpleLoopUnswitch) so I wanted to factor it out here.

I also noticed that there is a pretty significantly more efficient way
to implement this than the way the code in LoopSimplify worked. We don't
need to actually retain the set of unique exit blocks, we can just
rewrite them as we find them and use only a set to deduplicate.

This did require changing one part of LoopSimplify to not re-use the
unique set of exits, but it only used it to check that there was
a single unique exit. That part of the code is about to walk the exiting
blocks anyways, so it seemed better to rewrite it to use those exiting
blocks to compute this property on-demand.

I also had to ditch a statistic, but it doesn't seem terribly valuable.

Differential Revision: https://reviews.llvm.org/D34049

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306081 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-23 04:03:04 +00:00
Craig Topper
9c13e87ea8 [LVI] Teach LVI to reason about ORs of icmps similar to how it reasons about ANDs of icmps
Summary: LVI can reason about an AND of icmps on the true dest of a branch. I believe we can do similar for the false dest of ORs. This allows us to get the same answer for the demorganed versions of some of the AND test cases as you can see.

Reviewers: anna, reames

Reviewed By: reames

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34431

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306076 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-23 01:08:16 +00:00
Andrew Kaylor
c539eea7c6 Restrict the definition of loop preheader to avoid EH blocks
Differential Revision: https://reviews.llvm.org/D34487

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306070 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 23:27:16 +00:00
whitequark
e4b1890fda Define behavior of "stack-probe-size" attribute when inlining.
Also document the attribute, since "probe-stack" already is.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D34528

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306069 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 23:22:36 +00:00
Farhana Aleen
e83d2eccef Supported lowerInterleavedStore() in X86InterleavedAccess.
Reviewers: RKSimon, DavidKreitzer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D32658

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306068 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 22:59:04 +00:00
Eric Christopher
e1ae008085 Remove the LoadCombine pass. It was never enabled and is unsupported.
Based on discussions with the author on mailing lists.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306067 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 22:58:12 +00:00
Anna Thomas
980c01094a [LoopDeletion] Update exits correctly when multiple duplicate edges from an exiting block
Summary:
Currently, we incorrectly update exit blocks of loops when there are multiple
edges from a single exiting block to the exit block. This can happen when we
have switches as the terminator of the exiting blocks.
The fix here is to correctly update the phi nodes in the exit block, and remove
all incoming values *except* for one which is from the preheader.

Note: Currently, this error can manifest only while deleting non-executed loops. However, it
is possible to trigger this error in invariant loops, once we enhance the logic
around the exit conditions for the loop check.

Reviewers: chandlerc, dberlin, sanjoy, efriedma

Reviewed by: efriedma

Subscribers: mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D34516

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306048 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 20:20:56 +00:00
Craig Topper
bba5503eed [InstCombine] Teach foldSelectICmpAndOr to recognize (select (icmp slt (trunc (X)), 0), Y, (or Y, C2))
Summary:
InstCombine likes to turn (icmp eq (and X, C1), 0) into (icmp slt (trunc (X)), 0) sometimes. This breaks foldSelectICmpAndOr's ability to recognize (select (icmp eq (and X, C1), 0), Y, (or Y, C2))->(or (shl (and X, C1), C3), y).

This patch tries to recover this. I had to flip around some of the early out checks so that I could create a new And instruction during the compare processing without it possibly never getting used.

Reviewers: spatel, majnemer, davide

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34184

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306029 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 16:23:30 +00:00
Craig Topper
b776efaa09 [InstCombine] Add one use checks to or/and->xnor folding
If the components of the and/or had multiple uses, this transform created an additional instruction.

This patch makes sure we remove one of the components.

Differential Revision: https://reviews.llvm.org/D34498

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306027 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 16:12:02 +00:00
Sanjay Patel
4c45e36dd8 [InstCombine] reverse bitcast + bitwise-logic canonicalization (PR33138)
There are 2 parts to this patch made simultaneously to avoid a regression.

We're reversing the canonicalization that moves bitwise vector ops before bitcasts. 
We're moving bitwise vector ops *after* bitcasts instead. That's the 1st and 3rd hunks 
of the patch. The motivation is that there's only one fold that currently depends on 
the existing canonicalization (see next), but there are many folds that would 
automatically benefit from the new canonicalization. 
PR33138 ( https://bugs.llvm.org/show_bug.cgi?id=33138 ) shows why/how we have these 
patterns in IR.

There's an or(and,andn) pattern that requires an adjustment in order to continue matching
to 'select' because the bitcast changes position. This match is unfortunately complicated 
because it requires 4 logic ops with optional bitcast and sext ops.

Test diffs:

  1. The bitcast.ll and bitcast-bigendian.ll changes show the most basic difference - 
     bitcast comes before logic.
  2. There are also tests with no diffs in bitcast.ll that verify that we're still doing 
     folds that were enabled by the previous canonicalization.
  3. icmp-xor-signbit.ll shows the payoff. We don't need to adjust existing icmp patterns 
     to look through bitcasts.
  4. logical-select.ll contains several tests for the or(and,andn) --> select fold to 
     verify that we are still handling those cases. The lone diff shows the movement of 
     the bitcast from the new canonicalization rule.

Differential Revision: https://reviews.llvm.org/D33517



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@306011 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 15:46:54 +00:00
Diana Picus
e36adbda88 Revert "Enable vectorizer-maximize-bandwidth by default."
This reverts commit r305960 because it broke self-hosting on AArch64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305990 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 10:00:28 +00:00
Craig Topper
e696366e67 [InstCombine] Add test cases to demonstrate that and->xnor and or->xnor folding can create more instructions than it removed when there are multiple uses. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305985 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-22 05:20:39 +00:00
Dehao Chen
998914d301 Enable vectorizer-maximize-bandwidth by default.
Summary:
vectorizer-maximize-bandwidth is generally useful in terms of performance. I've tested the impact of changing this to default on speccpu benchmarks on sandybridge machines. The result shows non-negative impact:

spec/2006/fp/C++/444.namd                 26.84  -0.31%
spec/2006/fp/C++/447.dealII               46.19  +0.89%
spec/2006/fp/C++/450.soplex               42.92  -0.44%
spec/2006/fp/C++/453.povray               38.57  -2.25%
spec/2006/fp/C/433.milc                   24.54  -0.76%
spec/2006/fp/C/470.lbm                    41.08  +0.26%
spec/2006/fp/C/482.sphinx3                47.58  -0.99%
spec/2006/int/C++/471.omnetpp             22.06  +1.87%
spec/2006/int/C++/473.astar               22.65  -0.12%
spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
spec/2006/int/C/400.perlbench             33.43  +1.70%
spec/2006/int/C/401.bzip2                 23.02  -0.19%
spec/2006/int/C/403.gcc                   32.57  -0.43%
spec/2006/int/C/429.mcf                   40.35  +0.27%
spec/2006/int/C/445.gobmk                 26.96  +0.06%
spec/2006/int/C/456.hmmer                  24.4  +0.19%
spec/2006/int/C/458.sjeng                 27.91  -0.08%
spec/2006/int/C/462.libquantum            57.47  -0.20%
spec/2006/int/C/464.h264ref               46.52  +1.35%

geometric mean                                   +0.29%

The regression on 453.povray seems real, but is due to secondary effects as all hot functions are bit-identical with and without the flag.

I started this patch to consult upstream opinions on this. It will be greatly appreciated if the community can help test the performance impact of this change on other architectures so that we can decided if this should be target-dependent.

Reviewers: hfinkel, mkuper, davidxl, chandlerc

Reviewed By: chandlerc

Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari, RKSimon, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D33341

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305960 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 22:01:32 +00:00
whitequark
4c34d0afe1 Add a "probe-stack" attribute
This attribute is used to ensure the guard page is triggered on stack
overflow. Stack frames larger than the guard page size will generate
a call to __probestack to touch each page so the guard page won't
be skipped.

Reviewed By: majnemer

Differential Revision: https://reviews.llvm.org/D34386

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305939 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 18:46:50 +00:00
Dehao Chen
5f67a41bab Do not inline recursive direct calls in sample loader pass.
Summary: r305009 disables recursive inlining for indirect calls in sample loader pass. The same logic applies to direct recursive calls.

Reviewers: iteratee, davidxl

Reviewed By: iteratee

Subscribers: sanjoy, llvm-commits, eraman

Differential Revision: https://reviews.llvm.org/D34456

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305934 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 17:57:43 +00:00
Sanjay Patel
2c60ba8943 [x86] set the datalayout to match the RUN line triple; NFC
I don't think there's any visible difference from having the wrong layout
for the 32-bit case at this point, but that could change in the future.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305931 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 17:06:24 +00:00
Craig Topper
ebc007dabb [InstCombine] Add range metadata to cttz/ctlz/ctpop intrinsic calls based on known bits
Summary:
I noticed that passing known bits across these intrinsics isn't great at capturing the information we really know. Turning known bits of the input into known bits of a count output isn't able to convey a lot of what we really know.

This patch adds range metadata to these intrinsics based on the known bits.

Currently the patch punts if we already have range metadata present.

Reviewers: spatel, RKSimon, davide, majnemer

Reviewed By: RKSimon

Subscribers: sanjoy, hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D32582

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305927 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 16:32:35 +00:00
Craig Topper
f2fe26d60e [InstCombine] Don't let folding (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) create more instructions than it removes
Summary:
Previously this folding had no checks to see if it was going to result in less instructions. This was pointed out during the review of D34184

This patch adds code to count how many instructions its going to create vs how many its going to remove so we can make a proper decision.

Reviewers: spatel, majnemer

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34437

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305926 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 16:07:13 +00:00
Craig Topper
e44557f019 [Reassociate] Support xor reassociating for splat vectors
Summary: This patch adds support for xors of splat vectors.

Reviewers: mcrosier

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34354

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305925 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 16:07:09 +00:00
Max Kazantsev
a0c83f81b9 [SCEV] Make MulOpsInlineThreshold lower to avoid excessive compilation time
MulOpsInlineThreshold option of SCEV is defaulted to 1000, which is inadequately high.
When constructing SCEVs of expressions like:

  x1 = a * a
  x2 = x1 * x1
  x3 = x2 * x2
    ...

We actually have huge SCEVs with max allowed amount of operands inlined.
Such expressions are easy to get from unrolling of loops looking like

  x = a
  for (i = 0; i < n; i++)
    x = x * x

Or more tricky cases where big powers are involved. If some non-linear analysis
tries to work with a SCEV that has 1000 operands, it may lead to excessively long
compilation. The attached test does not pass within 1 minute with default threshold.

This patch decreases its default value to 32, which looks much more reasonable if we
use analyzes with complexity O(N^2) or O(N^3) working with SCEV.

Differential Revision: https://reviews.llvm.org/D34397


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305882 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-21 07:28:13 +00:00
Matt Arsenault
84b3660bac AMDGPU: Allow vectorization of packed types
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305844 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 20:38:06 +00:00
Sanjay Patel
aab686b3f7 [x86] enable CGP memcmp() expansion for 2/4/8 byte sizes
There are a couple of potential improvements as seen in the IR and asm:
1. We're unnecessarily extending to a larger type to compare values.
2. The codegen for (select cond, 1, -1) could avoid a cmov.
(or we could change the order of the compares, so we have a select with 0 operand)


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305802 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 15:58:30 +00:00
Sanjay Patel
031043b243 [InstCombine] fix code/test comments for r305792; NFC
These diffs were in the last version of the patch in D33342,
but I accidentally committed the previous rev. 


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305793 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 12:45:46 +00:00
Sanjay Patel
3e4188dc6c [InstCombine] try to canonicalize xor-of-icmps to and-of-icmps
We have a large portfolio of folds for and-of-icmps and or-of-icmps in InstSimplify and InstCombine, 
but hardly anything for xor-of-icmps. Rather than trying to rethink and translate all of those folds, 
we can use the truth table definition of xor:

X ^ Y --> (X | Y) & !(X & Y)

...to see if we can convert the xor to and/or and then use the existing folds.

http://rise4fun.com/Alive/J9v

Differential Revision: https://reviews.llvm.org/D33342


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305792 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 12:40:55 +00:00
Ana Pazos
264bdd8966 [PATCH] [PGO] Fixed cast operation in emIntrinsicVisitor::instrumentOneMemIntrinsic.
Reviewers: xur, efriedma, davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305737 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 20:04:33 +00:00
Taewook Oh
9f93c9df69 Improve profile-guided heuristics to use estimated trip count.
Summary:
Existing heuristic uses the ratio between the function entry
frequency and the loop invocation frequency to find cold loops. However,
even if the loop executes frequently, if it has a small trip count per
each invocation, vectorization is not beneficial. On the other hand,
even if the loop invocation frequency is much smaller than the function
invocation frequency, if the trip count is high it is still beneficial
to vectorize the loop.

This patch uses estimated trip count computed from the profile metadata
as a primary metric to determine coldness of the loop. If the estimated
trip count cannot be computed, it falls back to the original heuristics.

Reviewers: Ayal, mssimpso, mkuper, danielcdh, wmi, tejohnson

Reviewed By: tejohnson

Subscribers: tejohnson, mzolotukhin, llvm-commits

Differential Revision: https://reviews.llvm.org/D32451

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 18:48:58 +00:00
Bjorn Pettersson
763224da2b [InstCombine] Make sure AddReachableCodeToWorklist sets MadeIRChange
Summary:
Some optimizations in AddReachableCodeToWorklist did not update
the MadeIRChange state. This could happen both when removing
trivially dead instructions (DCE) and at constant folds.

It is essential that changes to the IR is reported correctly,
since for example InstCombinePass::run() will indicate that all
analyses are preserved otherwise.
And the CGPassManager determines if the CallGraph is up-to-date
based on status from InstructionCombiningPass::runOnFunction().

The new test case early_dce_clobbers_callgraph.ll is a reproducer
for some asserts that started to trigger after changes in the
inliner in r305245. With this patch the test case passes again.

Reviewers: sanjoy, craig.topper, dblaikie

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34346

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305725 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 18:00:27 +00:00
Hans Wennborg
08030e7683 Revert r304824 "Fix PR23384 (part 3 of 3)"
This seems to be interacting badly with ASan somehow, causing false reports of
heap-buffer overflows: PR33514.

> Summary:
> The patch makes instruction count the highest priority for
> LSR solution for X86 (previously registers had highest priority).
>
> Reviewers: qcolombet
>
> Differential Revision: http://reviews.llvm.org/D30562
>
> From: Evgeny Stupachenko <evstupac@gmail.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305720 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 17:57:15 +00:00
Craig Topper
b1e0fcd931 [InstCombine] Cleanup some duplicated one use checks
Summary:
These 4 patterns have the same one use check repeated twice for each. Once without a cast and one with. But the cast has no effect on what method is called.

For the OR case I believe it is always profitable regardless of the number of uses since we'll never increase the instruction count.

For the AND case I believe it is profitable if the pair of xors has one use such that we'll get rid of it completely. Or if the C value is something freely invertible, in which case the not doesn't cost anything.

Reviewers: spatel, majnemer

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34308

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305705 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 16:23:49 +00:00
Craig Topper
b9bca3e50b [Reassociate] Support some reassociation of vector xors
Summary:
Currently we don't try to do anything with vector xors.

This patch adds support for removing duplicate pairs from a chain of vector xors as its pretty easy to support. We still dont' try to combine the xors with and/ors, but I might try that in a future patch.

Reviewers: mcrosier, davide, resistor

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34338

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305704 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 16:23:46 +00:00
Xin Tong
8db6b00572 [TRE] Improve code motion in TRE, use AA to tell whether a load can be moved before a call that writes to memory.
Summary: use AA to tell whether a load can be moved before a call that writes to memory.

Reviewers: dberlin, davide, sanjoy, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: https://reviews.llvm.org/D34115

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305698 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 15:21:18 +00:00
Max Kazantsev
9026c3d3fa [SCEV] Teach SCEVExpander to expand BinPow
Current implementation of SCEVExpander demonstrates a very naive behavior when
it deals with power calculation. For example, a SCEV for x^8 looks like

  (x * x * x * x * x * x * x * x)

If we try to expand it, it generates a very straightforward sequence of muls, like:

  x2 = mul x, x
  x3 = mul x2, x
  x4 = mul x3, x
      ...
  x8 = mul x7, x

This is a non-efficient way of doing that. A better way is to generate a sequence of
binary power calculation. In this case the expanded calculation will look like:

  x2 = mul x, x
  x4 = mul x2, x2
  x8 = mul x4, x4

In some cases the code size reduction for such SCEVs is dramatic. If we had a loop:

  x = a;
  for (int i = 0; i < 3; i++)
    x = x * x;

And this loop have been fully unrolled, we have something like:

  x = a;
  x2 = x * x;
  x4 = x2 * x2;
  x8 = x4 * x4;

The SCEV for x8 is the same as in example above, and if we for some reason
want to expand it, we will generate naively 7 multiplications instead of 3.
The BinPow expansion algorithm here allows to keep code size reasonable.

This patch teaches SCEV Expander to generate a sequence of BinPow multiplications
if we have repeating arguments in SCEVMulExpressions.

Differential Revision: https://reviews.llvm.org/D34025


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305663 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 06:24:53 +00:00
Daniel Berlin
47c282c3df NewGVN: Fix PR 33461, caused by slightly overzealous verification.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305657 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 00:24:00 +00:00
Xin Tong
0de5bc00b8 Add argmononly attribute to strlen and wcslen, i.e. they only read memory (string) passed to them.
Summary:
This allows strlen to be moved out of the loop in case its argument is
not modified in the loop in LICM.

Reviewers: hfinkel, davide, sanjoy, dberlin

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34323

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305641 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-18 03:10:26 +00:00
Sanjoy Das
f54df2f626 [SROA] Add support for non-integral pointers
Summary: C.f. http://llvm.org/docs/LangRef.html#non-integral-pointer-type

Reviewers: chandlerc, loladiro

Reviewed By: loladiro

Subscribers: reames, loladiro, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D32203

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305639 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-17 20:28:13 +00:00
Davide Italiano
aae294a712 [InstCombine] Make FPMathOperator working with ConstantExpression(s).
Fixes PR33453.

Differential Revision:  https://reviews.llvm.org/D34303

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305618 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-17 00:07:22 +00:00
Wei Mi
2244b2d0d0 Revert rL305578. There is still some buildbot failure to be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305603 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 23:14:35 +00:00
Anna Thomas
512a2c924a [InstCombine] Set correct insertion point for selects generated while folding phis
Summary:
When we fold vector constants that are operands of phi's that feed into select,
we need to set the correct insertion point for the *new* selects that get generated.
The correct insertion point is the incoming block for the phi.
Such cases can occur with patch r298845, which fixed folding of
vector constants, but the new selects could be inserted incorrectly (as the added
test case shows).

Reviewers: majnemer, spatel, sanjoy

Reviewed by: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34162

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305591 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 21:08:37 +00:00
Evgeniy Stepanov
e9eb0b8592 Change YAML traits for vector<string> to flow_vector.
This is a workaround for an ODR conflict with the definition in
AMDGPUCodeObjectMetadata.cpp.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305584 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 20:50:41 +00:00
Wei Mi
abc4fc5ad2 [GVN] Recommit the patch "Add phi-translate support in scalarpre".
The recommit fixes two bugs: The first one is to use CurrentBlock instead of
PREInstr's Parent as param of performScalarPREInsertion because the Parent
of a clone instruction may be uninitialized. The second one is stop PRE when
CurrentBlock to its predecessor is a backedge and an operand of CurInst is
defined inside of CurrentBlock. The same value defined inside of loop in last
iteration can not be regarded as available.

Right now scalarpre doesn't have phi-translate support, so it will miss some
simple pre opportunities. Like the following testcase, current scalarpre cannot
recognize the last "a * b" is fully redundent because a and b used by the last
"a * b" expr are both defined by phis.

long a[100], b[100], g1, g2, g3;
__attribute__((pure)) long goo();

void foo(long a, long b, long c, long d) {

  g1 = a * b;
  if (__builtin_expect(g2 > 3, 0)) {
    a = c;
    b = d;
    g2 = a * b;
  }
  g3 = a * b;      // fully redundant.

}
The patch adds phi-translate support in scalarpre. This is only a temporary
solution before the newpre based on newgvn is available.

Differential Revision: https://reviews.llvm.org/D32252


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305578 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 20:21:01 +00:00
Craig Topper
998d866d47 [InstCombine] Add test cases to show missed opportunities due to overly conservative single use checks. NFC
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305562 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 16:44:36 +00:00
Daniel Neilson
470c6959b7 [Atomics] Rename and change prototype for atomic memcpy intrinsic
Summary:

Background: http://lists.llvm.org/pipermail/llvm-dev/2017-May/112779.html

This change is to alter the prototype for the atomic memcpy intrinsic. The prototype itself is being changed to more closely resemble the semantics and parameters of the llvm.memcpy intrinsic -- to ease later combination of the llvm.memcpy and atomic memcpy intrinsics. Furthermore, the name of the atomic memcpy intrinsic is being changed to make it clear that it is not a generic atomic memcpy, but specifically a memcpy is unordered atomic.

Reviewers: reames, sanjoy, efriedma

Reviewed By: reames

Subscribers: mzolotukhin, anna, llvm-commits, skatkov

Differential Revision: https://reviews.llvm.org/D33240

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305558 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 14:43:59 +00:00
Craig Topper
091e96c54f [InstCombine] Fold (!iszero(A & K1) & !iszero(A & K2)) -> (A & (K1 | K2)) == (K1 | K2) if K1 and K2 are a 1-bit mask
Summary: This is the demorganed version of the case we already handle for the OR of iszero.

Reviewers: spatel

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D34244

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305548 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 05:10:37 +00:00
Evgeniy Stepanov
3184f8d7bb [cfi] CFI-ICall for ThinLTO.
Implement ControlFlowIntegrity for indirect function calls in ThinLTO.
Design follows the RFC in llvm-dev, see
https://groups.google.com/d/msg/llvm-dev/MgUlaphu4Qc/kywu0AqjAQAJ

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305533 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 00:18:29 +00:00
Craig Topper
66f76f7576 [InstCombine] Add test cases to demonstrate instcombine increasing instruction count when trying to fold (select (icmp eq (and X, C1), 0), Y, (or Y, C2))->(or (shl (and X, C1), C3), y) when the pieces have multiple uses.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305509 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 21:38:44 +00:00
Craig Topper
ccc684d065 [InstCombine] Pre-commit test cases for the transform proposed in D34244.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305492 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 18:56:05 +00:00
Craig Topper
a14f1bc634 [InstCombine] Handle (iszero(A & K1) | iszero(A & K2)) -> (A & (K1 | K2)) != (K1 | K2) when the one of the Ands is commuted relative to the other
Currently we expect A to be on the same side in both Ands but nothing guarantees that.

While there also switch to using matchers for some of the code.

Differential Revision: https://reviews.llvm.org/D34230

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305487 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 17:55:20 +00:00
Craig Topper
c9069cd453 [BasicAA] Add test case that goes with r305481.
Forgot to 'git add' the file.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305483 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 17:27:56 +00:00