Commit Graph

707 Commits

Author SHA1 Message Date
Arthur Eubanks
d74ec65308 [ConstProp] Remove ConstantPropagation
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.

Currently no users outside of unit tests.

Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp

Reviewed By: lattner, nikic

Differential Revision: https://reviews.llvm.org/D85159
2020-08-26 15:51:30 -07:00
Sam Parker
50697e16b0 [NFC][SimplifyCFG] More tests for Arm 2020-08-25 12:13:48 +01:00
Sam Parker
2ad592033d [NFC][SimplifyCFG] Add some more tests for Arm. 2020-08-25 11:44:17 +01:00
Roman Lebedev
29a87631f2 Temporarily revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As discussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code patterns in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPUs (I personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.
2020-08-22 00:33:22 +03:00
Sam Parker
25faf23408 [NFC] Add SimplifyCFG for ARM
Add some phi elimination threshold testing.
2020-08-21 11:52:31 +01:00
Sam Parker
76932d3b0f [SimplifyCFG] Cost required selects
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.
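
A minimal sketch of the decision described above, written in Python rather than LLVM's C++. The cost table, the function names and the `<=` comparison are hypothetical stand-ins for the target cost model; only the choice of cost kind (CodeSize under minsize, SizeAndLatency otherwise) is taken from the message.

```python
from enum import Enum


class CostKind(Enum):
    CODE_SIZE = "CodeSize"
    SIZE_AND_LATENCY = "SizeAndLatency"


# Hypothetical per-select costs; a real target reports these via its cost model.
SELECT_COST = {CostKind.CODE_SIZE: 1, CostKind.SIZE_AND_LATENCY: 2}


def pick_cost_kind(minsize: bool) -> CostKind:
    """With minsize we query the CodeSize cost, otherwise SizeAndLatency."""
    return CostKind.CODE_SIZE if minsize else CostKind.SIZE_AND_LATENCY


def should_speculate(num_selects: int, threshold: int, minsize: bool) -> bool:
    """Speculate only if the required selects fit within the phi folding threshold."""
    kind = pick_cost_kind(minsize)
    return num_selects * SELECT_COST[kind] <= threshold
```

With these made-up costs, two selects fit a threshold of 2 under minsize but not otherwise.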

Differential Revision: https://reviews.llvm.org/D82438
2020-08-21 09:52:52 +01:00
Sam Parker
e6a76709fd [NFC][ARM][SimplifyCFG] Add some tests.
Add some tests around thresholds and minsize.
2020-08-11 15:13:58 +01:00
Roman Lebedev
903dd081e7 [SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics
SimplifyCFG has two main folds for resumes - one when resume is directly
using the landingpad, and the other one where resume is using a PHI node.

While for the first case, we were already correctly ignoring all the
PHI nodes, and both the debug info intrinsics and lifetime intrinsics,
in the PHI-based-one, we weren't ignoring PHI's in the resume block,
and weren't ignoring lifetime intrinsics. That is clearly a bug.

On RawSpeed library, this results in +9.34% (+81) more invoke->call folds,
-0.19% (-39) landing pads, -0.24% (-81) invoke instructions
but +51 call instructions and -132 basic blocks.

Though, the run-time performance impact appears to be within the noise.
2020-08-08 20:00:28 +03:00
Roman Lebedev
b96154f169 [NFC][SimplifyCFG] Add a test showing invoke->call simplification failure 2020-08-08 20:00:28 +03:00
Roman Lebedev
4a9109b967 [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike its friend, the common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here I've currently picked the same one as for code sinking,
but I suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprisingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of compile-time and
run-time performance and code size. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, so some of them will be caught late.

As per benchmarks I've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression I saw I fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but I'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprisingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal when optimizing for code size,
so one might want to tune the pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108
2020-07-29 20:05:30 +03:00
Max Kazantsev
bb4a569d92 [SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls
We do not thread blocks with convergent calls, but this check was missing
when we decide to insert PR Phis into it (which we only do for threading).

Differential Revision: https://reviews.llvm.org/D83936
Reviewed By: nikic
2020-07-22 13:53:50 +07:00
Chen Zheng
c7344cef99 [PowerPC] add store (load float*) pattern to isProfitableToHoist
store (load float*) can be optimized to store (load i32*) in the InstCombine pass.

Add store (load float*) to isProfitableToHoist to make sure we don't break
that optimization in the InstCombine pass.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D82341
2020-07-21 20:55:13 -04:00
Roman Lebedev
39a69897ce [NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag
Common code sinking is already guarded with a (default-off!) flag,
so add a flag for hoisting, too.

D84108 will hopefully make hoisting off-by-default too.
2020-07-20 10:29:57 +03:00
Roman Lebedev
1ff9b75d60 [NFC][SimplifyCFG] Add standalone test for common code hoisting xform option
Also, move one test into its correct place
2020-07-20 10:29:29 +03:00
Chen Zheng
25aa0c9508 [PowerPC]add testcase for adding store (load float*) pattern, nfc 2020-07-17 22:57:08 -04:00
Sam Parker
ef04808683 [NFC][ARM] Add SimplifyCFG test 2020-07-17 14:07:40 +01:00
Jon Roelofs
7f5ac9d171 [SimplifyCFG] Fix crash in the EXPENSIVE_CHECKS build
SimplifyCFG was incorrectly reporting to the pass manager that it had not made
changes after folding away a PHI.  This is detected in the EXPENSIVE_CHECKS
build when the function's hash changes.
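
The consistency check behind this crash can be sketched as follows, in Python rather than LLVM's C++. The hash function, the toy transform and all names here are hypothetical; the sketch only shows the invariant that a pass reporting "no change" must leave the function's hash untouched.

```python
import hashlib


def function_hash(instructions):
    """Stand-in for a structural hash: a digest of the instruction text."""
    return hashlib.sha256("\n".join(instructions).encode()).hexdigest()


def fold_trivial_phis(instructions):
    """Toy transform: drop 'phi' lines and report whether anything changed."""
    kept = [inst for inst in instructions if not inst.startswith("phi ")]
    changed = len(kept) != len(instructions)
    return kept, changed


def check_changed_status(before, after, reported_changed):
    """Mimic the expensive check: 'no change' must mean an identical hash."""
    if not reported_changed and function_hash(before) != function_hash(after):
        raise AssertionError("pass modified the function but reported no change")
```

If `fold_trivial_phis` removed a PHI and the caller still passed `reported_changed=False`, `check_changed_status` would raise, which mirrors the kind of failure this commit addresses.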

Differential Revision: https://reviews.llvm.org/D83985
2020-07-16 15:34:41 -06:00
Max Kazantsev
4858da5c88 [Test] Add test that shows how SimplifyCFG may insert redundant Phi
It happens when a block cannot be threaded because of a convergent function.
2020-07-16 16:23:11 +07:00
Sam Parker
3a3181b7a3 [NFC][ARM] Add SimplifyCFG tests 2020-07-14 11:10:11 +01:00
Nikita Popov
b94ca47521 [InstSimplify] Handle not inserted instruction gracefully (PR46638)
When simplifying comparisons using a dominating assume, bail out
if the context instruction is not inserted.
2020-07-08 21:43:32 +02:00
Roman Lebedev
347c3e4e9c [InstCombine] Always try to invert non-canonical predicate of an icmp
Summary:
The actual transform I was going after was:
https://rise4fun.com/Alive/Tp9H
```
Name: zz
Pre: isPowerOf2(C0) && isPowerOf2(C1) && C1 == C0
%t0 = and i8 %x, C0
%r = icmp eq i8 %t0, C1
  =>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1

Name: zz
Pre: isPowerOf2(C0)
%t0 = and i8 %x, C0
%r = icmp ne i8 %t0, 0
  =>
%t = icmp eq i8 %t0, 0
%r = xor i1 %t, -1
```
but as can be seen from the current tests, we already canonicalize most of it,
and we are only missing handling multi-use non-canonical icmp predicates.

If we have both `!=0` and `==0`, even though we can CSE them,
we end up being stuck with them. We should canonicalize to the `==0`.
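
The two rewrites above can be checked exhaustively for `i8`. The following is a brute-force sketch in Python (not the Alive proof linked above); the helper names are made up for illustration, and values are modeled as unsigned integers.

```python
def is_power_of_two(c: int) -> bool:
    return c > 0 and (c & (c - 1)) == 0


def check_canonicalization(bits: int = 8) -> bool:
    """Check both rewrites for every %x and every power-of-two mask C0."""
    for c0 in (1 << k for k in range(bits)):
        for x in range(1 << bits):
            t0 = x & c0
            canonical = not (t0 == 0)    # xor (icmp eq %t0, 0), -1
            if (t0 == c0) != canonical:  # icmp eq %t0, C1 with C1 == C0
                return False
            if (t0 != 0) != canonical:   # icmp ne %t0, 0
                return False
    return True
```

Both forms agree with the inverted `== 0` comparison because `%x & C0` can only be `0` or `C0` when `C0` has a single bit set.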

I believe this is one of the cleanup steps I'll need after `-scalarizer`
if I end up proceeding with my WIP alloca promotion helper pass.

Reviewers: spatel, jdoerfert, nikic

Reviewed By: nikic

Subscribers: zzheng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83139
2020-07-04 18:12:04 +03:00
Sam Parker
44a5a2927b [NFC][SimplifyCFG] Move X86 tests into subdir 2020-07-03 14:28:27 +01:00
Max Kazantsev
c1e656184f [SimplifyCFG] Fix inconsistency in block size assessment for threading
Sometimes SimplifyCFG may decide to perform jump threading. In order
to do it, it follows the following algorithm:

1. Checks if the block is small enough for threading;
2. If yes, inserts a PR Phi, relying on the next iteration to remove it
   by performing jump threading;
3. The next iteration checks the block again and performs the threading.

This logic has a corner case: inserting the PR Phi increases the block's size
by 1. If the block size at the first check was the maximum allowed, one more Phi will
exceed this size, and we will neither perform threading nor remove the
created Phi node. As a result, we will end up with worse IR than before.

This patch fixes this situation by excluding Phis from block size computation.
Excluding Phis from size computation for threading also makes sense by
itself because in case of threading all those Phis will be removed.
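
The corner case and the fix can be sketched as a toy model in Python. The size limit and all names are hypothetical and the real logic lives in SimplifyCFG's C++; the point is only that counting Phis makes the second check fail on a block that was exactly at the limit.

```python
MAX_THREADABLE_SIZE = 10  # hypothetical limit, not LLVM's actual value


def block_size(block, count_phis):
    """Number of instructions that count toward the threading size limit."""
    return sum(1 for inst in block if count_phis or inst != "phi")


def small_enough(block, count_phis):
    return block_size(block, count_phis) <= MAX_THREADABLE_SIZE


def insert_pr_phi_then_thread(block, count_phis):
    """Return (phi_inserted, threaded) for the two-iteration scheme above."""
    if not small_enough(block, count_phis):
        return False, False
    with_phi = ["phi"] + block  # first iteration: insert the PR Phi
    threaded = small_enough(with_phi, count_phis)  # next iteration: re-check
    return True, threaded
```

For a block of exactly ten non-Phi instructions, `count_phis=True` yields `(True, False)`: the Phi is inserted but threading never happens. With `count_phis=False` the same block yields `(True, True)`.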

Differential Revision: https://reviews.llvm.org/D81835
Reviewed By: asbirlea, nikic
2020-06-30 12:40:07 +07:00
Nikita Popov
881c1da5d8 [SimplifyCFG] Make test more robust (NFC)
Avoid changing this test if blocks get merged.
2020-06-28 20:51:03 +02:00
Nikita Popov
8a71a5f127 [SimplifyCFG] Regenerate test checks (NFC) 2020-06-28 20:51:02 +02:00
Roman Lebedev
e1cb0d103c [Analysis] isDereferenceableAndAlignedPointer(): don't crash on bitcast <1 x ???*> to ???* 2020-06-27 18:30:59 +03:00
Roman Lebedev
1167cf8023 [CostModel] Avoid traditional ConstantExpr crashy pitfalls
I'm not sure if this is a regression from D81448 + D81643,
which moved at least the code cast from elsewhere,
or somehow no one triggered that before.
But now we can reach it with a non-instruction.

It is not straightforward to write cost-model tests for constantexprs,
`-cost-model -analyze -cost-kind=` does not appear to look at them,
or maybe I'm doing it wrong.

I've encountered that via a SimplifyCFG crash,
so a reduced (currently-crashing) test is added.
There are likely other instances.

For now, simply restore previous status quo of
not crashing and returning TTI::TCC_Basic.
2020-06-26 22:48:10 +03:00
Vedant Kumar
8bce4cf299 [SimplifyCFG] Drop debug loc in SpeculativelyExecuteBB
Summary:
According to HowToUpdateDebugInfo.rst:

```
Preserving the debug locations of speculated instructions can make
it seem like a condition is true when it's not (or vice versa), which
leads to a confusing single-stepping experience
```

This patch follows the recommendation to drop debug locations on
speculated instructions.

Reviewers: aprantl, davide

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82420
2020-06-23 18:25:52 -07:00
Yevgeny Rouban
d58eac3927 [IR] Convert profile metadata in createCallMatchingInvoke()
When an invoke instruction is converted to a call, its
profile metadata is dropped because it has an incompatible
format (see commit 16ad6eeb94ff).
This patch adds an attempt to convert the profile data to
the format of the call instruction. This used to work well
before the commit dcfa78a4ccec.

Reviewers: reames
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82071
2020-06-20 12:10:31 +07:00
Davide Italiano
f3c25b4ad7 [SimplifyCFG] Update debug location when folding branch to common destination
Sometimes a dead block gets folded and the debug information is still
retained. This manifests as jumpy stepping in lldb, see the bugzilla PR
for an end-to-end C testcase.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46008

Differential Revision:  https://reviews.llvm.org/D82062
2020-06-18 12:33:32 -07:00
Hans Wennborg
a97775fde4 [IR] Don't copy profile metadata in createCallMatchingInvoke()
The invoke instruction can have profile metadata with branch_weights,
which does not make sense for a call instruction and will be
rejected by the verifier.

Differential revision: https://reviews.llvm.org/D81996
2020-06-17 11:18:23 +02:00
Max Kazantsev
ecfcde54b7 [Test] Add an example of unprofitable PR Phi insertion
This test demonstrates weird behavior of SimplifyCFG: it seems that a bigger
block size leads to a worse optimization choice.
2020-06-15 15:56:06 +07:00
serge-sans-paille
6e187aff39 Correctly update Changed status for SimplifyCFG
Interestingly, this leads to better output in one of the test cases.

Differential Revision: https://reviews.llvm.org/D81237
2020-06-10 16:54:15 +02:00
Max Kazantsev
23bde48f84 [InstCombine] Sink pure instructions down to return and unreachable blocks
If the only user of `Instr` is in a return or unreachable block, we can
sink `Instr` to the `User` safely (unless it reads/writes memory).
Return or unreachable blocks are guaranteed to execute zero
or one time, and `Instr` always dominates `User`, so they either will
be executed together (execution of `User` always implies execution
of `Instr`) or not executed at all.

Differential Revision: https://reviews.llvm.org/D80120
Reviewed By: asbirlea, jdoerfert
2020-05-22 14:33:42 +07:00
Eli Friedman
202bb919c0 Make Value::getPointerAlignment() return an Align, not a MaybeAlign.
If we don't know anything about the alignment of a pointer, Align(1) is
still correct: all pointers are at least 1-byte aligned.

Included in this patch is a bugfix for an issue discovered during this
cleanup: pointers with "dereferenceable" attributes/metadata were
assumed to be aligned according to the type of the pointer.  This
wasn't intentional, as far as I can tell, so Loads.cpp was fixed to
stop making this assumption. Frontends may need to be updated.  I
updated clang's handling of C++ references, and added a release note for
this.

Differential Revision: https://reviews.llvm.org/D80072
2020-05-20 16:37:20 -07:00
Nikita Popov
cf8ee33937 [IR] Convert null-pointer-is-valid into an enum attribute
The "null-pointer-is-valid" attribute needs to be checked by many
pointer-related combines. To make the check more efficient, convert
it from a string into an enum attribute.

In the future, this attribute may be replaced with data layout
properties.

Differential Revision: https://reviews.llvm.org/D78862
2020-05-15 19:41:07 +02:00
Eli Friedman
f5d3346387 Infer alignment of unmarked loads in IR/bitcode parsing.
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.

The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.

Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.

This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.

Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.

Differential Revision: https://reviews.llvm.org/D78403
2020-05-14 13:03:50 -07:00
Zequan Wu
570033ed62 Add nomerge function attribute to suppress tail merge optimization in simplifyCFG
We want to add a way to avoid merging identical calls so as to keep the
separate debug-information for those calls. There is also an asan
usecase where having this attribute would be beneficial to avoid
alternative work-arounds.

Here is the link to the feature request:
https://bugs.llvm.org/show_bug.cgi?id=42783.

`nomerge` is different from `noinline`. `noinline` prevents a function from
being inlined at callsites, but `nomerge` prevents multiple identical calls
from being merged into one.
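
As a rough illustration of that distinction, here is a toy model in Python. It is not how SimplifyCFG represents calls; the `(callee, nomerge)` pairs and the function name are invented for the sketch.

```python
def merge_identical_calls(calls):
    """Toy tail merge: collapse duplicate calls unless they are marked nomerge.

    Each call is a (callee, nomerge) pair; order of first appearance is kept.
    """
    merged = []
    seen = set()
    for callee, nomerge in calls:
        if nomerge:
            merged.append((callee, nomerge))  # never merged with anything
        elif callee not in seen:
            seen.add(callee)
            merged.append((callee, nomerge))
    return merged
```

Two unmarked calls to the same callee collapse into one, while two `nomerge` calls both survive and so can keep separate debug information.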

This patch adds `nomerge` to disable the optimization at the IR level. A
followup patch will be needed to let the backend understand `nomerge` and
avoid tail merging in the backend.

Reviewed By: asbirlea, rnk

Differential Revision: https://reviews.llvm.org/D78659
2020-05-12 16:49:20 -07:00
Zequan Wu
1f84a491a4 Fix lifetime call in landingpad blocking Simplifycfg pass
Fix an issue where a lifetime call in a landingpad blocks SimplifyCFG from removing the
landingpad.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D77188
2020-04-09 13:07:32 -07:00
Jonathan Roelofs
c9acf39233 [llvm] Fix missing FileCheck directive colons
https://reviews.llvm.org/D77352
2020-04-06 09:59:08 -06:00
Matt Arsenault
1600386ddc Allow replacing intrinsic operands with variables
Since intrinsics can now specify when an argument is required to be
constant, it is now OK to replace arguments with variables if they
aren't. This means intrinsics must now be accurately marked with
immarg.
2020-03-23 15:51:57 -04:00
Chen Zheng
4ee17cafbe [PowerPC] implement target hook isProfitableToHoist
On PowerPC, fma is faster than fadd + fmul for some types
(PPCTargetLowering::isFMAFasterThanFMulAndFAdd). We should implement the target
hook isProfitableToHoist to prevent the SimplifyCFG pass from breaking the fma
pattern by hoisting fmul to the predecessor block.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D76207
2020-03-19 00:17:25 -04:00
Chen Zheng
96c74cf653 [PowerPC] add test cases for target hook isProfitableToHoist - NFC 2020-03-16 23:07:30 -04:00
Sanjay Patel
6180cf61cf [SimplifyCFG] add test for chain of empty block conditional branches; NFC 2020-03-13 14:39:31 -04:00
Sanjay Patel
57850e1de0 [SimplifyCFG] regenerate complete test checks; NFC 2020-03-13 14:12:28 -04:00
Sanjay Patel
4c46383ca2 [SimplifyCFG] regenerate test checks; NFC 2020-03-13 14:12:28 -04:00
Jonas Paulsson
a150110be1 [SimplifyCFG] Skip merging return blocks if it would break a CallBr.
SimplifyCFG should not merge empty return blocks and leave a CallBr behind
with a duplicated destination since the verifier will then trigger an
assert. This patch checks for this case and avoids the transformation.
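
A minimal sketch of such a check, in Python with hypothetical names. It models a CallBr only as a list of destination block names, assumed to be unique before the merge, and refuses the merge when retargeting would create a duplicate.

```python
def retarget(dests, old, new):
    """Redirect every edge that points at `old` to `new`."""
    return [new if d == old else d for d in dests]


def can_merge_return_blocks(callbr_dests, old, new):
    """Refuse the merge if it would give a callbr two identical destinations."""
    merged = retarget(callbr_dests, old, new)
    return len(set(merged)) == len(merged)
```

For a CallBr targeting `["ret1", "ret2"]`, merging `ret2` into `ret1` is rejected; if the CallBr does not target `ret2` at all, the merge is allowed.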

CodeGenPrepare has a similar check which also has a FIXME comment about why
this is needed. It seems perhaps better if these two passes would eventually
update the CallBr instruction instead of just checking and avoiding.

This fixes https://bugs.llvm.org/show_bug.cgi?id=45062.

Review: Craig Topper

Differential Revision: https://reviews.llvm.org/D75620
2020-03-10 14:59:13 +01:00
Nikita Popov
730509657a [InstCombine] DCE instructions earlier
When InstCombine initially populates the worklist, it already
performs constant folding and DCE. However, as the instructions
are initially visited in program order, this DCE can pick up only
the last instruction of a dead chain, the rest would only get
picked up in the main InstCombine run.

To avoid this, we instead perform the DCE in a separate pass over the
collected instructions in reverse order, which will allow us to
pick up full dead instruction chains. We already need to do this
reverse iteration anyway to populate the worklist, so this
shouldn't add extra cost.
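
The effect of iteration order on a dead chain can be seen in a small Python model. This is only a sketch: the tuple representation and the function name are invented here, and InstCombine's real worklist is more involved.

```python
def dce_single_pass(insts, reverse):
    """One sweep of trivial DCE over (name, operands, has_side_effects) tuples.

    Returns the names of the instructions that survive the sweep.
    """
    uses = {name: 0 for name, _, _ in insts}
    for _, operands, _ in insts:
        for op in operands:
            if op in uses:
                uses[op] += 1

    dead = set()
    order = reversed(insts) if reverse else insts
    for name, operands, has_side_effects in order:
        if uses[name] == 0 and not has_side_effects:
            dead.add(name)
            for op in operands:  # erasing a user frees up its operands
                if op in uses:
                    uses[op] -= 1
    return [name for name, _, _ in insts if name not in dead]
```

Given a chain `a -> b -> c` where only `c` is unused, a program-order sweep removes just `c`, because `a` and `b` were already visited while they still had a user. A reverse sweep removes all three in one go.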

This by itself only fixes a small part of the problem though:
The same basic issue also applies during the main InstCombine loop.
We generally always want DCE to occur as early as possible,
because it will allow one-use folds to happen. Address this by also
performing DCE while adding deferred instructions to the main worklist.

This drops the number of tests that perform more than 2 InstCombine
iterations from ~80 to ~40. There are some spurious test changes due
to operand order / icmp toggling.

Differential Revision: https://reviews.llvm.org/D75008
2020-02-27 18:45:59 +01:00
stozer
b6fc689526 Re-revert: Recover debug intrinsics when killing duplicated/empty blocks
This reverts commit 61b35e4111160fe834a00c33d040e01150b576ac.

This commit causes a timeout in chromium builds; likely to have a
similar cause to the previous timeout issue caused by this commit (see
6ded69f294a9 for more details). It is possible that there is no way to
fix this bug that will not cause this issue; further investigations as
to the efficiency of handling large amounts of debug info will be
necessary.
2020-02-13 11:48:19 +00:00
stozer
e5bafa8b36 Re-reapply: Recover debug intrinsics when killing duplicated/empty blocks
This reverts commit 636c93ed11a5f98b5c3ff88241460d61cb7785bb.

The original patch caused build failures on TSan buildbots. Commit 6ded69f294a9
fixes this issue by reducing the rate at which empty debug intrinsics
propagate, reducing the memory footprint and preventing a fatal spike.
2020-02-12 14:36:30 +00:00