15950 Commits

Author SHA1 Message Date
Roman Lebedev
bf8ef29ff5 [NFC][InstCombine] Add a PHI-of-insertvalues test with different base aggregate types 2020-08-26 09:57:50 +03:00
Roman Lebedev
a2f15b5ff5 Revert "[InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad"
This reverts commit fcb51d8c2460faa23b71e06abb7e826243887dd6.

As buildbots report, there's apparently some missing check to ensure
that the types of incoming values match the type of PHI.
Let's revert for a moment.
2020-08-26 09:23:22 +03:00
Roman Lebedev
5ec7b497bf [InstCombine] PHI-of-extractvalues -> extractvalue-of-PHI, aka invokes are bad
While since D86306 we do it's sibling fold for `insertvalue`,
we should also do this for `extractvalue`'s.

And unlike that one, the results here are, quite honestly, shocking,
as it can be observed here on vanilla llvm test-suite + RawSpeed results:

```
| statistic name                                     | baseline  | proposed  |       Δ |       % |    |%| |
|----------------------------------------------------|-----------|-----------|--------:|--------:|-------:|
| asm-printer.EmittedInsts                           | 7945095   | 7942507   |   -2588 |  -0.03% |  0.03% |
| assembler.ObjectBytes                              | 273209920 | 273069800 | -140120 |  -0.05% |  0.05% |
| early-cse.NumCSE                                   | 2183363   | 2183398   |      35 |   0.00% |  0.00% |
| early-cse.NumSimplify                              | 541847    | 550017    |    8170 |   1.51% |  1.51% |
| instcombine.NumAggregateReconstructionsSimplified  | 2139      | 108       |   -2031 | -94.95% | 94.95% |
| instcombine.NumCombined                            | 3601364   | 3635448   |   34084 |   0.95% |  0.95% |
| instcombine.NumConstProp                           | 27153     | 27157     |       4 |   0.01% |  0.01% |
| instcombine.NumDeadInst                            | 1694521   | 1765022   |   70501 |   4.16% |  4.16% |
| instcombine.NumPHIsOfExtractValues                 | 0         | 37546     |   37546 |   0.00% |  0.00% |
| instcombine.NumSunkInst                            | 63158     | 63686     |     528 |   0.84% |  0.84% |
| instcount.NumBrInst                                | 874304    | 871857    |   -2447 |  -0.28% |  0.28% |
| instcount.NumCallInst                              | 1757657   | 1758402   |     745 |   0.04% |  0.04% |
| instcount.NumExtractValueInst                      | 45623     | 11483     |  -34140 | -74.83% | 74.83% |
| instcount.NumInsertValueInst                       | 4983      | 580       |   -4403 | -88.36% | 88.36% |
| instcount.NumInvokeInst                            | 61018     | 59478     |   -1540 |  -2.52% |  2.52% |
| instcount.NumLandingPadInst                        | 35334     | 34215     |   -1119 |  -3.17% |  3.17% |
| instcount.NumPHIInst                               | 344428    | 331116    |  -13312 |  -3.86% |  3.86% |
| instcount.NumRetInst                               | 100773    | 100772    |      -1 |   0.00% |  0.00% |
| instcount.TotalBlocks                              | 1081154   | 1077166   |   -3988 |  -0.37% |  0.37% |
| instcount.TotalFuncs                               | 101443    | 101442    |      -1 |   0.00% |  0.00% |
| instcount.TotalInsts                               | 8890201   | 8833747   |  -56454 |  -0.64% |  0.64% |
| instsimplify.NumSimplified                         | 75822     | 75707     |    -115 |  -0.15% |  0.15% |
| simplifycfg.NumHoistCommonCode                     | 24203     | 24197     |      -6 |  -0.02% |  0.02% |
| simplifycfg.NumHoistCommonInstrs                   | 48201     | 48195     |      -6 |  -0.01% |  0.01% |
| simplifycfg.NumInvokes                             | 2785      | 4298      |    1513 |  54.33% | 54.33% |
| simplifycfg.NumSimpl                               | 997332    | 1018189   |   20857 |   2.09% |  2.09% |
| simplifycfg.NumSinkCommonCode                      | 7088      | 6464      |    -624 |  -8.80% |  8.80% |
| simplifycfg.NumSinkCommonInstrs                    | 15117     | 14021     |   -1096 |  -7.25% |  7.25% |
```
... which tells us that this new fold fires whopping 38k times,
increasing the amount of SimplifyCFG's `invoke`->`call` transforms by +54% (+1513) (again, D85787 did that last time),
decreasing total instruction count by -0.64% (-56454),
and sharply decreasing count of `insertvalue`'s (-88.36%, i.e. 9 times less)
and `extractvalue`'s (-74.83%, i.e. four times less).

This causes geomean -0.01% binary size decrease
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=size-text
and, ignoring `O0-g`, is a geomean -0.01%..-0.05% compile-time improvement
http://llvm-compile-time-tracker.com/compare.php?from=4d5ca22b8adfb6643466e4e9f48ba14bb48938bc&to=97dacca0111cb2ae678204e52a3cee00e3a69208&stat=instructions

The other thing that tells is, is that while this is a massive win for `invoke`->`call` transform
`InstCombinerImpl::foldAggregateConstructionIntoAggregateReuse()` fold,
which is supposed to be dealing with such aggregate reconstructions,
fires a lot less now. There are two reasons why:
1. After this fold, as it can be seen in tests, we may (will) end up with trivially redundant PHI nodes.
   We don't CSE them in InstCombine presently, which means that EarlyCSE needs to run and then InstCombine rerun.
2. But then, EarlyCSE not only manages to fold such redundant PHI's,
   it also sees that the extract-insert chain recreates the original aggregate,
   and replaces it with the original aggregate.

The take-aways are
1. We maybe should do most trivial, same-BB PHI CSE in InstCombine
2. I need to check if what other patterns remain, and how they can be resolved.
   (i.e. i wonder if `foldAggregateConstructionIntoAggregateReuse()` might go away)

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86530
2020-08-26 09:08:24 +03:00
Mircea Trofin
b186c6758c [MLInliner] Simplify TFUTILS_SUPPORTED_TYPES
We only need the C++ type and the corresponding TF Enum. The other
parameter was used for the output spec json file, but we can just
standardize on the C++ type name there.

Differential Revision: https://reviews.llvm.org/D86549
2020-08-25 14:19:39 -07:00
Juneyoung Lee
65c4cb9e7b [ValueTracking] Let getGuaranteedNonPoisonOp find multiple non-poison operands
This patch helps getGuaranteedNonPoisonOp find multiple non-poison operands.

Instead of special-casing llvm.assume, I think it is also a viable option to
add noundef to Intrinsics.td. If it makes sense, I'll make a patch for that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86477
2020-08-26 04:40:21 +09:00
Juneyoung Lee
e44db8bec4 [ValueTracking] Add a noundef test for D86477; NFC 2020-08-26 04:40:21 +09:00
Arthur Eubanks
de0524e48b [test] Add -inject-tli-mapping to -loop-vectorize -vector-library tests
The legacy LoopVectorize has a dependency on InjectTLIMappingsLegacy.
That cannot be expressed in the new PM since they are both normal
passes. Explicitly add -inject-tli-mappings as a pass.

Follow-up to https://reviews.llvm.org/D86492.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86561
2020-08-25 11:55:11 -07:00
Arthur Eubanks
b28c9e3931 [NewPM][test] Fix accelerate-vector-functions.ll under NPM
The legacy SLPVectorizer has a dependency on InjectTLIMappingsLegacy.
That cannot be expressed in the new PM since they are both normal
passes. Explicitly add -inject-tli-mappings as a pass.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86492
2020-08-25 10:50:14 -07:00
Sanjay Patel
66453001f1 [InstCombine] improve demanded element analysis for vector insert-of-extract (2nd try)
The 1st attempt (rG557b890) was reverted because it caused miscompiles.
That bug is avoided here by changing the order of folds and as verified
in the new tests.

Original commit message:
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-25 11:19:36 -04:00
Sanjay Patel
c2dd03cd0f [InstCombine] add vector demanded elements tests with shuffles; NFC
The 1st draft of D86460 (reverted) would show miscompiles with these tests
because the undef element tracking went wrong and became visible in the
shuffle masks.
2020-08-25 11:19:35 -04:00
Sjoerd Meijer
1cd139275c [LV] get.active.lane.mask consuming tripcount instead of backedge-taken count
This adapts LV to the new semantics of get.active.lane.mask as discussed in
D86147, which means that the LV now emits intrinsic get.active.lane.mask with
the loop tripcount instead of the backedge-taken count as its second argument.
The motivation for this is described in D86147.

Differential Revision: https://reviews.llvm.org/D86304
2020-08-25 13:49:19 +01:00
Sam Parker
50697e16b0 [NFC][SimplifyCFG] More tests for Arm 2020-08-25 12:13:48 +01:00
Sam Parker
2ad592033d [NFC][SimplifyCFG] Add some more tests for Arm. 2020-08-25 11:44:17 +01:00
Roman Lebedev
8908aecdf3 [NFC][InstCombine] Tests for PHI-of-extractvalues
Much like with it's sibling fold HI-of-insertvalues,
it appears to be much more worthwhile than it would seem.
2020-08-25 13:01:07 +03:00
Benjamin Kramer
cc40144ebc Revert "[InstCombine] improve demanded element analysis for vector insert-of-extract"
This reverts commit 557b890ff4f4dd5fa979c232df5b31cf3fef04c1. Causing
miscompiles, test case is on llvm-commits.
2020-08-25 11:31:31 +02:00
David Sherwood
82c9874179 [SVE] Fix TypeSize related warnings with IR truncates of scalable vectors
In getCastInstrCost when the instruction is a truncate we were relying
upon the implicit TypeSize -> uint64_t cast when asking if a given type
has the same size as a legal integer. I've changed the code to only
ask the question if the type is fixed length.

I have also changed InstCombinerImpl::SimplifyDemandedUseBits to bail
out for now if the type is a scalable vector.

I've added the following new tests:

  Analysis/CostModel/AArch64/sve-trunc.ll
  Transforms/InstCombine/AArch64/sve-trunc.ll

for both of these fixes.

Differential revision: https://reviews.llvm.org/D86432
2020-08-25 09:17:56 +01:00
Roman Lebedev
ed8ecc651f [InstCombine] PHI-of-insertvalues -> insertvalue-of-PHI's
As per statistic, this happens pretty exceedingly rare,
but i have seen it in exactly the situations the
Phi-aware aggregate reconstruction would have handled,
eventually, and allowed invoke -> call fold later on.

So while this might be something that other fold
will have to learn about, i believe we should be
doing this transform in general.

Here, we are okay with adding two PHI's to get both the base aggregate,
and the inserted value. I'm not sure it makes much sense to restrict
it to a single phi (to just the inserted value?), because originally
we'd be receiving the final aggregate already..

llvm test-suite + RawSpeed:
```
| statistic name                             | baseline  | proposed  |    Δ |      % | \|%\| |
|--------------------------------------------|-----------|-----------|-----:|-------:|------:|
| instcombine.NumPHIsOfInsertValues          | 0         | 12        |  12  |  0.00% | 0.00% |
| asm-printer.EmittedInsts                   | 8926643   | 8926595   | -48  |  0.00% | 0.00% |
| instcombine.NumCombined                    | 3846614   | 3846640   |  26  |  0.00% | 0.00% |
| instcombine.NumConstProp                   | 24302     | 24293     |  -9  | -0.04% | 0.04% |
| instcombine.NumDeadInst                    | 1620140   | 1620112   | -28  |  0.00% | 0.00% |
| instcount.NumBrInst                        | 898466    | 898464    |  -2  |  0.00% | 0.00% |
| instcount.NumCallInst                      | 1760819   | 1760875   |  56  |  0.00% | 0.00% |
| instcount.NumExtractValueInst              | 45659     | 45649     | -10  | -0.02% | 0.02% |
| instcount.NumInsertValueInst               | 4991      | 4981      | -10  | -0.20% | 0.20% |
| instcount.NumIntToPtrInst                  | 27084     | 27087     |   3  |  0.01% | 0.01% |
| instcount.NumPHIInst                       | 371435    | 371429    |  -6  |  0.00% | 0.00% |
| instcount.NumStoreInst                     | 906011    | 906019    |   8  |  0.00% | 0.00% |
| instcount.TotalBlocks                      | 1105520   | 1105518   |  -2  |  0.00% | 0.00% |
| instcount.TotalInsts                       | 9795737   | 9795776   |  39  |  0.00% | 0.00% |
| simplifycfg.NumInvokes                     | 2784      | 2786      |   2  |  0.07% | 0.07% |
| simplifycfg.NumSimpl                       | 1001840   | 1001850   |  10  |  0.00% | 0.00% |
| simplifycfg.NumSinkCommonInstrs            | 15174     | 15170     |  -4  | -0.03% | 0.03% |
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86306
2020-08-25 10:38:11 +03:00
Mircea Trofin
9c71d4e1d1 [MLInliner] Support training that doesn't require partial rewards
If we use training algorithms that don't need partial rewards, we don't
need to worry about an ir2native model. In that case, training logs
won't contain a 'delta_size' feature either (since that's the partial
reward).

Differential Revision: https://reviews.llvm.org/D86481
2020-08-24 17:36:29 -07:00
Sanjay Patel
8f9cb71b9c [InstCombine] improve demanded element analysis for vector insert-of-extract
InstCombine currently has odd rules for folding insert-extract chains to shuffles,
so we miss collapsing seemingly simple cases as shown in the tests here.

But poison makes this not quite as easy as we might have guessed. Alive2 tests to
show the subtle difference (similar to the regression tests):
https://alive2.llvm.org/ce/z/hp4hv3 (this is ok)
https://alive2.llvm.org/ce/z/ehEWaN (poison leakage)

SLP tends to create these patterns (as shown in the SLP tests), and this could
help with solving PR16739.

Differential Revision: https://reviews.llvm.org/D86460
2020-08-24 17:00:16 -04:00
Sanjay Patel
663055a339 [SLP] avoid 'tmp' names in regression tests; NFC
That can cause problems for update_test_checks.py (it warns when updating this file).
2020-08-24 17:00:16 -04:00
Sanjay Patel
07008dabee [InstCombine] add tests for insert+extract demanded elements; NFC 2020-08-24 17:00:16 -04:00
Bjorn Pettersson
d32d86bc38 [Scalarizer] Avoid updating the name of globals
The "takeName" logic at the end of ScalarizerVisitor::finish
could end up renaming global variables when having simplified
and extractelement instruction to simply pick a single vector
element. If the input vector to the extractelement instruction
held pointers to global variables we ended up renaming the global
variable.
The patch make sure we only take the name of the replaced Op when
we have added new instructions that might need a useful name.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D86472
2020-08-24 21:55:03 +02:00
Roman Lebedev
c9233e6b8a [NFC][InstCombine] Multi-level aggregate test for phi-of-insertvalue pattern
See https://reviews.llvm.org/D86306
2020-08-24 22:39:34 +03:00
Fangrui Song
9e7d7f3f68 Revert D85812 "[coroutine] should disable inline before calling coro split"
This reverts commit 2e43acfed89b1903de473f682c65878bdebc395a.

LLVMCoroutines (the library which contains Coroutines.h) depends on LLVMipo (the
library which contains SampleProfile.cpp). It is inappropriate for
SampleProfile.cpp to depent on Coroutines.h (circular dependency).

The test inverted dependencies as well:
llvm/test/Transforms/Coroutines/coro-inline.ll uses -sample-profile.
2020-08-24 11:41:05 -07:00
dongAxis
7a35eee5d4 [coroutine] should disable inline before calling coro split
summary:
When callee coroutine function is inlined into caller coroutine
function before coro-split pass, llvm will emits "coroutine should
have exactly one defining @llvm.coro.begin". It seems that coro-early
pass can not handle this quiet well.
So we believe that unsplited coroutine function should not be inlined.
This patch fix such issue by not inlining function if it has attribute
"coroutine.presplit" (it means the function has not been splited) to
fix this issue

TestPlan: check-llvm

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D85812
2020-08-24 22:22:08 +08:00
Florian Hahn
726de666d8 [DSE,MemorySSA] Delay PointerMayBeCaptured calls until actually needed.
Avoid computing InvisibleToCallerBefore/AfterRet up front. In most
cases, this information is not really needed. Instead, introduce helper
functions to compute and cache the result on demand.

Notably, this also does not use PointerMayBeCapturedBefore for
isInvisibleToCallerBeforeRet, as it requires the killing MemoryDef as
starting instruction, making the caching ineffective. But it appears the
use of PointerMayBeCapturedBefore has very limited benefits in practice
(e.g. on SPEC2000/SPEC2006/MultiSource there are no binary changes with
-O3 -flto). Refrain from using it for now, to limit-compile-time.

This gives some nice compile-time improvements:
http://llvm-compile-time-tracker.com/compare.php?from=db9345f6810f379a36752dc52caf5230585d0ebd&to=b4d091047e1b8a3d377d200137b79d03aca65663&stat=instructions
2020-08-24 14:05:44 +01:00
Anna Welker
1f8e3db230 [ARM][MVE] Allow tail predication for strides !=1 with gather/scatters
If gather/scatters are enabled, ARMTargetTransformInfo now allows
tail predication for loops with a much wider range of strides, up
to anything that is loop invariant.

Differential Revision: https://reviews.llvm.org/D85410
2020-08-24 13:54:47 +01:00
Florian Hahn
8ba75d1814 [DSE,MemorySSA] Regnerate some check lines.
The check lines where generated before align was added for all
instructions. Re-generate them, to reduce diff noise for actual
functional changes.
2020-08-24 13:24:44 +01:00
Florian Hahn
fd197bfffa [DSE,MemorySSA] Limit elimination at end of function to single UO.
Limit elimination of stores at the end of a function to MemoryDefs with
a single underlying object, to save compile time.

In practice, the case with multiple underlying objects seems not very
important in practice. For -O3 -flto on MultiSource/SPEC2000/SPEC2006
this results in a total of 2 more stores being eliminated.

We can always re-visit that in the future.
2020-08-24 13:00:17 +01:00
Sanjay Patel
8e77949af5 [InstCombine] fold abs of select with negated op (PR39474)
Similar to the existing transform - peek through a select
to match a value and its negation.

https://alive2.llvm.org/ce/z/MXi5KG

  define i8 @src(i1 %b, i8 %x) {
  %0:
    %neg = sub i8 0, %x
    %sel = select i1 %b, i8 %x, i8 %neg
    %abs = abs i8 %sel, 1
    ret i8 %abs
  }
  =>
  define i8 @tgt(i1 %b, i8 %x) {
  %0:
    %abs = abs i8 %x, 1
    ret i8 %abs
  }
  Transformation seems to be correct!
2020-08-24 07:37:55 -04:00
Sanjay Patel
ee2a844238 [InstCombine] add tests for abs of select with negated op; NFC (PR39474) 2020-08-24 07:37:54 -04:00
Roman Lebedev
6072801a3e [InstCombine] Negator: freeze is freely negatible if it's operand is negatible 2020-08-23 23:28:19 +03:00
Roman Lebedev
324948b801 [NFC][InstCombine] Add tests for negation of freeze 2020-08-23 23:28:19 +03:00
Sanjay Patel
8998944672 [InstCombine] canonicalize 'not' ops before logical shifts
This reverses the existing transform that would uniformly canonicalize any 'xor' after any shift. In the case of logical shifts, that turns a 'not' into an arbitrary 'xor' with constant, and that's probably not as good for analysis, SCEV, or codegen.

The SCEV motivating case is discussed in:
http://bugs.llvm.org/PR47136

There's an analysis motivating case at:
http://bugs.llvm.org/PR38781

I did draft a patch that would do the same for 'ashr' but that's questionable because it's just swapping the position of a 'not' and uncovers at least 2 missing folds that we would probably need to deal with as preliminary steps.

Alive proofs:
https://rise4fun.com/Alive/BBV

  Name: shift right of 'not'
  Pre: C2 == (-1 u>> C1)
  %a = lshr i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = lshr i8 %n, C1

  Name: shift left of 'not'
  Pre: C2 == (-1 << C1)
  %a = shl i8 %x, C1
  %r = xor i8 %a, C2
  =>
  %n = xor i8 %x, -1
  %r = shl i8 %n, C1

  Name: ashr of 'not'
  %a = ashr i8 %x, C1
  %r = xor i8 %a, -1
  =>
  %n = xor i8 %x, -1
  %r = ashr i8 %n, C1

Differential Revision: https://reviews.llvm.org/D86243
2020-08-22 09:38:13 -04:00
Fangrui Song
bd670142b0 [Attributor][test] Add REQUIRES: asserts after D86129 2020-08-21 16:20:41 -07:00
Roman Lebedev
29a87631f2 Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"
As disscussed in post-commit review starting with
	https://reviews.llvm.org/D84108#2227365
while this appears to be mostly a win overall, especially code-size-wise,
this appears to shake //certain// code pattens in a way that is extremely
unfavorable for performance (+30% runtime regression)
on certain CPU's (i personally can't reproduce).

So until the behaviour is better understood, and a path forward is mapped,
let's back this out for now.

This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.
2020-08-22 00:33:22 +03:00
Arthur Eubanks
5e4555a20b [opt][NewPM] Add basic-aa in legacy PM compatibility mode
The legacy PM alias analysis pipeline by default includes basic-aa.
When running `opt -foo-pass` under the NPM and -disable-basic-aa is not
specified, use basic-aa.

This decreases the number of check-llvm failures under NPM from 913 to 752.

Reviewed By: ychen, asbirlea

Differential Revision: https://reviews.llvm.org/D86167
2020-08-21 14:05:07 -07:00
kuterd
bc2c6ef6db [Attributor] Function seed allow list
-  Adds a command line option to seed only selected functions.
  - Makes seed allow listing exclusive to assertions enabled builds.

Reviewed By: sstefan1

Differential Revision: https://reviews.llvm.org/D86129
2020-08-21 23:55:26 +03:00
Shinji Okumura
a7f07cdc6b [Attributor] fix AANoUndef initialization
Currently, `AANoUndefImpl::initialize` mistakenly always indicates optimistic fixpoint for function returned position.
 This is because an associated value is `Function` in the case, and `isGuaranteedNotToBeUndefOrPoison` returns true for Function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86361
2020-08-22 05:06:14 +09:00
Serguei Katkov
facbe0803e [InstCombine] Remove unused entries in gc-live bundle of statepoint
If some of gc live value are not used in gc.relocate we can remove them
from gc-live bundle of statepoint instruction.

Also the CL removes duplicated Values in gc-live bundle.

Reviewers: reames, dantrushin
Reviewed By: dantrushin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D85959
2020-08-22 01:36:22 +07:00
Florian Hahn
5440177a2e [LoopIdiom,LSR] Add additional tests for SCEVExpander cleanups. 2020-08-21 13:48:31 +01:00
Sam Parker
25faf23408 [NFC] Add SimplifyCFG for ARM
Add some phi elimination threshold testing.
2020-08-21 11:52:31 +01:00
Mirko Brkusanin
08706e7bce [AMDGPU] Reorganize GCN subtarget features for unaligned access
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be
now used to quickly check both.

Differential Revision: https://reviews.llvm.org/D84522
2020-08-21 12:26:31 +02:00
Mirko Brkusanin
49f2d14543 [AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.

Differential Revision: https://reviews.llvm.org/D82788
2020-08-21 12:26:31 +02:00
Florian Hahn
48a439a77a [DSE,MemorySSA] Handle atomicrmw/cmpxchg conservatively.
This adds conservative handling of AtomicRMW/AtomicCmpXChg to
isDSEBarrier, similar to atomic loads and stores.
2020-08-21 10:42:42 +01:00
Florian Hahn
20d85a73f8 [DSE,MemorySSA] Regenerate check lines for atomic.ll tests. 2020-08-21 10:18:06 +01:00
sstefan1
8f1b61f465 [Attributor][NFC] run update_test_checks with --check-attributes. 2020-08-21 11:12:41 +02:00
Sam Parker
76932d3b0f [SimplifyCFG] Cost required selects
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.

Differential Revision: https://reviews.llvm.org/D82438
2020-08-21 09:52:52 +01:00
David Green
53fac1f9ad [ARM][LV] Add a preferPredicatedReductionSelect target hook
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.

Differential Revision: https://reviews.llvm.org/D85980
2020-08-21 08:48:12 +01:00
Roman Lebedev
c0a69dfec4 [NFC][InstCombine] Tests for PHI-of-insertvalue's
Currently we don't do anything about these,
neither in InstCombine, nor in SimplifyCFG's sinking.
These happen exceedingly rarely, but i've seen them in the cases where
PHI-aware aggregate reconstruction would have fired if not for them.
2020-08-20 20:16:31 +03:00