Commit Graph

203208 Commits

Author SHA1 Message Date
Juneyoung Lee
55591e689c [ValueTracking] isKnownNonZero, computeKnownBits for freeze
This implements support for isKnownNonZero, computeKnownBits when freeze is involved.

```
  br (x != 0), BB1, BB2
BB1:
  y = freeze x
```

In the above program, we can say that y is non-zero. The reason is as follows:

(1) If x was poison, `br (x != 0)` raised UB
(2) If x was fully undef, the branch again raised UB
(3) If x was non-zero partially undef, say `undef | 1`, `freeze x` will return a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, it is trivial

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D75808
2020-09-10 08:07:38 +09:00
dfukalov
272bea1511 [AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
2020-09-10 01:39:25 +03:00
Krzysztof Parzyszek
6bc3ff08f3 Mark masked.{store,scatter,compressstore} intrinsics as write-only 2020-09-09 17:28:21 -05:00
Jessica Paquette
4b96a7282b [AArch64][GlobalISel] Share address mode selection code for memops
We were missing support for the G_ADD_LOW + ADRP folding optimization in the
manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD.

As a result, we were missing cases like this:

```
@foo = external hidden global i32*
define void @baz(i32* %0) {
store i32* %0, i32** @foo
ret void
}
```

https://godbolt.org/z/16r7ad

This functionality already existed in the addressing mode functions for the
importer. So, this patch makes the manual selection code use
`selectAddrModeIndexed` rather than duplicating work.

This is a 0.2% geomean code size improvement for CTMark at -O3.

There is one code size increase (0.1% on lencod) which is likely because
`selectAddrModeIndexed` doesn't look through constants.

Differential Revision: https://reviews.llvm.org/D87397
2020-09-09 15:14:46 -07:00
Florian Hahn
4ab31e1cd4 [DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber.
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.

Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.

Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87386
2020-09-09 23:01:58 +01:00
Nikita Popov
5865bd4cc8 [DAGCombiner] Fold fmin/fmax of NaN
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.

This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.

Differential Revision: https://reviews.llvm.org/D87415
2020-09-09 23:53:32 +02:00
Nikita Popov
65570213eb [ARM] Add additional fmin/fmax with nan tests (NFC)
Adding these to ARM which has both FMINNUM and FMINIMUM.
2020-09-09 23:53:32 +02:00
Amara Emerson
d476bfb259 Add REQUIRES: asserts to a test that uses an asserts only flag. 2020-09-09 14:31:12 -07:00
Amara Emerson
fcbefce153 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

This is enabled only with optimizations enabled like SelectionDAG.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Nikita Popov
31aa934ebf [X86] Add tests for minnum/maxnum with constant NaN (NFC) 2020-09-09 22:36:51 +02:00
Amara Emerson
6f86f1afef [GlobalISel][IRTranslator] Generate better conditional branch lowering.
This is a port of the functionality from SelectionDAG, which tries to find
a tree of conditions from compares that are then combined using OR or AND,
before using that result as the input to a branch. Instead of naively
lowering the code as is, this change converts that into a sequence of
conditional branches on the sub-expressions of the tree.

Like SelectionDAG, we re-use the case block codegen functionality from
the switch lowering utils, which causes us to generate some different code.
The result of which I've tried to mitigate in earlier combine patches.

Differential Revision: https://reviews.llvm.org/D86665
2020-09-09 13:16:11 -07:00
Amara Emerson
a7636dc8f8 [GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less.
This combine previously tried to take sequences like:
  %cond = G_ICMP pred, a, b
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.

I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.

Differential Revision: https://reviews.llvm.org/D86664
2020-09-09 13:08:16 -07:00
Hiroshi Yamauchi
dc79f6327a [X86] Add support for using fast short rep mov for memcpy lowering.
Disabled by default behind an option.

Differential Revision: https://reviews.llvm.org/D86883
2020-09-09 12:46:40 -07:00
Tony
06c11c6b12 [AMDGPU] Correct gfx1031 XNACK setting documentation
- gfx1031 does not support XNACK.

Differential Revision: https://reviews.llvm.org/D87198
2020-09-09 19:43:02 +00:00
Jian Cai
cec86ef133 [MC] Resolve the difference of symbols in consecutive MCDataFragements
Try to resolve the difference of two symbols in consecutive MCDataFragments.
This is important for an idiom like "foo:instr; .if . - foo; instr; .endif"
(https://bugs.llvm.org/show_bug.cgi?id=43795).

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D69411
2020-09-09 12:35:43 -07:00
Fangrui Song
cc25215e10 [gcov] Give the __llvm_gcov_ctr load instruction a name for more readable output 2020-09-09 12:34:43 -07:00
Krzysztof Parzyszek
90ea855f41 [Hexagon] Account for truncating pairs to non-pairs when widening truncates
Added missing selection patterns for vpackl.
2020-09-09 14:31:52 -05:00
Sanjay Patel
09e9f6b632 [InstCombine] add tests for add/sub-of-shl; NFC 2020-09-09 15:29:08 -04:00
Fangrui Song
ce51bde5c3 [gcov] Don't split entry block; add a synthetic entry block instead
The entry block is split at the first instruction where `shouldKeepInEntry`
returns false. The created basic block has a br jumping to the original entry
block. The new basic block causes the function label line and the other entry
block lines to be covered by different basic blocks, which can affect line
counts with special control flows (fork/exec in the entry block requires
heuristics in llvm-cov gcov to get consistent line counts).

  int main() { // BB0
    return 0;  // BB2 (due to entry block splitting)
  }
  // BB1 is the exit block (since gcov 4.8)

This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and
inserts an edge from the synthetic entry block to the original entry block. We
can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The
number of basic blocks does not change, but the emitted .gcno files will be
smaller because we can save one GCOV_TAG_LINES tag.

  // BB0 is the synthetic entry block with a single edge to BB2
  int main() { // BB2
    return 0;  // BB2
  }
  // BB1 is the exit block (since gcov 4.8)
2020-09-09 12:25:24 -07:00
Guillaume Chatelet
21ae6eda63 [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)
This is preparatory work to unable storing alignment for AtomicCmpXchgInst.
See D83136 for context and bug: https://bugs.llvm.org/show_bug.cgi?id=27168

This is the fixed version of D83375, which was submitted and reverted.

Differential Revision: https://reviews.llvm.org/D87373
2020-09-09 19:10:30 +00:00
Mark de Wever
63ced1a0ad Implements [[likely]] and [[unlikely]] in IfStmt.
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.

Differential Revision: https://reviews.llvm.org/D85091
2020-09-09 20:48:37 +02:00
Krzysztof Parzyszek
c8c11e1378 [DSE] Explicitly not use MSSA in testcase for now
It fails for some reason, but it shouldn't stop switching to MSSA in DSE.
2020-09-09 13:45:55 -05:00
Krzysztof Parzyszek
3d66ccadfc [DSE] Handle masked stores 2020-09-09 13:31:31 -05:00
Johannes Doerfert
a45efb6cec Revert "[Attributor] Re-enable a run line in noalias.ll"
The underlying issue is still there, just hides on most systems, even
some Windows builds :(

See:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/25479/steps/test-check-all/logs/FAIL%3A%20LLVM%3A%3Anoalias.ll

This reverts commit 2600c9e2efce1dc4c64870b00a45ae0082c685fc.
2020-09-09 13:28:22 -05:00
Ulrich Weigand
487a2ff693 [DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist.  Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.

This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
2020-09-09 19:13:46 +02:00
Mircea Trofin
7134ae5a9c [NFC][MLInliner] Don't initialize in an assert.
Since the build bots have assertions enabled, this flew under the radar.
2020-09-09 09:56:07 -07:00
Simon Pilgrim
82fd96da09 X86CallFrameOptimization.cpp - use const references where possible. NFCI. 2020-09-09 16:35:08 +01:00
Krzysztof Parzyszek
d45cf136d8 [DSE] Add testcase that uses masked loads and stores 2020-09-09 10:30:32 -05:00
Simon Pilgrim
a943063683 X86FrameLowering::adjustStackWithPops - cleanup auto usage. NFCI.
Don't use auto for non-obvious types, and use const references.
2020-09-09 16:15:02 +01:00
Qiu Chaofan
6053db5a2e [PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.

One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87220
2020-09-09 22:40:58 +08:00
Jay Foad
d81eb83109 [AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter
NFC.
2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky
ba5e71a51e [AMDGPU][MC] Improved diagnostic messages for invalid registers
Corrected parser to issue meaningful error messages for invalid and malformed registers.

See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87234
2020-09-09 16:44:03 +03:00
Alon Kom
183a8b689c [MachinePipeliner] Fix II_setByPragma initialization
II_setByPragma was not reset between 2 calls of the MachinePipleiner pass

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D87088
2020-09-09 13:38:35 +00:00
Denis Antrushin
a5e6ed283a [Statepoints] Update DAG root after emitting statepoint.
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87251
2020-09-09 20:22:10 +07:00
Simon Pilgrim
a1a1eb4962 CommandLine.h - use auto const reference in ValuesClass::apply for range loop. NFCI. 2020-09-09 14:21:14 +01:00
Ronak Chauhan
7073a2ba14 Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit 487a80531006add8102d50dbcce4b6fd729ab1f6.

Tests fail on big endian machines.
2020-09-09 18:01:28 +05:30
Simon Pilgrim
2c3a34bc05 [KnownBits] Move SelectionDAG::computeKnownBits ISD::ABS handling to KnownBits::abs
Move the ISD::ABS handling to a KnownBits::abs handler, to simplify future implementations in ValueTracking/GlobalISel.
2020-09-09 13:22:58 +01:00
Simon Pilgrim
558faf7f34 APInt.h - return directly from clearUnusedBits in single word cases. NFCI.
Consistently use the same pattern of returning *this from the clearUnusedBits() call to allow us to early out from the isSingleWord() path and avoid an else statement.
2020-09-09 13:22:57 +01:00
Xing GUO
a4b3197f72 [elf2yaml] Fix dumping a debug section whose name is not recognized.
If the debug section's name isn't recognized, it should be
dumped as a raw content section.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D87346
2020-09-09 20:07:05 +08:00
David Stenberg
7d4bb5d4ca [UnifyFunctionExitNodes] Fix Modified status for unreachable blocks
If a function had at most one return block, the pass would return false
regardless if an unified unreachable block was created.

This patch fixes that by refactoring runOnFunction into two separate
helper functions for handling the unreachable blocks respectively the
return blocks, as suggested by @bjope in a review comment.

This was caught using the check introduced by D80916.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D85818
2020-09-09 13:36:03 +02:00
Juneyoung Lee
624f3d3053 [BuildLibCalls] Add more noundef to library functions
This patch follows D85345 and adds more noundef attributes to return values/arguments of library functions
that are mostly about accessing the file system or processes.

A few functions like `chmod` or `times` use typedef `mode_t` and `clock_t`.
They are neither struct nor union, so they cannot contain undef even if they're lowered to iN in IR. So, it is fine to add noundef to them.

- clock_t's actual type is size_t (C17, 7.27.1.3), so it isn't struct or union.

- For mode_t, either int or long is used in practice because programmers use bit manipulation. So, I think it is okay that it's never aggregate in practice.

After this patch, the remaining library functions are those that eagerly participate in optimizations: they can be removed, reordered, or
introduced by a transformation from primitive IR operations.
For them, a few testings is needed, since it may not be valid to add noundef anymore even if C standard says it's okay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85894
2020-09-09 20:33:35 +09:00
Juneyoung Lee
bd9d25252a [ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.

isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84242
2020-09-09 20:00:26 +09:00
Simon Pilgrim
3a3a752c6d TrigramIndex.cpp - remove unnecessary includes. NFCI.
TrigramIndex.h already includes most of these.
2020-09-09 11:38:31 +01:00
Simon Pilgrim
a2525d173c ARMTargetParser.cpp - use auto const references in for range loops. NFCI.
Fix static analysis warnings about unnecessary copies.
2020-09-09 11:38:31 +01:00
Simon Pilgrim
095c0be790 [APFloat] Fix uninitialized variable in IEEEFloat constructors
Some constructors of IEEEFloat do not initialize member variable exponent.
Fix it by initializing exponent with the following values:

For NaNs, the `exponent` is `maxExponent+1`.
For Infinities, the `exponent` is `maxExponent+1`.
For Zeroes, the `exponent` is `maxExponent-1`.

Patch by: @nullptr.cpp (Yang Fan)

Differential Revision: https://reviews.llvm.org/D86997
2020-09-09 11:38:30 +01:00
Florian Hahn
97c9619e22 [DomTree] Use SmallVector<DomTreeNodeBase *, 4> instead of std::vector.
Currentl DomTreeNodeBase is using std::vectot to store it's children.
Using SmallVector should be more efficient in terms of compile-time.

A size of 4 seems to be the sweet-spot in terms of compile-time,
according to

http://llvm-compile-time-tracker.com/compare.php?from=9933188c90615c9c264ebb69117f09726e909a25&to=d7a801d027648877b20f0e00e822a7a64c58d976&stat=instructions

This results in the following geomean improvements

```
                       geomean insts     max rss
O3                          -0.31 %       +0.02 %
ReleaseThinLTO              -0.35 %       -0.12 %
ReleaseLTO                  -0.28 %       -0.12 %
O0                          -0.06 %       -0.02 %
NewPM O3                    -0.36 %       +0.05 %
ReleaseThinLTO (link only)  -0.44 %       -0.10 %
ReleaseLTO-g (link only):   -0.32 %       -0.03 %
```

I am not sure if there's any other benefits of using std::vector over
SmallVector.

Reviewed By: kuhar, asbirlea

Differential Revision: https://reviews.llvm.org/D87319
2020-09-09 11:20:13 +01:00
Sjoerd Meijer
0297a27d72 [ARM] Fixup of a few test cases. NFC.
After changing the semantics of get.active.lane.mask, I missed a few tests
that should use now the tripcount instead of the backedge taken count.
2020-09-09 11:14:44 +01:00
Mirko Brkusanin
049222554b [AMDGPU] Workaround for LDS Misalignment bug on GFX10
Add subtarget feature check to avoid using ds_read/write_b96/128 with too
low alignment if a bug is present on that specific hardware.
Add this "feature" to GFX 10.1.1 as it is also affected.
Add global-isel test.
2020-09-09 11:46:09 +02:00
Max Kazantsev
97b38e1002 [Test] Add failing test for pr47457 2020-09-09 15:45:35 +07:00
Florian Hahn
1aaacc4182 [EarlyCSE] Explicitly require AAResultsWrapperPass.
The MemorySSAWrapperPass depends on AAResultsWrapperPass and if
MemorySSA is preserved but AAResultsWrapperPass is not, this could lead
to a crash when updating the last user of the MemorySSAWrapperPass.

Alternatively AAResultsWrapperPass could be marked preserved by GVN, but
I am not sure if that would be safe. I am not sure what is required in
order to preserve AAResultsWrapperPass. At the moment, it seems like a
couple of passes that do similar transforms to GVN are preserving it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87137
2020-09-09 09:14:50 +01:00