Commit Graph

202877 Commits

Author SHA1 Message Date
Venkataramanan Kumar
36b1d49191 [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
Sanjay Patel
4e9822e551 [VectorCombine] allow vector loads with mismatched insert type
This is an enhancement to D81766 to allow loading the minimum target
vector type into an IR vector with a different number of elements.

In one of the motivating tests from PR16739, SLP creates <2 x float>
load ops mixed with <4 x float> insert ops, so we want to handle that
pattern in addition to potential oversized vectors created by the
vectorizers.

For now, we are assuming the insert/extract subvector with undef is
free because there is no exact corresponding TTI modeling for that.

Differential Revision: https://reviews.llvm.org/D86160
2020-09-02 08:11:36 -04:00
Max Kazantsev
260b1e427d [Test] Simplify test by removing unneeded variable 2020-09-02 18:39:43 +07:00
Paul Walker
aeb7dd335b [SVE] Don't reorder subvector/binop sequences when the resulting binop is not legal.
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:

  extract_subvec(binop(insert_subvec(a), insert_subvec(b)))

are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them the result is an
infinite legalise->combine->legalise loop.

This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used it's common for
the affected code paths to be visited pre and post legalisation,
so the extra parameter keeps the code tidy.

Differential Revision: https://reviews.llvm.org/D86450
2020-09-02 11:01:33 +01:00
Jay Foad
02f664086c [AMDGPU] Fix offset for REL32_HI relocs
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.

Differential Revision: https://reviews.llvm.org/D86938
2020-09-02 10:55:55 +01:00
Sander de Smalen
40471e4fe8 [AArch64][SVE] Preserve full vector regs over EH edge.
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.

For  e.g.

    svint32_t bar(svint32_t);
    svint32_t foo(svint32_t x, bool *err) {
      try { bar(x); } catch (...) { *err = true; }
      return x;
    }

`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84737
2020-09-02 10:54:18 +01:00
Igor Kudrin
7e0c7ec095 [DebugInfo] Emit a 1-byte value as a terminator of entries list in the name index.
As stated in section 6.1.1.2, DWARFv5, p. 142,
| The last entry for each name is followed by a zero byte that
| terminates the list. There may be gaps between the lists.

The patch changes emitting a 4-byte zero value to a 1-byte one, which
effectively removes the gap between entry lists, and thus saves
approximately 3 bytes per name; the calculation is not exact because
the total size of the table is aligned to 4.

Differential Revision: https://reviews.llvm.org/D86927
2020-09-02 16:12:39 +07:00
Igor Kudrin
ab5e60fa50 [DebugInfo] Remove Dwarf5AccelTableWriter::Header::UnitLength. NFC.
The member is not in use; the unit length for the table is emitted as
a difference between two labels. Moreover, the type of the member might
be misleading, because for DWARF64 the field should be 64 bit long.

Differential Revision: https://reviews.llvm.org/D86912
2020-09-02 16:11:45 +07:00
Martin Storsjö
a68f07462a [X86] Remove superfluous trailing semicolons, fixing warnings. NFC. 2020-09-02 11:43:27 +03:00
Simon Pilgrim
da3091b1cb [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support
This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.

We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.

Differential Revision: https://reviews.llvm.org/D66004
2020-09-02 09:24:46 +01:00
Shinji Okumura
ad9adda18c [Attributor] Make use of AANoUndef in AAUndefinedBehavior
This patch makes it possible for AAUB to use information from AANoUndef.
This is the next patch of D86983

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86984
2020-09-02 16:08:03 +09:00
Shinji Okumura
e194827980 [Attributor] Fix AANoUndef initialization
When the associated value is undef, we immediately forced to indicate a pessimistic fixpoint so far.
This patch changes the initialization to check the attribute given in IR at first and to indicate an optimistic fixpoint when it is given.
This change will enable us to catch , for example, the following case in AAUB.
```
call void @foo(i32 noundef undef)
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86983
2020-09-02 15:40:43 +09:00
Zi Xuan Wu
ed26c15759 [RFC][Target] Add a new triple called Triple::csky
Before upstream a new target called CSKY, make a new triple of that called Triple::csky.
For now, it's a 32-bit little endian target and the detail can be referred at D86269.

This is the split part of D86269, which add a new target called CSKY.

Differential Revision: https://reviews.llvm.org/D86505
2020-09-02 12:46:09 +08:00
Fangrui Song
a714230414 [CMake] Remove -Wl,-allow-shlib-undefined which was added in rL221530
In GNU ld, gold and LLD, --no-allow-shlib-undefined is the default when
linking an executable. The option disallows unresolved symbols in shared objects.
(gold and LLD catch fewer cases than GNU ld. See D57385 for details)
See D57569 why it is bad idea to use --allow-shlib-undefined for executables [a].

GNU ld traditionally copied DT_NEEDED entries transitively. This was
deemed not good, so GNU ld 2.22 defaulted to --no-copy-dt-needed-entries.
gold and LLD always behave like --no-copy-dt-needed-entries.
rL221530 added -Wl,-allow-shlib-undefined to make some old releases of GNU ld's
--no-copy-dt-needed-entries to actually work.

Due to [a] and [b], this patch drops -Wl,-allow-shlib-undefined.

[b]: In a -DBUILD_SHARED_LIBS=on build, `--as-needed --allow-shlib-undefined`
can unexpectedly suppress some .dynsym entries.  The issue can cause
mlir-cpu-runner to fail at runtime. Note, on Debian, gcc newer than (gcc-9-20190125-2) enable
--as-needed by default.
See https://sourceware.org/bugzilla/show_bug.cgi?id=26551 for a reduced example.

Reviewed By: mehdi_amini, echristo

Differential Revision: https://reviews.llvm.org/D86839
2020-09-01 21:13:45 -07:00
Lang Hames
65c420c8ff [ORC] Remove stray debugging output. 2020-09-01 20:53:49 -07:00
Alina Sbirlea
aa76d4b7d6 Fix build-bots.
BasicAA can be freed (and it is not recomputed).
2020-09-01 20:24:15 -07:00
Lang Hames
4a706c5ba3 [ORC] Add an early out for MachOPlatform's init-scraper plugin setup.
If there's no initializer symbol in the current MaterializationResponsibility
then bail out without installing JITLink passes: they're going to be no-ops
anyway.
2020-09-01 20:12:23 -07:00
Lang Hames
cbe0e339f3 [ORC] Fix MachOPlatform's synthetic symbol dependence registration.
A think-o in the existing code meant that dependencies were never registered.
This failure could lead to crashes rather than orderly error propagation if
initialization dependencies failed to materialize.

No test case: The bug was discovered in an out-of-tree code and requires
pathalogically misconfigured JIT to generate the original error that lead to
the crash.
2020-09-01 20:12:23 -07:00
Xing GUO
db9f7d8731 [DebugInfo] Simplify string table dumpers.
This patch adds a helper function DumpStrSection to simplify codes.
Besides, nonprintable chars in debug_str and debug_str.dwo sections
are printed as escaped chars.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86918
2020-09-02 08:41:10 +08:00
Alina Sbirlea
d314cb9743 [MemCpyOptimizer] Preserve analyses and replace use of lambdas to get them.
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.

Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74494
2020-09-01 17:35:40 -07:00
Jordan Rupprecht
c2af995048 [NFC] Fix unused var in release builds.
This was always unused, but the change in D86354 upgraded this to a compiler warning.
2020-09-01 16:38:24 -07:00
Varun Gandhi
99d53ed697 [ADT] Make Optional a literal type.
This allows returning Optional values from constexpr contexts.

Reviewed By: fhahn, dblaikie, rjmccall

Differential Revision: https://reviews.llvm.org/D86354
2020-09-01 16:13:40 -07:00
Amy Kwan
2cc81d44bf [PowerPC] Implement builtins for xvcvspbf16 and xvcvbf16spn
This patch adds the builtin implementation for the xvcvspbf16 and xvcvbf16spn
instructions.

Differential Revision: https://reviews.llvm.org/D86795
2020-09-01 17:16:43 -05:00
Sergej Jaskiewicz
1842e5f0f3 [llvm] [unittests] Fix failing test 'FileCollectorTest.addDirectory'
This fixes a regression in the test suite introduced by
fad75598d272b9a5591fb7d9b591cf00cdf5022c
2020-09-02 00:54:37 +03:00
Cameron McInally
7b182e9bab [SVE] Update INSERT_SUBVECTOR DAGCombine to use getVectorElementCount().
A small piece of the project to replace getVectorNumElements() with getVectorElementCount().

Differential Revision: https://reviews.llvm.org/D86894
2020-09-01 16:51:44 -05:00
Sergej Jaskiewicz
0975a7654a [llvm] [unittests] Remove temporary files after they're not needed
Some LLVM unit tests forget to clean up temporary files and
directories. Introduce RAII classes for cleaning them up.

Refactor the tests to use those classes.

Differential Revision: https://reviews.llvm.org/D83228
2020-09-02 00:34:44 +03:00
Amara Emerson
e10dc0379d Revert "Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")"
This reverts commit 8693ddc74371dedc742c9f3d3e4eda1da72c13ea.

Re-committing with the test requiring asserts.
2020-09-01 14:29:04 -07:00
Michael Kruse
292847821f [LangRef] Fix condition for when a loop is considered parallel.
The wording before this patch applies to llvm.mem.parallel_loop_access, not access groups.

Reviewed By: mppf, hfinkel

Differential Revision: https://reviews.llvm.org/D83781
2020-09-01 15:41:59 -05:00
Jordan Rupprecht
deb09864b8 Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")
This reverts commit 8ad8f484b63ca507417b58c9016d2761f2b1a1a8. It causes crashes when running `ninja check-llvm-codegen-aarch64-globalisel`, e.g.
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132/steps/test-stage1-compiler/logs/stdio.
Note that the crash does not seem to reproduce in debug builds.

5ded4442520d3dbb1aa72e6fe03cddef8828c618 depends on this, so revert that too.
2020-09-01 13:31:57 -07:00
Michael Liao
192b060a21 [amdgpu] Run SROA after loop unrolling.
Summary: - There are promotable `alloca`s after loop unrolling.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, nikic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84252
2020-09-01 16:09:56 -04:00
Jordan Rupprecht
e72bcb7267 [NFC] Fix unused var in release build 2020-09-01 13:05:56 -07:00
Florian Hahn
2248875209 [Loads] Add canReplacePointersIfEqual helper.
This patch adds an initial, incomeplete and unsound implementation of
canReplacePointersIfEqual to check if a pointer value A can be replaced
by another pointer value B, that are deemed to be equivalent through
some means (e.g. information from conditions).

Note that is in general not sound to blindly replace pointers based on
equality, for example if they are based on different underlying objects.

LLVM's memory model is not completely settled as of now; see
https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed
discussion.

The initial version of canReplacePointersIfEqual only rejects a very
specific case: replacing a pointer with a constant expression that is
not dereferenceable. Such a replacement is problematic and can be
restricted relatively easily without impacting most code. Using it to
limit replacements in GVN/SCCP/CVP only results in small differences in
7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto.

This patch is supposed to be an initial step to improve the current
situation and the helper should be made stricter in the future. But this
will require careful analysis of the impact on performance.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D85524
2020-09-01 20:57:41 +01:00
Aaron Liu
a410101daf [LV] Interleave to expose ILP for small loops with scalar reductions.
Interleave for small loops that have reductions inside,
which breaks dependencies and expose.

This gives very significant performance improvements for some benchmarks.
Because small loops could be in very hot functions in real applications.

Differential Revision: https://reviews.llvm.org/D81416
2020-09-01 19:47:32 +00:00
Craig Topper
d3e914ec24 [MachineCopyPropagation] In isNopCopy, check the destination registers match in addition to the source registers.
Previously if the source match we asserted that the destination
matched. But GPR <-> mask register copies on X86 can violate this
since we use the same K-registers for multiple sizes.

Fixes this ISPC issue https://github.com/ispc/ispc/issues/1851

Differential Revision: https://reviews.llvm.org/D86507
2020-09-01 12:44:32 -07:00
Arthur Eubanks
81eaf47f84 [Bindings] Add LLVMAddInstructionSimplifyPass
Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D86764
2020-09-01 12:38:49 -07:00
Lang Hames
4dfbee4abb [ORC] Add unit test for HasMaterializationSideEffectsOnly failure behavior. 2020-09-01 12:34:34 -07:00
Owen Anderson
f252e214b7 Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain""
This reverts commit bc9a29b9ee6ade4894252b1470977142c32b4602.

The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
2020-09-01 19:29:03 +00:00
Alina Sbirlea
deb30c80fe [MemorySSA] Update phi map with replacement value. 2020-09-01 11:56:40 -07:00
Hans Wennborg
df821d52a8 First commit on the release/11.x branch. 2020-09-01 11:44:02 -07:00
LLVM GN Syncbot
cbf0672b28 [gn build] Port 3e1e5f54492 2020-09-01 18:30:13 +00:00
LLVM GN Syncbot
8a4a1f29c0 [gn build] Port 3d90a61cf2e 2020-09-01 18:30:13 +00:00
Nico Weber
5e12eaec49 [gn build] port 5ffd940ac02 a bit more 2020-09-01 14:30:01 -04:00
Sean Fertile
42dd72a403 [PowerPC][AIX] Update save/restore offset for frame and base pointers.
General purpose registers 30 and 31 are handled differently when they are
reserved as the base-pointer and frame-pointer respectively. This fixes the
offset of their fixed-stack objects when there are fpr calle-saved registers.

Differential Revision: https://reviews.llvm.org/D85850
2020-09-01 14:13:05 -04:00
Craig Topper
d9375ed652 [Bitstream] Use alignTo to make code more readable. NFC
I was recently debugging a similar issue to https://reviews.llvm.org/D86500 only with a large metadata section. Only after I finished debugging it did I discover it was fixed very recently.

My version of the fix was going to alignTo since that uses uint64_t and improves the readability of the code. So I though I would go ahead and share it.

Differential Revision: https://reviews.llvm.org/D86957
2020-09-01 11:06:45 -07:00
Amara Emerson
b1b6d87965 [AArch64][GlobalISel] Optimize away a Not feeding a brcond by using tbz instead of tbnz.
Usually brconds are fed by compares, but not always, in which case we would
miss this fold.

Differential Revision: https://reviews.llvm.org/D86413
2020-09-01 11:06:06 -07:00
Amara Emerson
399486642d [GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)
This is needed for an upcoming change to how we translate conditional branches
which might generate these.

Differential Revision: https://reviews.llvm.org/D86383
2020-09-01 10:57:17 -07:00
Eric Astor
6aff3ef558 x87 FPU state instructions do not use an f32 memory location
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86942
2020-09-01 13:50:07 -04:00
Matt Arsenault
5e713f621a GlobalISel: Implement computeNumSignBits for G_SELECT 2020-09-01 12:50:19 -04:00
Matt Arsenault
731e73be24 GlobalISel: Port smarter known bits for umin/umax from DAG 2020-09-01 12:50:15 -04:00
Matt Arsenault
942cf2892e GlobalISel: Implement computeKnownBits for G_BSWAP and G_BITREVERSE 2020-09-01 12:49:57 -04:00