Commit Graph

138778 Commits

Author SHA1 Message Date
Geoffrey Martin-Noble
60cb217dc7 Improve error handling for SmallVector programming errors
This patch changes errors in `SmallVector::grow` that are independent of
memory capacity to be reported using report_fatal_error or
std::length_error instead of report_bad_alloc_error, which falsely signals
an OOM.

It also cleans up a few related things:
- makes report_bad_alloc_error print the failure reason passed
  to it.
- fixes the documentation to indicate that report_bad_alloc_error
  calls `abort()`, not "an assertion"
- uses a consistent name for the size/capacity argument to `grow`
  and `grow_pod`
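
A minimal C++ sketch of the distinction described above (not the actual SmallVector implementation; growStorage and its parameters are illustrative): a capacity overflow is a programming error reported via report_fatal_error, while a failed allocation is a genuine OOM reported via report_bad_alloc_error.

```
#include "llvm/Support/ErrorHandling.h"
#include <cstddef>
#include <cstdlib>

static void *growStorage(size_t MinSize, size_t MaxSize, size_t EltSize) {
  // A request beyond the representable capacity is a logic error, not an OOM.
  if (MinSize > MaxSize)
    llvm::report_fatal_error("SmallVector capacity unable to grow");
  // A failed allocation, on the other hand, really is an out-of-memory event.
  void *NewElts = std::malloc(MinSize * EltSize);
  if (!NewElts)
    llvm::report_bad_alloc_error("Allocation of SmallVector element storage failed");
  return NewElts;
}
```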

Reviewed By: mehdi_amini, MaskRay

Differential Revision: https://reviews.llvm.org/D86892
2020-09-02 15:00:26 -07:00
Hongtao Yu
e918d57031 [ThinLTO] Fix a metadata loss issue with DICompileUnit import.
For ThinLTO importing we don't need to import all the fields of the DICompileUnit, such as the enums, macros, and retained types lists. The import of those fields was previously disabled by setting their value map entries to nullptr. Unfortunately, a metadata node can be shared by multiple metadata operands, so setting the map entry to nullptr might result in unexpectedly not importing other metadata. The issue is fixed by explicitly setting the original DICompileUnit fields (still a copy of the source module metadata) to null.
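
A hedged C++ sketch of the approach described above (the helper name is hypothetical, and the real change lives in the ThinLTO import path): the unneeded fields are cleared on the copied DICompileUnit itself rather than through shared value-map entries.

```
#include "llvm/IR/DebugInfoMetadata.h"

// Clear fields that ThinLTO importing does not need on the copied CU.
static void stripUnneededCUFields(llvm::DICompileUnit *CU) {
  CU->replaceEnumTypes(nullptr);     // enums list
  CU->replaceMacros(nullptr);        // macros list
  CU->replaceRetainedTypes(nullptr); // retained types list
}
```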

Reviewed By: wenlei, dblaikie

Differential Revision: https://reviews.llvm.org/D86675
2020-09-02 14:40:41 -07:00
Nemanja Ivanovic
cba7028fdb [PowerPC] Do not legalize vector FDIV without VSX
Quite a while ago, we legalized these nodes as we added custom
handling for reciprocal estimates in the back end. We have since
moved to target-independent combines but neglected to turn off
legalization. As a result, we can now get selection failures on
non-VSX subtargets as evidenced in the listed PR.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=47373
2020-09-02 16:03:36 -05:00
Jay Foad
53df1fb508 [APInt] New member function setBitVal
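
A small usage sketch of the new helper, assuming the signature setBitVal(unsigned BitPosition, bool BitValue); the wrapper function is illustrative.

```
#include "llvm/ADT/APInt.h"

// Set or clear bit Pos depending on Val, replacing the usual
// "if (Val) X.setBit(Pos); else X.clearBit(Pos);" pattern.
llvm::APInt withBit(llvm::APInt X, unsigned Pos, bool Val) {
  X.setBitVal(Pos, Val);
  return X;
}
```
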
Differential Revision: https://reviews.llvm.org/D87033
2020-09-02 21:40:31 +01:00
Albion Fung
d56113a35e [PowerPC] Implemented Vector Multiply Builtins
This patch implements the Vector Multiply builtins (the vmulxxd family of instructions) and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions introduced with ISA 3.1.

Differential Revision: https://reviews.llvm.org/D83955
2020-09-02 14:16:21 -05:00
Arthur Eubanks
1226447680 [Bindings] Move LLVMAddInstructionSimplifyPass to Scalar.cpp
Should not be with the pass, but alongside all the other C bindings.

Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D87041
2020-09-02 10:35:39 -07:00
Dmitry Preobrazhensky
9c1ffe0e18 [AMDGPU][MC] Corrected parser to avoid generation of excessive error messages
Summary of changes:
- Changed parser to eliminate generation of excessive error messages;
- Corrected lit tests to match all expected error messages;
- Corrected lit tests to guard against unwanted extra messages (added option "--implicit-check-not=error:");
- Added missing checks and fixed some typos in tests.

See bug 46907: https://bugs.llvm.org/show_bug.cgi?id=46907

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D86940
2020-09-02 19:42:18 +03:00
Eric Astor
24f72a81f9 [ms] [llvm-ml] Add support for line continuations in MASM
Add support for line continuations (the "backslash operator") in MASM by modifying the Parser's Lex method.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D83347
2020-09-02 12:12:04 -04:00
Simon Pilgrim
0eabaf370e [X86][SSE] Fold vselect(pshufb,pshufb) -> or(pshufb,pshufb)
If the PSHUFBs have no other uses, then we can force the unselected elements to zero to OR them instead, avoiding both an extra mask load and a costly variable blend.

Eventually we should try to bring this into shuffle combining, once we can more easily convert between shuffles + select patterns.
2020-09-02 16:55:00 +01:00
Congzhe Cao
654fb2b9d5 [IPSCCP] Fix a bug that the "returned" attribute is not cleared when function is optimized to return undef
In IPSCCP when a function is optimized to return undef, it should clear the returned attribute for all its input arguments
and its corresponding call sites.

The bug is exposed when the value of an input argument of the function is assigned to a physical register and,
because the argument has the returned attribute, the value of this physical register continues to be used
as the function return value right after the call instruction returns, even though the value that this register holds may
be clobbered during the function call. This potentially results in incorrect values being used afterwards.
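
A hedged sketch of the clean-up described above (the helper name is hypothetical, and the real pass only clears the attribute on the argument that actually carries it):

```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

// Drop the 'returned' parameter attribute on the function and its call sites
// once the return value has been replaced with undef.
static void clearReturnedAttrs(llvm::Function &F) {
  for (unsigned I = 0, E = F.arg_size(); I != E; ++I)
    F.removeParamAttr(I, llvm::Attribute::Returned);
  for (llvm::User *U : F.users())
    if (auto *CB = llvm::dyn_cast<llvm::CallBase>(U))
      for (unsigned I = 0, E = CB->arg_size(); I != E; ++I)
        CB->removeParamAttr(I, llvm::Attribute::Returned);
}
```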

Reviewed By: jdoerfert, fhahn

Differential Revision: https://reviews.llvm.org/D84220
2020-09-02 11:21:48 -04:00
Anna Thomas
042179924b [ImplicitNullChecks] NFC: Refactor dependence safety check
After computing dependence, we check if it is safe to hoist by
identifying if it clobbers any liveIns in the sibling block (NullSucc).
This check is moved to its own function, which will be used in the
soon-to-be-modified dependence checking algorithm for the implicit null
checks pass.

Tests-Run: lit tests on X86/implicit-*
2020-09-02 10:29:44 -04:00
Anna Thomas
d6da4d8aad [ImplicitNullChecks] NFC: Separated out checks and added comments
Separated out some checks in isSuitableMemoryOp and added comments
explaining why some of those checks are done.

Tests-Run: X86 implicit null checks tests.
2020-09-02 10:29:44 -04:00
David Stenberg
44cdc22e09 [GlobalOpt] Fix an incorrect Modified status
When marking a global variable constant, and simplifying users using
CleanupConstantGlobalUsers(), the pass could incorrectly return false if
there were still some uses left and no further optimization was done.

This was caught using the check introduced by D80916.

This fixes PR46749.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D85837
2020-09-02 15:00:45 +02:00
Venkataramanan Kumar
36b1d49191 [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DAGCombiner optimizes X/sqrt(X) to sqrt(X).
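
A scalar C++ analogue of the pattern, purely as an illustration (the actual transform operates on LLVM IR under reassoc fast-math flags):

```
#include <cmath>

// Before: the reciprocal form.  After: the division form, which the backend
// is expected to reduce further to sqrt(x) under fast-math assumptions.
double rsqrtTimesX(double x) { return (1.0 / std::sqrt(x)) * x; }
double divBySqrt(double x)   { return x / std::sqrt(x); }
```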

Differential Revision: https://reviews.llvm.org/D86726
2020-09-02 08:23:48 -04:00
Sanjay Patel
4e9822e551 [VectorCombine] allow vector loads with mismatched insert type
This is an enhancement to D81766 to allow loading the minimum target
vector type into an IR vector with a different number of elements.

In one of the motivating tests from PR16739, SLP creates <2 x float>
load ops mixed with <4 x float> insert ops, so we want to handle that
pattern in addition to potential oversized vectors created by the
vectorizers.

For now, we are assuming the insert/extract subvector with undef is
free because there is no exact corresponding TTI modeling for that.

Differential Revision: https://reviews.llvm.org/D86160
2020-09-02 08:11:36 -04:00
Paul Walker
aeb7dd335b [SVE] Don't reorder subvector/binop sequences when the resulting binop is not legal.
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:

  extract_subvec(binop(insert_subvec(a), insert_subvec(b)))

are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them, the result is an
infinite legalise->combine->legalise loop.

This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used, it's common for
the affected code paths to be visited both pre- and post-legalisation,
so the extra parameter keeps the code tidy.
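
A hedged usage sketch of the new parameter (assuming the signature isOperationLegalOrCustom(Op, VT, bool LegalOnly)); the wrapper function is illustrative:

```
#include "llvm/CodeGen/TargetLowering.h"

// Restrict the query to operations that are Legal, excluding Custom-lowered
// ones whose lowering might re-introduce the subvector/binop sequence.
static bool canUndoSequence(const llvm::TargetLowering &TLI, unsigned Opc,
                            llvm::EVT VT) {
  return TLI.isOperationLegalOrCustom(Opc, VT, /*LegalOnly=*/true);
}
```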

Differential Revision: https://reviews.llvm.org/D86450
2020-09-02 11:01:33 +01:00
Jay Foad
02f664086c [AMDGPU] Fix offset for REL32_HI relocs
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.

Differential Revision: https://reviews.llvm.org/D86938
2020-09-02 10:55:55 +01:00
Sander de Smalen
40471e4fe8 [AArch64][SVE] Preserve full vector regs over EH edge.
Unwinders may only preserve the lower 64 bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.

For example:

    svint32_t bar(svint32_t);
    svint32_t foo(svint32_t x, bool *err) {
      try { bar(x); } catch (...) { *err = true; }
      return x;
    }

`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84737
2020-09-02 10:54:18 +01:00
Igor Kudrin
7e0c7ec095 [DebugInfo] Emit a 1-byte value as a terminator of entries list in the name index.
As stated in section 6.1.1.2, DWARFv5, p. 142,
| The last entry for each name is followed by a zero byte that
| terminates the list. There may be gaps between the lists.

The patch changes emitting a 4-byte zero value to a 1-byte one, which
effectively removes the gap between entry lists, and thus saves
approximately 3 bytes per name; the calculation is not exact because
the total size of the table is aligned to 4.
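
A hedged sketch of the emission change (AsmPrinter::emitInt8/emitInt32 are the relevant helpers; the function itself is illustrative):

```
#include "llvm/CodeGen/AsmPrinter.h"

// Terminate an entry list with a single zero byte, per DWARFv5 6.1.1.2;
// previously a 4-byte zero (emitInt32(0)) left a gap between lists.
static void emitEntryListTerminator(llvm::AsmPrinter &Asm) {
  Asm.emitInt8(0);
}
```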

Differential Revision: https://reviews.llvm.org/D86927
2020-09-02 16:12:39 +07:00
Igor Kudrin
ab5e60fa50 [DebugInfo] Remove Dwarf5AccelTableWriter::Header::UnitLength. NFC.
The member is not in use; the unit length for the table is emitted as
a difference between two labels. Moreover, the type of the member might
be misleading, because for DWARF64 the field should be 64 bit long.

Differential Revision: https://reviews.llvm.org/D86912
2020-09-02 16:11:45 +07:00
Martin Storsjö
a68f07462a [X86] Remove superfluous trailing semicolons, fixing warnings. NFC. 2020-09-02 11:43:27 +03:00
Simon Pilgrim
da3091b1cb [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support
This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.

We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.

Differential Revision: https://reviews.llvm.org/D66004
2020-09-02 09:24:46 +01:00
Shinji Okumura
ad9adda18c [Attributor] Make use of AANoUndef in AAUndefinedBehavior
This patch makes it possible for AAUB to use information from AANoUndef.
This is a follow-up to D86983.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86984
2020-09-02 16:08:03 +09:00
Shinji Okumura
e194827980 [Attributor] Fix AANoUndef initialization
Previously, when the associated value was undef, we immediately indicated a pessimistic fixpoint.
This patch changes the initialization to first check the attribute given in the IR and to indicate an optimistic fixpoint when it is present.
This change will enable us to catch, for example, the following case in AAUB.
```
call void @foo(i32 noundef undef)
```

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86983
2020-09-02 15:40:43 +09:00
Zi Xuan Wu
ed26c15759 [RFC][Target] Add a new triple called Triple::csky
Before upstreaming a new target called CSKY, add a new triple for it: Triple::csky.
For now, it is a 32-bit little-endian target; the details can be found in D86269.

This is the split-out part of D86269, which adds the new CSKY target.

Differential Revision: https://reviews.llvm.org/D86505
2020-09-02 12:46:09 +08:00
Lang Hames
65c420c8ff [ORC] Remove stray debugging output. 2020-09-01 20:53:49 -07:00
Lang Hames
4a706c5ba3 [ORC] Add an early out for MachOPlatform's init-scraper plugin setup.
If there's no initializer symbol in the current MaterializationResponsibility
then bail out without installing JITLink passes: they're going to be no-ops
anyway.
2020-09-01 20:12:23 -07:00
Lang Hames
cbe0e339f3 [ORC] Fix MachOPlatform's synthetic symbol dependence registration.
A think-o in the existing code meant that dependencies were never registered.
This failure could lead to crashes rather than orderly error propagation if
initialization dependencies failed to materialize.

No test case: the bug was discovered in out-of-tree code and requires a
pathologically misconfigured JIT to generate the original error that led to
the crash.
2020-09-01 20:12:23 -07:00
Xing GUO
db9f7d8731 [DebugInfo] Simplify string table dumpers.
This patch adds a helper function, DumpStrSection, to simplify the code.
In addition, non-printable characters in the debug_str and debug_str.dwo
sections are printed as escaped characters.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86918
2020-09-02 08:41:10 +08:00
Alina Sbirlea
d314cb9743 [MemCpyOptimizer] Preserve analyses and replace use of lambdas to get them.
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.

Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74494
2020-09-01 17:35:40 -07:00
Jordan Rupprecht
c2af995048 [NFC] Fix unused var in release builds.
This was always unused, but the change in D86354 upgraded this to a compiler warning.
2020-09-01 16:38:24 -07:00
Amy Kwan
2cc81d44bf [PowerPC] Implement builtins for xvcvspbf16 and xvcvbf16spn
This patch adds the builtin implementation for the xvcvspbf16 and xvcvbf16spn
instructions.

Differential Revision: https://reviews.llvm.org/D86795
2020-09-01 17:16:43 -05:00
Cameron McInally
7b182e9bab [SVE] Update INSERT_SUBVECTOR DAGCombine to use getVectorElementCount().
A small piece of the project to replace getVectorNumElements() with getVectorElementCount().

Differential Revision: https://reviews.llvm.org/D86894
2020-09-01 16:51:44 -05:00
Amara Emerson
e10dc0379d Revert "Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")"
This reverts commit 8693ddc74371dedc742c9f3d3e4eda1da72c13ea.

Re-committing with the test requiring asserts.
2020-09-01 14:29:04 -07:00
Jordan Rupprecht
deb09864b8 Revert "[GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)" (and dependent patch "Optimize away a Not feeding a brcond by using tbz instead of tbnz.")
This reverts commit 8ad8f484b63ca507417b58c9016d2761f2b1a1a8. It causes crashes when running `ninja check-llvm-codegen-aarch64-globalisel`, e.g.
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24132/steps/test-stage1-compiler/logs/stdio.
Note that the crash does not seem to reproduce in debug builds.

5ded4442520d3dbb1aa72e6fe03cddef8828c618 depends on this, so revert that too.
2020-09-01 13:31:57 -07:00
Michael Liao
192b060a21 [amdgpu] Run SROA after loop unrolling.
Summary: There are promotable `alloca`s after loop unrolling.

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, nikic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D84252
2020-09-01 16:09:56 -04:00
Jordan Rupprecht
e72bcb7267 [NFC] Fix unused var in release build 2020-09-01 13:05:56 -07:00
Florian Hahn
2248875209 [Loads] Add canReplacePointersIfEqual helper.
This patch adds an initial, incomplete and unsound implementation of
canReplacePointersIfEqual to check if a pointer value A can be replaced
by another pointer value B, where the two are deemed to be equivalent through
some means (e.g. information from conditions).

Note that it is in general not sound to blindly replace pointers based on
equality, for example if they are based on different underlying objects.

LLVM's memory model is not completely settled as of now; see
https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed
discussion.

The initial version of canReplacePointersIfEqual only rejects a very
specific case: replacing a pointer with a constant expression that is
not dereferenceable. Such a replacement is problematic and can be
restricted relatively easily without impacting most code. Using it to
limit replacements in GVN/SCCP/CVP only results in small differences in
7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto.

This patch is supposed to be an initial step to improve the current
situation and the helper should be made stricter in the future. But this
will require careful analysis of the impact on performance.

Reviewed By: aqjune

Differential Revision: https://reviews.llvm.org/D85524
2020-09-01 20:57:41 +01:00
Aaron Liu
a410101daf [LV] Interleave to expose ILP for small loops with scalar reductions.
Interleave small loops that have reductions inside,
which breaks dependencies and exposes ILP.

This gives very significant performance improvements for some benchmarks,
because small loops can be in very hot functions in real applications.

Differential Revision: https://reviews.llvm.org/D81416
2020-09-01 19:47:32 +00:00
Craig Topper
d3e914ec24 [MachineCopyPropagation] In isNopCopy, check the destination registers match in addition to the source registers.
Previously, if the source registers matched, we asserted that the destination
registers matched. But GPR <-> mask register copies on X86 can violate this
since we use the same K-registers for multiple sizes.

Fixes this ISPC issue https://github.com/ispc/ispc/issues/1851

Differential Revision: https://reviews.llvm.org/D86507
2020-09-01 12:44:32 -07:00
Arthur Eubanks
81eaf47f84 [Bindings] Add LLVMAddInstructionSimplifyPass
Reviewed By: sroland

Differential Revision: https://reviews.llvm.org/D86764
2020-09-01 12:38:49 -07:00
Owen Anderson
f252e214b7 Revert "Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain""
This reverts commit bc9a29b9ee6ade4894252b1470977142c32b4602.

The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
2020-09-01 19:29:03 +00:00
Alina Sbirlea
deb30c80fe [MemorySSA] Update phi map with replacement value. 2020-09-01 11:56:40 -07:00
Sean Fertile
42dd72a403 [PowerPC][AIX] Update save/restore offset for frame and base pointers.
General purpose registers 30 and 31 are handled differently when they are
reserved as the base pointer and frame pointer respectively. This fixes the
offset of their fixed-stack objects when there are FPR callee-saved registers.

Differential Revision: https://reviews.llvm.org/D85850
2020-09-01 14:13:05 -04:00
Craig Topper
d9375ed652 [Bitstream] Use alignTo to make code more readable. NFC
I was recently debugging a similar issue to https://reviews.llvm.org/D86500, only with a large metadata section. Only after I finished debugging it did I discover it was fixed very recently.

My version of the fix was going to use alignTo, since that uses uint64_t and improves the readability of the code, so I thought I would go ahead and share it.
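
A small sketch of the helper in question (llvm::alignTo from Support/MathExtras.h); the wrapper function is illustrative:

```
#include "llvm/Support/MathExtras.h"
#include <cstdint>

// Round a byte count up to the next multiple of 4, computed in uint64_t so
// large sections do not overflow a 32-bit intermediate.
static uint64_t alignedSize(uint64_t NumBytes) {
  return llvm::alignTo(NumBytes, 4);
}
```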

Differential Revision: https://reviews.llvm.org/D86957
2020-09-01 11:06:45 -07:00
Amara Emerson
b1b6d87965 [AArch64][GlobalISel] Optimize away a Not feeding a brcond by using tbz instead of tbnz.
Usually brconds are fed by compares, but not always, in which case we would
miss this fold.

Differential Revision: https://reviews.llvm.org/D86413
2020-09-01 11:06:06 -07:00
Amara Emerson
399486642d [GlobalISel] Fold xor(cmp(pred, _, _), 1) -> cmp(inverse(pred), _, _)
This is needed for an upcoming change to how we translate conditional branches
which might generate these.

Differential Revision: https://reviews.llvm.org/D86383
2020-09-01 10:57:17 -07:00
Eric Astor
6aff3ef558 x87 FPU state instructions do not use an f32 memory location
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86942
2020-09-01 13:50:07 -04:00
Matt Arsenault
5e713f621a GlobalISel: Implement computeNumSignBits for G_SELECT 2020-09-01 12:50:19 -04:00
Matt Arsenault
731e73be24 GlobalISel: Port smarter known bits for umin/umax from DAG 2020-09-01 12:50:15 -04:00
Matt Arsenault
942cf2892e GlobalISel: Implement computeKnownBits for G_BSWAP and G_BITREVERSE 2020-09-01 12:49:57 -04:00
Qiu Chaofan
594eb7ba1d [PowerPC] Handle STRICT_FSETCC(S) in more cases
At -O0, i1 strict_fsetcc will be promoted to i32. We don't handle that
in TD patterns. This patch adds logic to PPCISelDAGToDAG to handle more
cases.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D86595
2020-09-02 00:33:21 +08:00
Sam Tebbs
745bb7deaa [DAGCombiner] Fold an AND of a masked load into a zext_masked_load
This patch folds an AND of a masked load and build vector into a zero
extended masked load.

Differential Revision: https://reviews.llvm.org/D86789
2020-09-01 17:02:07 +01:00
Amy Kwan
5ef8594309 [PowerPC] Implement instruction definitions/MC Tests for xvcvspbf16 and xvcvbf16spn
This patch adds the td instruction definitions of the xvcvspbf16 and xvcvbf16spn
instructions, along with their respective MC tests.

Differential Revision: https://reviews.llvm.org/D86794
2020-09-01 10:59:43 -05:00
Volkan Keles
72df60b8c0 GlobalISel: Add combines for extend operations
https://reviews.llvm.org/D86516
2020-09-01 08:50:06 -07:00
Matt Arsenault
b0476e1c15 GlobalISel: Implement computeNumSignBits for G_SEXTLOAD/G_ZEXTLOAD 2020-09-01 11:20:02 -04:00
Matt Arsenault
fbbea16c6b GlobalISel: Implement computeKnownBits for G_UNMERGE_VALUES 2020-09-01 11:19:27 -04:00
Paul Walker
b7e7bd3c3d Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"
This reverts commit e9d9a612084b47fc4277523561d61e675370c854.

This patch was previously reverted by 04879086b44348cad600a0a1ccbe1f7776cc3cf9
with the reapplication being done after breaking the assert used to
ensure SP is always 16-byte aligned, which is a requirement of the AAPCS.

For extra context the latest patch caused runtime failures when
building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".
2020-09-01 16:09:37 +01:00
Matt Arsenault
2c2bfb8ff0 GlobalISel: Artifact combine unmerge of unmerge
Unmerges have the same fundamental problem as G_TRUNC, and G_TRUNC
could be implemented in terms of G_UNMERGE_VALUES. Reducing the number
of elements in unmerge results ends up producing the original unmerge
type profile, so the artifact combiner needs to eliminate the
intermediate illegal registers. This avoids infinite looping in the
legalizer in a future change.

Assuming an unmerge has each result unmerged the same way, this ends
up producing a new unmerge of the source for every definition. I'm not
sure if the artifact combiner should either insert temporary merges
here and erase the original merge, or if the combiner should look at
uses from defs rather than defs from uses for unmerges.

In a few cases this regresses from using 16-bit shifts for 8-bit
values to using 32-bit shifts, but I think these can be legalized
later (the other legalization rules don't try very hard to use 16-bit
shifts either).
2020-09-01 11:01:33 -04:00
Anh Tuyen Tran
c3a3d1596a [LoopIdiomRecognizePass] Options to disable part or the entire Loop Idiom Recognize Pass
The Loop Idiom Recognize Pass (LIRP) attempts to transform loops with subscripted arrays
into memcpy/memset function calls. In some situations, this transformation
has a negative impact. For example: https://bugs.llvm.org/show_bug.cgi?id=47300

This patch enables users to disable a particular part of the transformation while
still enjoying the benefit brought about by the rest of LIRP. The default
behavior stays unchanged: no part of LIRP is disabled by default.

Reviewed By: etiotto (Ettore Tiotto)

Differential Revision: https://reviews.llvm.org/D86262
2020-09-01 13:59:24 +00:00
Raphael Isemann
c3c7a59967 Reland [FileCheck] Move FileCheck implementation out of LLVMSupport into its own library
This relands e9a3d1a401b07cbf7b11695637f1b549782a26cd, which was originally
missing the linking of LLVMSupport into LLVMFileCheck, which broke the SHARED_LIBS build.

Original summary:

The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a
good reason for having FileCheck implemented there as it has a very specific use
while LLVMSupport is a dependency of pretty much every LLVM tool there is. In
fact, the only use of FileCheck I could find (outside the FileCheck tool and the
FileCheck unit test) is a single call in GISelMITest.h.

This moves the FileCheck logic to its own LLVMFileCheck library. This way only
FileCheck and the GlobalISelTests now have a dependency on this code.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86344
2020-09-01 14:59:28 +02:00
Sourabh Singh Tomar
0940e21689 [NFCI] Removed an unused declaration that was accidentally introduced in f91d18eaa946b2d2ea5a9 2020-09-01 15:59:04 +05:30
Raphael Isemann
02ebf2f3b4 Revert "[lldb] Add reproducer verifier"
This reverts commit 297f69afac58fc9dc13897857a5e70131c5adc85. It broke
the Fedora 33 x86-64 bot. See the review for more info.
2020-09-01 12:21:44 +02:00
David Sherwood
c4d572ac0e [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and from calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial, except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD, where I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.

I have added a CHECK line to the following test:

  CodeGen/AArch64/sve-split-load.ll

that ensures no new warnings are added.

Differential Revision: https://reviews.llvm.org/D86697
2020-09-01 07:47:59 +01:00
David Green
c8b43e910b Revert "[ARM] Register pressure with -mthumb forces register reload before each call"
Expensive checks are failing, complaining about additional MMO operands
added to the branch.
2020-09-01 07:39:54 +01:00
Petr Hosek
11e2ca0270 [CMake] Use find_library for ncurses
Currently it is hard to avoid having LLVM link to the system install of
ncurses, since it uses check_library_exists to find e.g. libtinfo and
not find_library or find_package.

With this change the ncurses lib is found with find_library, which also
considers CMAKE_PREFIX_PATH. This solves an issue for the spack package
manager, where we want to use the zlib installed by spack, and spack
provides the CMAKE_PREFIX_PATH for it.

This is a similar change as https://reviews.llvm.org/D79219, which just
landed in master.

Patch By: haampie

Differential Revision: https://reviews.llvm.org/D85820
2020-08-31 20:06:21 -07:00
Alina Sbirlea
c9a8d991e6 [MemorySSA] Clean up single value phis.
MemoryPhis with a single value are correct, but can lead to errors when
updating. Clean up single entry Phis newly added when cloning blocks.
Resolves PR46574.
2020-08-31 19:26:08 -07:00
Xing GUO
45321f3e73 [DWARFYAML] Make the debug_str section optional.
This patch makes the debug_str section optional. When the debug_str
section exists but doesn't contain anything, yaml2obj will emit a
section header for it.

Reviewed By: grimar

Differential Revision: https://reviews.llvm.org/D86860
2020-09-01 10:02:09 +08:00
Hamilton Tobon Mosquera
aef078ab19 [OpenMPOpt][NFC] Moving constants as struct static attributes 2020-08-31 19:05:00 -05:00
Lang Hames
8c3159bb6b [ORC] Remove an unused variable.
The unused Main variable was accidentally left in an earlier commit.
2020-08-31 15:35:55 -07:00
Jonas Devlieghere
7448020ce4 [lldb] Add reproducer verifier
Add a reproducer verifier that catches:

 - Missing or invalid home directory
 - Missing or invalid working directory
 - Missing or invalid module/symbol paths
 - Missing files from the VFS

The verifier is enabled by default during replay, but can be skipped by
passing --reproducer-no-verify.

Differential revision: https://reviews.llvm.org/D86497
2020-08-31 15:14:18 -07:00
Hamilton Tobon Mosquera
b5757ca2c4 [OpenMPOpt][HideMemTransfersLatency] Get values stored in offload arrays
getValuesInOffloadArrays goes through the offload arrays in a __tgt_target_data_begin_mapper call, getting the values stored in them before the call is issued.

call void @__tgt_target_data_begin_mapper(arg0, arg1,
    i8** %offload_baseptrs, i8** %offload_ptrs, i64* %offload_sizes,
...)

Differential Revision: https://reviews.llvm.org/D86300
2020-08-31 15:33:05 -05:00
Sanjay Patel
251968146e [IR][GVN] allow intrinsics in Instruction's isCommutative query (2nd try)
The 1st try was reverted because I missed an assert that
needed softening.

As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing asserts in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-31 16:01:19 -04:00
Christopher Tetreault
6b50d3055a [SVE] Remove calls to VectorType::getNumElements from InstCombine
Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D82237
2020-08-31 12:59:10 -07:00
Roman Lebedev
4aba95741f [NFC][InstCombine] visitPHINode(): cleanup PHI CSE instruction replacement
As @nikic is pointing out in https://reviews.llvm.org/rGbf21ce7b908e#inline-4647
this must be sufficient otherwise `EliminateDuplicatePHINodes()`
would have hit issues with it already.
2020-08-31 22:29:39 +03:00
Prathamesh Kulkarni
e87ae397a6 [ARM] Register pressure with -mthumb forces register reload before each call
This patch implements the foldMemoryOperand hook in Thumb1InstrInfo,
allowing tBLXr and a spilled function address to be combined back into a
tBL. This can help with code size at Oz, especially in the tinycrypt
library.

Differential Revision: https://reviews.llvm.org/D79785
2020-08-31 20:00:30 +01:00
Qiu Chaofan
55050110d2 [NFC] [DAGCombiner] Refactor bitcast folding within fabs/fneg
fabs and fneg share a common transformation:

(fneg (bitconvert x)) -> (bitconvert (xor x sign))
(fabs (bitconvert x)) -> (bitconvert (and x ~sign))

This patch separates that code into a single method.
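
A scalar C++ analogue of the two folds, for illustration only (the actual code operates on SDNodes):

```
#include <cstdint>
#include <cstring>

// (fneg (bitconvert x)) -> (bitconvert (xor x sign))
static double fnegViaXor(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  Bits ^= (1ULL << 63);
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}

// (fabs (bitconvert x)) -> (bitconvert (and x ~sign))
static double fabsViaAnd(double X) {
  uint64_t Bits;
  std::memcpy(&Bits, &X, sizeof(Bits));
  Bits &= ~(1ULL << 63);
  std::memcpy(&X, &Bits, sizeof(X));
  return X;
}
```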

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86862
2020-09-01 00:48:12 +08:00
Qiu Chaofan
9b5b508e71 [NFC] [DAGCombiner] Remove unnecessary negation in visitFNEG
In visitFNEG of DAGCombiner, the folding of (fneg (fsub c, x)) is
redundant since getNegatedExpression already handles it.
2020-09-01 00:35:01 +08:00
Sanjay Patel
4302187593 [DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x), better
I tried to fix this in:
rG716e35a0cf53
...but that patch depends on the order that we encounter the
magic "x/sqrt(x)" expression in the combiner's worklist.

This patch should improve that by waiting until we walk the
user list to decide if there's a use to skip.

The AArch64 test reveals another (existing) ordering problem
though - we may try to create an estimate for plain sqrt(x)
before we see that it is part of a 1/sqrt(x) expression.
2020-08-31 09:35:59 -04:00
Sourabh Singh Tomar
b75e186b38 [NFCI] Silent a build warning due to an extra semi-colon 2020-08-31 17:49:31 +05:30
Raphael Isemann
20366c891d Revert "[FileCheck] Move FileCheck implementation out of LLVMSupport into its own library"
This reverts commit e9a3d1a401b07cbf7b11695637f1b549782a26cd. Seems the new
FileCheck library doesn't link on some bots. Reverting for now.
2020-08-31 11:38:40 +02:00
Raphael Isemann
aecb52031b [FileCheck] Move FileCheck implementation out of LLVMSupport into its own library
The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a
good reason for having FileCheck implemented there as it has a very specific use
while LLVMSupport is a dependency of pretty much every LLVM tool there is. In
fact, the only use of FileCheck I could find (outside the FileCheck tool and the
FileCheck unit test) is a single call in GISelMITest.h.

This moves the FileCheck logic to its own LLVMFileCheck library. This way only
FileCheck and the GlobalISelTests now have a dependency on this code.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86344
2020-08-31 11:24:41 +02:00
Fangrui Song
c727302d39 [Sink] Optimize/simplify sink candidate finding with nearest common dominator
For an instruction in the basic block BB, SinkingPass enumerates basic blocks
dominated by BB and BB's successors. For each enumerated basic block,
SinkingPass uses `AllUsesDominatedByBlock` to check whether the basic
block dominates all of the instruction's users. This is inefficient.

Use the nearest common dominator of all users to avoid enumerating the
candidate. The nearest common dominator may be in a parent loop which is
not beneficial. In that case, find the ancestors in the dominator tree.

In the case that the instruction has no users, with this change we will
not perform an unnecessary move. This causes some amdgpu test changes.

A stage-2 x86-64 clang is byte-identical with this change.
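
A hedged sketch of the core idea (PHI users, which the real pass maps to their incoming blocks, are ignored here; the helper is illustrative):

```
#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instruction.h"

// Compute the nearest common dominator of all user blocks; the pass then
// sinks towards it (subject to the loop-depth adjustment described above).
static llvm::BasicBlock *commonUserDominator(llvm::Instruction &I,
                                             llvm::DominatorTree &DT) {
  llvm::BasicBlock *NCD = nullptr;
  for (llvm::User *U : I.users()) {
    llvm::BasicBlock *UserBB = llvm::cast<llvm::Instruction>(U)->getParent();
    NCD = NCD ? DT.findNearestCommonDominator(NCD, UserBB) : UserBB;
  }
  return NCD; // null if I has no users
}
```
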
2020-08-30 22:51:00 -07:00
Sanjay Patel
56bc7f03f4 Revert "[IR][GVN] allow intrinsics in Instruction's isCommutative query"
This reverts commit 25597f7783e7038b8a2ee88bb49ac605b211b564.
It is causing crashing on bots such as:
http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10523/steps/ninja-build/logs/stdio
2020-08-30 17:02:51 -04:00
Florian Hahn
6de6df75cc [DSE,MemorySSA] Skip defs without analyzable write locations.
Similar to other checks above, if there is no write location for a def,
it cannot be considered for elimination and can be skipped.
2020-08-30 21:56:25 +01:00
Sanjay Patel
d58c2f282d [IR][GVN] allow intrinsics in Instruction's isCommutative query
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.

This requires relaxing an assert in GVN, but that pass
seems to be working otherwise.

NewGVN requires more work because it uses different
code paths for numbering binops and calls.
2020-08-30 16:49:22 -04:00
Florian Hahn
49edbbbd7c [DSE,MemorySSA] Simplify code, EarlierAccess is always a MemoryDef (NFC).
After recent changes, we return early if Current is a MemoryPhi, so
EarlierAccess can only be a MemoryDef.
2020-08-30 21:31:57 +01:00
Thomas Preud'homme
14ecb9c6d9 [FileCheck] Add precision to format specifier
Add printf-style precision specifier to pad numbers to a given number of
digits when matching them if the value is smaller than the given
precision. This works on both empty numeric expression (e.g. variable
definition from input) and when matching a numeric expression. The
syntax is as follows:

[[#%.<precision><format specifier>, ...]]

where <format specifier> is optional and ... can be a variable
definition or not, with an empty expression or not. In the absence of a
precision specifier, a variable definition will accept leading zeros.

Reviewed By: jhenderson, grimar

Differential Revision: https://reviews.llvm.org/D81667
2020-08-30 19:40:57 +01:00
Florian Hahn
11bbe69ccc [LV] Update CFG before adding runtime checks.
addRuntimeChecks uses SCEVExpander, which relies on the DT/LoopInfo to
be up-to-date. Changing the CFG afterwards may invalidate some inserted
instructions, especially LCSSA phis.

Reorder the code to first update the CFG and then create the runtime
checks. This should not have any impact on the generated code, as we
adjust the CFG and generate runtime checks together.

Fixes PR47343.
2020-08-30 18:21:44 +01:00
Sanjay Patel
d6a2f460c7 [FastISel] update to use intrinsic's isCommutative(); NFC
This requires adding a missing 'const' to the definition because
the callers are using const args, but there should be no change
in behavior.

The intrinsic method was added with D86798 / rG096527214033
2020-08-30 11:36:41 -04:00
Sanjay Patel
5b3abe923f [DAGCombiner] skip reciprocal divisor optimization for x/sqrt(x)
In general, we probably want to try the multi-use reciprocal
transform before sqrt transforms, but x/sqrt(x) is a special-case
because that will always reduce to plain sqrt(x) or an estimate.

The AArch64 tests show that the transform is limited by TLI
hook to patterns where there are 3 or more uses of the divisor.
So this change can result in an extra division compared to
what we had, but that's the intended behavior based on the
current setting of that hook.
2020-08-30 10:55:45 -04:00
Sanjay Patel
8a96dfec70 [SLP] make commutative check apply only to binops; NFC
As discussed in D86798, it's not clear if the caller code
works with a more liberal definition of "commutative" that
includes intrinsics like min/max. This makes the binop
restriction (current functionality is unchanged) explicit
until the code is audited/tested.
2020-08-30 10:55:44 -04:00
Krzysztof Parzyszek
1eb8234c85 [Hexagon] Fix perfect shuffle generation for single vectors
Perfect shuffle instructions (vdealvdd/vshuffvdd) work on vector
pairs. When given a single input vector, half of it first needs
to be transposed into the other vector before the generated
shuffles can take effect. Also, the first transpose needs to be
undone at the end (this last step was missing).
2020-08-30 06:43:16 -05:00
David Green
65d99fc840 [LV] Add some const to RecurrenceDescriptor. NFC 2020-08-30 12:27:51 +01:00
sstefan1
3610c7de35 Reland [OpenMPOpt] ICV tracking for calls
The problem with module slice has been addressed in D86319

Introduce two new AAs: AAICVTrackerFunctionReturned, which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned, which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and whether it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.

Differential Revision: https://reviews.llvm.org/D85544
2020-08-30 11:27:48 +02:00
sstefan1
34f7bfd3c0 [Attributor] Introduce module slice.
Summary:
The module slice describes which functions we can analyze and transform
while working on an SCC as part of the Attributor-CGSCC pass. So far we
simply restricted it to the SCC.

Reviewers: jdoerfert

Differential Revision: https://reviews.llvm.org/D86319
2020-08-30 10:30:44 +02:00
Shinji Okumura
604021bf31 [Attributor] Fix callsite check in AAUndefinedBehavior
This is the next patch of D86842
When we check `noundef` attribute violation at callsites, we do not have to require `nonnull` in the following two cases.
1. An argument is known to be simplified to undef
2. An argument is known to be dead

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86845
2020-08-30 13:17:02 +09:00
Shinji Okumura
1ef93fe382 [Attributor][NFC] Fix dependency type in AAUndefinedBehaviorImpl::updateImpl
This patch fixes wrong dependency type in AAUB.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86842
2020-08-30 12:34:50 +09:00
Fangrui Song
41375434d7 Set alignment of .llvmbc and .llvmcmd to 1
Otherwise their alignment is dependent on the size of the section. If the size
is larger than 16, the alignment will be 16.

16 is a bad choice for both .llvmbc and .llvmcmd because the padding between two
contributions from input sections is of a variable size.

A bitstream is actually guaranteed to be 4-byte aligned, but consumers don't
need this property.
2020-08-29 18:27:34 -07:00
Lang Hames
aa6cb2b9d1 [ORC] Add getDFSLinkOrder / getReverseDFSLinkOrder methods to JITDylib.
DFS and Reverse-DFS linkage orders are used to order execution of
deinitializers and initializers respectively.

This patch replaces uses of special purpose DFS order functions in
MachOPlatform and LLJIT with uses of the new methods.
2020-08-29 15:17:06 -07:00
Shinji Okumura
8185014ab8 [Attributor] Fix AANoUndef identification
Even though the `noundef` IR attribute might be attached to values of any non-void type, AANoUndef was mistakenly identified for pointer-type values only.
This patch fixes that.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86737
2020-08-30 05:39:25 +09:00
Nikita Popov
b469dc9b48 [InstSimplify] Reduce code duplication in simplifySelectWithICmpCond (NFC)
Canonicalize icmp ne to icmp eq and implement all the folds only once.
2020-08-29 22:38:49 +02:00
Nikita Popov
556e0d5173 [InstSimplify] Protect against more poison in SimplifyWithOpReplaced (PR47322)
Replace the check for poison-producing instructions in
SimplifyWithOpReplaced() with the generic helper canCreatePoison()
that properly handles poisonous shifts and thus avoids the problem
from PR47322.
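
A hedged sketch of the check (llvm::canCreatePoison is the generic helper named above; the wrapper function is illustrative):

```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Operator.h"

// Only substitute through instructions that cannot introduce poison; e.g.
// this rejects nsw/nuw arithmetic and shifts with out-of-range amounts.
static bool safeToSubstitute(const llvm::Instruction *I) {
  return !llvm::canCreatePoison(llvm::cast<llvm::Operator>(I));
}
```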

This additionally fixes a bug in IIQ.UseInstrInfo=false mode, which
previously could have caused this code to ignore poison flags.
Setting UseInstrInfo=false should reduce the possible optimizations,
not increase them.

This is not a full solution to the problem, as poison could be
introduced more indirectly. This is just a minimal, easy to backport
fix.

Differential Revision: https://reviews.llvm.org/D86834
2020-08-29 21:59:39 +02:00
Florian Hahn
cf6001d803 [LV] Check opt-for-size before expanding runtime checks.
Move the bail-out for optimizing for size to before runtime check generation.
In that case, we do not use the result of the expansion, and the expanded
instructions will be dead and cleaned up later.

By doing the check before expanding the runtime-checks, we can save a
bit of unnecessary work.
2020-08-29 20:35:14 +01:00
Nikita Popov
d37c50d614 [LVI] Remove unnecessary lambda capture (NFC) 2020-08-29 21:33:19 +02:00
Nikita Popov
4e796fd7b6 Reapply [LVI] Normalize pointer behavior
This got reverted because a dependency was reverted. It has since
been reapplied, so reapply this as well.

-----

Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.

This patch changes the treatment of dereferenceability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this more powerful than the previous implementation as
a side-effect, and this does actually seem beneficial in practice.

Of course, we do not want to recompute dereferenceability on each
intersectAssume call, so we need a new cache for this. The
dereferenceability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.

Differential Revision: https://reviews.llvm.org/D69914
2020-08-29 21:17:03 +02:00
Roman Lebedev
1870ac8f33 [NFC][Local] EliminateDuplicatePHINodes(): add STATISTIC() 2020-08-29 22:03:18 +03:00
Roman Lebedev
aebca3b6a8 [NFCI][Local] Rewrite EliminateDuplicatePHINodes to optionally check hashing invariants
EarlyCSE has a mode to verify the invariant that hash equality equals
key equality, but EliminateDuplicatePHINodes() doesn't.

I've verified that this would have caught the stage2-stage3 mismatches
that the 5ec2b757cc7d37ff0d03b36ee863b0962fe78108 revert has fixed,
which were introduced last time in 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9.
2020-08-29 22:03:10 +03:00
Shinji Okumura
408b703e38 [Attributor][NFC] Do not manifest noundef for positions to be changed to undef
This patch fixes AANoUndef manifestation.
We should not manifest noundef for positions that will be changed to undef.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86835
2020-08-30 03:23:41 +09:00
Florian Hahn
52d030620b [DSE,MemorySSA] Return early when hitting a MemoryPhi.
A MemoryPhi can never be eliminated. If we hit one, return the Phi, so
the caller can continue traversing the incoming accesses.

This saves some unnecessary read clobber checks and improves
compile-time
http://llvm-compile-time-tracker.com/compare.php?from=1ffc58b6d098ce8fa71f3a80fe75b990f633f921&to=d0fa8d1982380b57d7b6067528104bc373dbe07a&stat=instructions
2020-08-29 18:28:26 +01:00
Benjamin Kramer
82759b9044 [IR] Inline AttrBuilder::addAttribute. It just sets 1 bit. NFC. 2020-08-29 19:13:49 +02:00
Roman Lebedev
bf4def296e [Instruction] Speculatively undo isIdenticalToWhenDefined() PHI handling changes
The stage2-stage3 differences persist even without instcombine-based
PHI CSE, so this is the only possible reason.
2020-08-29 19:38:57 +03:00
Sanjay Patel
c28488b471 [EarlyCSE] fold commutable intrinsics
Handling the new min/max intrinsics is the motivation, but it
turns out that we have a bunch of other intrinsics with this
missing bit of analysis too.
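
A generic C++ sketch of the underlying idea (not the EarlyCSE code itself): a commutative call hashes and compares the same regardless of operand order if the operands are first put into a canonical order.

```
#include <algorithm>
#include <utility>

// Return the operand pair in a deterministic order so that (op a, b) and
// (op b, a) produce the same key for hashing and equality checks.
template <typename T> static std::pair<T, T> canonicalOperands(T A, T B) {
  if (B < A)
    std::swap(A, B);
  return {A, B};
}
```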

The FP min/max tests show that we are intersecting FMF,
so that part should be safe too.

As noted in https://llvm.org/PR46897 , there is a commutative
property specifier for intrinsics, but no corresponding function
attribute, and so apparently no uses of that bit. We may want to
remove that next.

Follow-up patches should wire up the Instruction::isCommutative()
to this IntrinsicInst specialization. That requires updating
callers to be aware of the more general commutative property
(not just binops).

Differential Revision: https://reviews.llvm.org/D86798
2020-08-29 12:11:01 -04:00
Nikita Popov
417119145c [TargetLowering] Strip tailing whitespace (NFC) 2020-08-29 18:09:08 +02:00
Roman Lebedev
db63a47135 [InstCombine] Take 3: Perform trivial PHI CSE
The original take 1 was 6102310d814ad73eab60a88b21dd70874f7a056f,
which taught InstSimplify to do that, which seemed better at the time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb40d26b5d3ead3bb8e9e284218e0186
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

Then there was take 2 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and was reverted in bdaa3f86a040b138c58de41d73d35b76fdec1380.
This is quite alarming.

Here, let's try to change how we find an existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(a node may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.
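
A hedged sketch of that scan (the helper is illustrative; the actual patch does this inside InstCombine's visitPHINode):

```
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instructions.h"

// Look through every PHI node in the block for one identical to PN.
static llvm::PHINode *findIdenticalPHI(llvm::PHINode &PN) {
  for (llvm::PHINode &Other : PN.getParent()->phis())
    if (&Other != &PN && Other.isIdenticalTo(&PN))
      return &Other;
  return nullptr;
}
```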

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts                           | 7942329   | 7942457   |    128 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254312480 |  16848 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18347     |    -65 |   -0.35% |    0.35% |
| early-cse.NumCSE                                   | 2183283   | 2183267   |    -16 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3644419   |   4108 |    0.11% |    0.11% |
| instcombine.NumDeadInst                            | 1778204   | 1783205   |   5001 |    0.28% |    0.28% |
| instcombine.NumPHICSEs                             | 0         | 22490     |  22490 |    0.00% |    0.00% |
| instcombine.NumWorklistIterations                  | 2023272   | 2024400   |   1128 |    0.06% |    0.06% |
| instcount.NumCallInst                              | 1758395   | 1758802   |    407 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330545    |    -12 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077220   |     82 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832606   |    660 |    0.01% |    0.01% |
| simplifycfg.NumHoistCommonCode                     | 24186     | 24187     |      1 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999767    | -20046 |   -1.97% |    1.97% |
```
So it fires 22490 times, which is less than the ~24k that take 1 did,
but more than what take 2 did (22228 times).
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 18:21:24 +03:00
Roman Lebedev
96e421f745 Revert "[InstCombine] Take 2: Perform trivial PHI CSE"
While the original variant that did this in InstSimplify (rightfully)
caused questions and was ultimately found to be a culprit
of a stage2-stage3 mismatch, it was expected that
the InstCombine-based implementation would be fine.

But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.

Which suggests that somewhere in InstCombine there is a loop
over a nondeterministically sorted container, which causes
different worklist ordering.

This reverts commit 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9.
2020-08-29 16:05:02 +03:00
Nikita Popov
366797b8b3 [InstCombine] Return replaceInstUsesWith() result (NFC)
Follow the usual usage pattern for this function and return the
result.
2020-08-29 14:49:57 +02:00
Martin Storsjö
1137641bfa [AArch64] Generate and parse SEH assembly directives
This ensures that you get the same output regardless if generating
code directly to an object file or if generating assembly and
assembling that.

Add implementations of the EmitARM64WinCFI*() methods in
AArch64TargetAsmStreamer, and fill in one blank in MCAsmStreamer.

Add corresponding directive handlers in AArch64AsmParser and
COFFAsmParser.

Some SEH directive names have been picked to match the prior art
for SEH assembly directives for x86_64, e.g. the spelling of
".seh_startepilogue" matching the preexisting ".seh_endprologue".

For the directives for saving registers, the exact spelling
from the arm64 documentation is picked, e.g. ".seh_save_reg" (to follow
that naming for all the other ones, e.g. ".seh_save_fregp_x"), while
the corresponding one for x86_64 is plain ".seh_savereg" without the
second underscore.

Directives in the epilogues have the same names as in prologues,
e.g. .seh_savereg, even though the registers are restored, not
saved, at that point.

Differential Revision: https://reviews.llvm.org/D86529
2020-08-29 15:15:22 +03:00
Martin Storsjö
7550015c23 [MC] [Win64EH] Fill in FuncletOrFuncEnd if missing
This can happen e.g. for code that declare .seh_proc/.seh_endproc
in assembly, or for code that use .seh_handlerdata (which triggers
the unwind info to be emitted before the end of the function).

The TextSection field must be made non-const to be able to use it
with Streamer.SwitchSection().

Differential Revision: https://reviews.llvm.org/D86528
2020-08-29 15:15:22 +03:00
Roman Lebedev
ccee9bafd6 [InstCombine] foldAggregateConstructionIntoAggregateReuse(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev
aa49db9ec5 [InstCombine] canonicalizeICmpPredicate(): use InstCombiner::replaceInstUsesWith() instead of RAUW
We really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:14 +03:00
Roman Lebedev
f95d822e44 [NFC][InstCombine] Fix some comments: the code already uses IC::replaceInstUsesWith() 2020-08-29 15:10:14 +03:00
Roman Lebedev
c05db21f5c [NFC] Instruction::isIdenticalToWhenDefined(): s/nessesairly/necessarily/ 2020-08-29 15:10:13 +03:00
Roman Lebedev
f2d160d30e [NFC][InstCombine] Add STATISTIC() for how many iterations we did
As we've established, if it takes more than two iterations
(one to perform folding and one to ensure that no folding opportunities
remain) per function, then there are worklist management issues.
So it may be interesting to keep track of it.
2020-08-29 15:10:13 +03:00
Roman Lebedev
a092faddfc [InstCombine] visitPHINode(): use InstCombiner::replaceInstUsesWith() instead of RAUW
As noted in post-commit review, we really shouldn't use RAUW in InstCombine
because we should consistently update Worklist to avoid extra iterations.
2020-08-29 15:10:00 +03:00
Roman Lebedev
2bf651df9f [InstCombine] Take 2: Perform trivial PHI CSE
The original take was 6102310d814ad73eab60a88b21dd70874f7a056f,
which taught InstSimplify to do that, which seemed better at the time,
since we got EarlyCSE support for free.

However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb40d26b5d3ead3bb8e9e284218e0186
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.

However InstCombine already does many different optimizations,
so it should be a safe place to do it here.

Note that we still can't just compare incoming value ranges,
because there is no guarantee that the PHIs we'd simplify to
were already re-visited and sorted.
However, coming up with a test is problematic.

Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |      |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs                             | 0         | 22228     |  22228 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942329   | 7942456   |    127 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 254295632 | 254313792 |  18160 |    0.01% |    0.01% |
| early-cse.NumCSE                                   | 2183283   | 2183272   |    -11 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 541842    |  -8263 |   -1.50% |    1.50% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640311   | 3666911   |  26600 |    0.73% |    0.73% |
| instcombine.NumDeadInst                            | 1778204   | 1783318   |   5114 |    0.29% |    0.29% |
| instcount.NumCallInst                              | 1758395   | 1758804   |    409 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330549    |     -8 |    0.00% |    0.00% |
| instcount.TotalBlocks                              | 1077138   | 1077221   |     83 |    0.01% |    0.01% |
| instcount.TotalFuncs                               | 101442    | 101441    |     -1 |    0.00% |    0.00% |
| instcount.TotalInsts                               | 8831946   | 8832611   |    665 |    0.01% |    0.01% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019813   | 999740    | -20073 |   -1.97% |    1.97% |
```
So it fires ~22k times, which is less than the ~24k that take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).

All in all, expectedly, this catches fewer things overall,
but all the motivational cases are still caught, so all good.
2020-08-29 13:13:06 +03:00
Nikita Popov
ae2c041ee4 [InstCombine] Fix typo in comment (NFC)
As pointed out in post-commit review of D63060.
2020-08-29 10:17:17 +02:00
Rainer Orth
69dc331d97 [Target][AArch64] Allow for char as int8_t in AArch64AsmParser.cpp
A couple of AArch64 tests were failing on Solaris, both sparc and x86:

  LLVM :: MC/AArch64/SVE/add-diagnostics.s
  LLVM :: MC/AArch64/SVE/cpy-diagnostics.s
  LLVM :: MC/AArch64/SVE/cpy.s
  LLVM :: MC/AArch64/SVE/dup-diagnostics.s
  LLVM :: MC/AArch64/SVE/dup.s
  LLVM :: MC/AArch64/SVE/mov-diagnostics.s
  LLVM :: MC/AArch64/SVE/mov.s
  LLVM :: MC/AArch64/SVE/sqadd-diagnostics.s
  LLVM :: MC/AArch64/SVE/sqsub-diagnostics.s
  LLVM :: MC/AArch64/SVE/sub-diagnostics.s
  LLVM :: MC/AArch64/SVE/subr-diagnostics.s
  LLVM :: MC/AArch64/SVE/uqadd-diagnostics.s
  LLVM :: MC/AArch64/SVE/uqsub-diagnostics.s

For example, reduced from `MC/AArch64/SVE/add-diagnostics.s`:

  add     z0.b, z0.b, #0, lsl #8

missed the expected diagnostics

  $ ./bin/llvm-mc -triple=aarch64 -show-encoding -mattr=+sve add.s
  add.s:1:21: error: immediate must be an integer in range [0, 255] with a shift amount of 0
  add     z0.b, z0.b, #0, lsl #8
                      ^

The message is `Match_InvalidSVEAddSubImm8`, emitted in the generated
`lib/Target/AArch64/AArch64GenAsmMatcher.inc` for `MCK_SVEAddSubImm8`.
When comparing the call to `::AArch64Operand::isSVEAddSubImm<char>` on both
Linux/x86_64 and Solaris, I find

  875	    bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value;

is `false` on Solaris, unlike Linux.

The problem boils down to the fact that `int8_t` is plain `char` on
Solaris: both the sparc and i386 psABIs have `char` as signed.  However,
with

  9887	    DiagnosticPredicate DP(Operand.isSVEAddSubImm<int8_t>());

in `lib/Target/AArch64/AArch64GenAsmMatcher.inc`, `std::make_signed_t<int8_t>`
above yieds `signed char`, so `std::is_same<int8_t, signed char>` is `false`.

This can easily be fixed by also allowing for `int8_t` here and in a few
similar places.
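
As a minimal standalone sketch of the type-trait pitfall (the helper name `looksLikeByteImm` is hypothetical and stands in for `isSVEAddSubImm<T>`; this is not the actual parser code):

```
#include <cstdint>
#include <iostream>
#include <type_traits>

// Hypothetical stand-in for the parser's byte-immediate check.
template <typename T> bool looksLikeByteImm() {
  // Original check: true only when make_signed_t<T> is exactly int8_t.
  // Where int8_t is plain 'char' (e.g. Solaris), make_signed_t<int8_t> is
  // 'signed char', a distinct type, so this is false even for T = int8_t.
  bool IsByteOld = std::is_same<int8_t, std::make_signed_t<T>>::value;
  // Fixed check: additionally allow T to be int8_t itself.
  return IsByteOld || std::is_same<int8_t, T>::value;
}

int main() {
  std::cout << looksLikeByteImm<int8_t>() << ' ' << looksLikeByteImm<char>() << '\n';
}
```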

Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D85225
2020-08-29 10:01:04 +02:00
Craig Topper
95cc907e21 [Attributes] Merge calls to getFnAttribute/hasFnAttribute using Attribute::isValid. NFC
Rather than calling hasFnAttribute and then calling getFnAttribute
if the attribute exists, its better to just call getFnAttribute and
then check if we got a valid attribute back.
2020-08-29 00:23:13 -07:00
Roman Lebedev
2a46a04b28 [NFC][InstructionSimplify] Add a warning about not simplifying to not def-reachable
See
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824235.html
and
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200824/824967.html

InstSimplify is not allowed to perform simplifications to instructions
that are not def-reachable from the original instruction.
2020-08-29 09:58:08 +03:00
Xing GUO
1d4cd95b57 [DWARFYAML] Make the debug_abbrev_offset field optional.
This patch helps make the debug_abbrev_offset field optional. We don't
need to calculate the value of this field in the future.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86614
2020-08-29 14:54:52 +08:00
Kai Luo
fabcc59830 [DAGCombiner] Enhance (zext(setcc))
Current `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86687
2020-08-29 03:37:41 +00:00
Akira Hatanaka
b7e85bfcf7 [ObjC][ARC] In HandlePotentialAlterRefCount, check whether an
instruction can decrement the reference count, not whether it can alter
it

This prevents the state transition from S_Use to S_CanRelease when doing
a bottom-up traversal and the transition from S_Retain to S_CanRelease
when doing a top-down traversal when the visited instruction can
increment the ref count but cannot decrement it. This allows the ARC
optimizer to remove retain/release pairs which were previously not
removed.

rdar://problem/21793154
2020-08-28 17:45:14 -07:00
Owen Anderson
df34423d50 Revert "[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block"
This reverts commit 6102310d814ad73eab60a88b21dd70874f7a056f.  It
appears to cause compilation non-determinism and caused stage3
mismatches.
2020-08-28 23:43:42 +00:00
Fangrui Song
311c781d4a [gcov] Increment counters with atomicrmw if -fsanitize=thread
Without this patch, `clang --coverage -fsanitize=thread` may fail spuriously
because non-atomic counter increments can be detected as data races.
2020-08-28 16:32:35 -07:00
Matt Arsenault
9e320d7adb GlobalISel: Combine out redundant sext_inreg
The scalar tests don't work yet, since computeNumSignBits apparently
doesn't handle sextload yet, and sext folds into the load first.
2020-08-28 17:57:31 -04:00
Jon Roelofs
2cc95b4ba6 [early-ifcvt] Add OptRemarks 2020-08-28 15:51:18 -06:00
Matt Arsenault
db8aa7921c AMDGPU: Fix incorrectly deleting copies after spilling SGPR tuples
The implicit def of the super register would appear to kill any live
uses of components before the spill, and would be deleted by
MachineCopyPropagation. We need to add implicit uses of the super
register, similarly to what copyPhysReg does. VGPR tuples appear to be
correctly handled already. I need to double check the SGPR->memory
path.
2020-08-28 17:50:37 -04:00
Craig Topper
09050e4cf7 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None, it's sufficient to just check that pImpl is null, which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Arthur Eubanks
82da713351 [ObjCARCOpt] Port objc-arc to NPM
Since doInitialization() in the legacy pass modifies the module, the NPM
pass is a Module pass.

Reviewed By: ahatanak, ychen

Differential Revision: https://reviews.llvm.org/D86178
2020-08-28 12:59:33 -07:00
Tyker
70c01602af [SROA] Improve handling of assume bundles by SROA
This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e
It also gives SROA the ability to remove assumes if doing so allows promoting an alloca to a register,
without removing assumes when the alloca can't be promoted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86570
2020-08-28 21:55:45 +02:00
Nikita Popov
2d55005b0d [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)
Fixes https://bugs.llvm.org/show_bug.cgi?id=42178 by folding
usub.sat(a, b) + b to umax(a, b). The backend will expand umax
back to usubsat if that is profitable.

We may also want to handle uadd.sat(a, b) - b in the future.
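
The identity behind the fold can be checked exhaustively at a small bit width; a standalone verification sketch (not compiler code):

```
#include <algorithm>
#include <cassert>
#include <cstdint>

// Scalar meaning of usub.sat(a, b): saturating unsigned subtraction.
static uint8_t usub_sat(uint8_t A, uint8_t B) {
  return A > B ? static_cast<uint8_t>(A - B) : 0;
}

int main() {
  // usub.sat(a, b) + b == umax(a, b) for every pair of 8-bit unsigned values.
  for (unsigned A = 0; A <= 255; ++A)
    for (unsigned B = 0; B <= 255; ++B)
      assert(static_cast<uint8_t>(usub_sat(A, B) + B) == std::max<uint8_t>(A, B));
  return 0;
}
```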

Differential Revision: https://reviews.llvm.org/D63060
2020-08-28 21:52:29 +02:00
serge-sans-paille
41f672146f Skip analysis re-computation when no changes are reported
This is a follow-up to https://reviews.llvm.org/D80707, generalized to
CallGraphSCC, Loop and Region

Differential Revision: https://reviews.llvm.org/D86442
2020-08-28 21:41:01 +02:00
Sjoerd Meijer
acefed239a [ARM] Skip combining base updates for vld1x NEON intrinsics
Skip this for now, to avoid a backend crash in:

  UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412

This should fix PR45824.

Differential Revision: https://reviews.llvm.org/D86784
2020-08-28 20:29:15 +01:00
Benjamin Kramer
3e713ff98e Strength-reduce SmallVectors to arrays. NFCI. 2020-08-28 21:14:20 +02:00
Benjamin Kramer
1b1f975a90 [CodeGenPrepare] Zap the argument of llvm.assume when deleting it
We know that the argument is most likely dead, so we can purge it
early. Otherwise it would make it to codegen, and can block further
optimizations.
2020-08-28 20:52:22 +02:00
Snehasish Kumar
69b963173e [llvm][CodeGen] Machine Function Splitter
We introduce a codegen optimization pass which splits functions into hot and cold
parts. This pass leverages the basic block sections feature recently
introduced in LLVM from the Propeller project. The pass targets
functions with profile coverage, identifies cold blocks and moves them
to a separate section. The linker groups all cold blocks across
functions together, decreasing fragmentation and improving icache and
itlb utilization.

We evaluated the Machine Function Splitter pass on clang bootstrap and
SPECInt 2017.

For clang bootstrap we observe a mean 2.33% runtime improvement with a
~32% reduction in itlb and stlb misses. Additionally, L1 icache misses
reduced by 9.5% while L2 instruction misses reduced by 20%.

For SPECInt we report the change in IntRate for the C/C++
benchmarks. All benchmarks apart from mcf and x264 improve, on average
by 0.6% with the max for deepsjeng at 1.6%.

Benchmark		% Change
500.perlbench_r		 0.78
502.gcc_r		 0.82
505.mcf_r		-0.30
520.omnetpp_r		 0.18
523.xalancbmk_r		 0.37
525.x264_r		-0.46
531.deepsjeng_r		 1.61
541.leela_r		 0.83
557.xz_r		 0.15

Differential Revision: https://reviews.llvm.org/D85368
2020-08-28 11:10:14 -07:00
Anna Welker
ec183de342 [ARM][MVE] Enable MVE gathers and scatters by default
Enable MVE gather/scatters by default, which requires some
minor adaptations in some tests.

Differential revision: https://reviews.llvm.org/D86776
2020-08-28 19:05:29 +01:00
David Green
1279c28977 [ARM] Correct predicate operand for offset gather/scatter
These arm_mve_vldr_gather_offset_predicated and
arm_mve_vstr_scatter_offset_predicated have some extra parameters
meaning the predicate is at a later operand. If a loop contains _only_
those masked instructions, we would miss transforming the active lane
mask.

Differential Revision: https://reviews.llvm.org/D86791
2020-08-28 17:48:15 +01:00
Albion Fung
c16246d9bf [PowerPC] Implemented Vector Load with Zero and Signed Extend Builtins
This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions introduced with ISA 3.1.

Differential Revision: 	https://reviews.llvm.org/D82502#inline-797941
2020-08-28 11:28:58 -05:00
Denis Antrushin
34d811b345 [Statepoint] Always spill base pointer.
There is a subtle problem with the new statepoint lowering scheme
when the base and derived pointers are the same (see PR46917 for more context):

%1 = STATEPOINT ... %0, %0(tied-def 0)...

if, for some reason, the register allocator decides to put two instances
of %0 into two different objects (registers or spill slots), we may
end up with

$reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)...

and nothing will prevent later passes from sinking uses of $reg2 below
the statepoint, which is incorrect.

As a short term solution, always put base pointers on stack during
lowering.
A longer term solution may be to rework MIR statepoint format to
avoid GC pointer duplication in statepoint argument list.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D86712
2020-08-28 23:22:07 +07:00
Yonghong Song
ac35f59b91 [GlobalISel] fix a compilation error with gcc 6.3.0
With gcc 6.3.0, I hit the following compilation error:
  ../lib/CodeGen/GlobalISel/Combiner.cpp: In member function
      ‘bool llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&,
       llvm::GISelCSEInfo*)’:
  ../lib/CodeGen/GlobalISel/Combiner.cpp:156:54: error: suggest parentheses
       around ‘&&’ within ‘||’ [-Werror=parentheses]
     assert(!CSEInfo || !errorToBool(CSEInfo->verify()) &&
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
                            "CSEInfo is not consistent. Likely missing calls to "
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                            "observer on mutations");

Fix the code as suggested by the compiler.
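
A trivial standalone reproduction of the warning and the fix (generic names, not the actual Combiner code):

```
#include <cassert>

// GCC suggests parentheses around '&&' within '||'; adding them keeps the
// behavior identical and keeps the message attached to the second condition.
void checkConsistency(bool HasCSEInfo, bool VerifyFailed) {
  // Before: assert(!HasCSEInfo || !VerifyFailed && "CSEInfo is not consistent");
  assert(!HasCSEInfo || (!VerifyFailed && "CSEInfo is not consistent"));
}

int main() { checkConsistency(true, false); }
```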
2020-08-28 09:16:52 -07:00
QingShan Zhang
8eac4059bb [DAGCombine] Don't delete the node if it has uses immediately
This is the follow-up patch for https://reviews.llvm.org/D86183, as we missed deleting the node if NegX == NegY, which has a use after we create the node.
```
    if (NegX && (CostX <= CostY)) {
      Cost = std::min(CostX, CostZ);
      RemoveDeadNode(NegY);
      return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags);  #<-- NegY is used here if NegY == NegX.
    }
```

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D86689
2020-08-28 16:13:43 +00:00
David Sherwood
56b8c35591 [SVE] Make ElementCount members private
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
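
A heavily simplified, hypothetical sketch of the resulting shape of the class (not the actual llvm::ElementCount definition):

```
#include <cassert>

class ElementCountSketch {
  unsigned Min;  // minimum number of elements known at compile time
  bool Scalable; // if true, the real count is a runtime multiple of Min

public:
  ElementCountSketch(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}

  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }

  // Example of a convenience query that callers previously open-coded
  // against the raw minimum value.
  bool isKnownMultipleOf(unsigned N) const { return Min % N == 0; }
};

int main() {
  ElementCountSketch EC(4, /*Scalable=*/true); // think <vscale x 4 x i32>
  assert(EC.getKnownMinValue() == 4 && EC.isScalable());
}
```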

Differential Revision: https://reviews.llvm.org/D86065
2020-08-28 14:43:53 +01:00
Xing GUO
c0f8e4de72 [DWARFYAML] Abbrev codes in a new abbrev table should start from 1 (by default).
The abbrev codes in a new abbrev table should start from 1 (by default),
rather than inherit the value from the code in the previous table.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D86545
2020-08-28 21:18:11 +08:00
Denis Antrushin
2a1fa7b84c [Statepoint] Turn assert into check in foldPatchpoint.
The original D81646 had a check for tied regs in foldPatchpoint().
Due to an unfortunate miscommunication in review comments and
addressing some comments post-commit, it turned into an assertion.

We had an offline talk and agreed that with current implementation
this path is possible, so I'm changing it back to check.

Note that this is a workaround until the issues described in PR46917 are
resolved.
2020-08-28 20:00:23 +07:00
Sam Parker
710437b36d [ARM][LowOverheadLoops] Liveouts and reductions
Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructions
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.

Differential Revision: https://reviews.llvm.org/D86613
2020-08-28 13:56:16 +01:00
Benjamin Kramer
72792b57de [SCCP] Use bulk-remove API to bulk-remove attributes. NFCI. 2020-08-28 14:44:14 +02:00
Benjamin Kramer
2b45a4f499 [FunctionAttrs] Bulk remove attributes. NFC. 2020-08-28 12:56:19 +02:00
Ties Stuij
dedddbe502 [AArch64][CodeGen] Restrict bfloat vector operations to what's actually supported
Previously in addTypeForNeon, we would set the operations for bfloat vectors
like other generic types. But as bfloat is a storage-only type a number of
operations shouldn't be set. This patch fixes that.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D85101
2020-08-28 11:44:37 +01:00
Florian Hahn
01fbf332f2 [DSE,MemorySSA] Check if Current is valid for elimination first.
This changes getDomMemoryDef to check if a Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile-time in checking for read accesses for
Current that might not even be removable.

This patch flips the logic, so we skip Current if it cannot be
removed before checking all of its uses. This is much more efficient in
practice.

It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.

This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.

http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86487
2020-08-28 11:19:04 +01:00
Florian Hahn
1e84da9f17 [MemLoc] Support memcmp in MemoryLocation::getForArgument.
This patch adds support for memcmp in MemoryLocation::getForArgument.
memcmp reads from the first 2 arguments up to the number of bytes of the
third argument.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86725
2020-08-28 10:19:54 +01:00
Florian Hahn
49f842e11f [BuildLibCalls] Add argmemonly to more lib calls.
strspn, strncmp, strcspn, strcasecmp, strncasecmp, memcmp, memchr,
memrchr, memcpy, memmove, memcpy, mempcpy, strchr, strrchr, bcmp
should all only access memory through their arguments.

I broke out strcoll, strcasecmp, strncasecmp because the result
depends on the locale, which might get accessed through memory.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86724
2020-08-28 09:50:38 +01:00
Martin Storsjö
47182191ac [ValueTracking] Remove a stray semicolon. NFC.
This silences warnings when built with GCC at least.
2020-08-28 09:24:10 +03:00
Martin Storsjö
831184291b [MC] [Win64EH] Avoid producing malformed xdata records
If there's no unwinding opcodes, omit writing the xdata/pdata records.

Previously, this generated truncated xdata records, and llvm-readobj
would error out when trying to print them.

If writing of an xdata record is forced via the .seh_handlerdata
directive, skip it if there's no info to make a sensible unwind
info structure out of, and clearly error out if such info appeared
later in the process.

Differential Revision: https://reviews.llvm.org/D86527
2020-08-28 09:05:36 +03:00
serge-sans-paille
a12b4db565 (Expensive) Check for Loop, SCC and Region pass return status
This generalizes the logic introduced in https://reviews.llvm.org/D80916 to
other passes.

It's needed by https://reviews.llvm.org/D86442 to assert passes correctly report
their status.

Differential Revision: https://reviews.llvm.org/D86589
2020-08-28 07:56:35 +02:00
Kai Luo
1f90029baa [PowerPC] PPCBoolRetToInt: Don't translate Constant's operands
When collecting `i1` values via `findAllDefs`, ignore Constant's
operands, since Constant's operands might not be `i1`.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE
```
llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant *llvm::ConstantExpr::getZExt(llvm::Constant *, llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed.
```

Differential Revision: https://reviews.llvm.org/D85007
2020-08-28 01:56:12 +00:00
Alina Sbirlea
ee5e19fe38 [MemorySSA] Assert defining access is not a MemoryUse. 2020-08-27 18:21:10 -07:00
Harmen Stoppels
e6870b67d6 Revert "Use find_library for ncurses"
The introduction of find_library for ncurses caused more issues than it solved. The current open issue is that it makes the static build of LLVM fail. It is better to revert for now and get back to it later.

Revert "[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows"
This reverts commit 1ed1e16ab83f55d85c90ae43a05cbe08a00c20e0.

Revert "Fix msan build"
This reverts commit 34fe9613dda3c7d8665b609136a8c12deb122382.

Revert "[CMake] Always mark terminfo as unavailable on Windows"
This reverts commit 76bf26236f6fd453343666c3cd91de8f74ffd89d.

Revert "[CMake] Fix OCaml build failure because of absolute path in system libs"
This reverts commit 8e4acb82f71ad4effec8895b8fc957189ce95933.

Revert "[CMake] Don't look for terminfo libs when LLVM_ENABLE_TERMINFO=OFF"
This reverts commit 495f91fd33d492941c39424a32cf24bcfe192f35.

Revert "Use find_library for ncurses"
This reverts commit a52173a3e56553d7b795bcf3cdadcf6433117107.

Differential revision: https://reviews.llvm.org/D86521
2020-08-27 17:57:26 -07:00
Matt Arsenault
f54c1fe9ec GlobalISel: Implement computeNumSignBits for G_SEXT_INREG 2020-08-27 19:44:37 -04:00
Matt Arsenault
1fc1020e9e AMDGPU/GlobalISel: Implement computeKnownBits for groupstaticsize 2020-08-27 19:39:44 -04:00
Matt Arsenault
aacb2d3455 AMDGPU: Fix broken switch braces 2020-08-27 19:39:39 -04:00
Matt Arsenault
62f7266a72 Correctly revert "GlobalISel: Use & operator on KnownBits"
I mis-resolved the revert by moving the code to another function.
2020-08-27 19:08:31 -04:00
Matt Arsenault
bb532337e4 Revert "GlobalISel: Use & operator on KnownBits"
This reverts commit e53b799779b079a70f600e5cad2ab7267d66b1b7.

Confusingly, this does not simply and the two sets of known bits, but
implements known bits for the and operator.
2020-08-27 18:52:34 -04:00
Vitaly Buka
2df7657efb [ValueTracking] Replace recursion with Worklist
Now findAllocaForValue can handle nontrivial phi cycles.
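
The general shape of such a rewrite, as a generic standalone sketch (hypothetical Node type, not the LLVM implementation): an explicit worklist plus a visited set lets cycles, such as phi cycles, terminate naturally.

```
#include <unordered_set>
#include <vector>

struct Node {
  std::vector<Node *> Inputs; // e.g. operands looked through (phi/select/gep)
  bool IsAlloca = false;
};

// Returns the unique alloca-like root reachable through Inputs, or nullptr if
// there is none or more than one.
Node *findUniqueAllocaLike(Node *Start) {
  std::vector<Node *> Worklist{Start};
  std::unordered_set<Node *> Visited;
  Node *Result = nullptr;
  while (!Worklist.empty()) {
    Node *N = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(N).second)
      continue; // already seen: this is what makes cycles terminate
    if (N->IsAlloca) {
      if (Result && Result != N)
        return nullptr; // more than one underlying object
      Result = N;
      continue;
    }
    for (Node *In : N->Inputs)
      Worklist.push_back(In);
  }
  return Result;
}
```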
2020-08-27 14:44:49 -07:00
Brad Smith
a1c364cc4b [SSP] Restore setting the visibility of __guard_local to hidden for better code generation.
Patch by: Philip Guenther
2020-08-27 17:17:38 -04:00
Shinji Okumura
57bd1d9114 [Attributor] Do not manifest noundef for dead positions
Even if noundef is deduced for a position, we should not manifest it when the position is dead.
This is because the values associated with dead positions are replaced with undef values by AAIsDead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86565
2020-08-28 05:58:18 +09:00
Matt Arsenault
2cd41c208e GlobalISel: Implement known bits for min/max 2020-08-27 16:56:17 -04:00
Matt Arsenault
fcf8b40603 MIR: Infer not-SSA for subregister defs
It's possible to have a single virtual register def with a subreg
index that would pass the previous check, but it's not possible to
have a subregister def in SSA.

This is in preparation for adding stricter checks for SSA MIR.
2020-08-27 16:56:16 -04:00
Vitaly Buka
1490706ac2 [StackSafety] Ignore allocas with partial lifetime markers
Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D86672
2020-08-27 13:54:41 -07:00
Vitaly Buka
4f88f9a474 [NFC][ValueTracking] Add OffsetZero into findAllocaForValue
For StackLifetime, after finding the alloca we need to check that
the values point to the beginning of the alloca.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D86692
2020-08-27 13:46:22 -07:00
Matt Arsenault
a7b8916a32 AMDGPU: Use caller subtarget, not intrinsic declaration
Intrinsic declarations use the default subtarget, but this should be
using the subtarget for the calling function. I haven't been able to
come up with a case where it matters though.
2020-08-27 16:42:09 -04:00
Krzysztof Parzyszek
9ae5fb90d3 [Hexagon] Emit better 32-bit multiplication sequence for HVXv62+ 2020-08-27 15:24:32 -05:00
Eli Friedman
4c050f47e9 [RegisterScavenging] Delete dead function unprocess(). 2020-08-27 13:19:32 -07:00
Roman Lebedev
0c6e84d654 [InstSimplify] SimplifyPHINode(): check that instruction is in basic block first
As pointed out in post-commit review, this can legally be called
on instructions that are not inserted into basic blocks,
so don't blindly assume that there is a basic block.
2020-08-27 22:32:03 +03:00
Christopher Tetreault
06d41df75b [SVE] Remove bad call to VectorType::getNumElements() from HeapProfiler
Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D86727
2020-08-27 12:16:00 -07:00
Shinji Okumura
6ff8397df1 [Attributor] Guarantee getAAFor not to update AA in the manifestation stage
If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated.
This patch makes use of the phase flag in Attributor, and handles `getAAFor` behavior according to the flag.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86635
2020-08-28 04:07:42 +09:00
Christopher Tetreault
c8d8d4e8c9 [SVE] Remove calls to VectorType::getNumElements from Transforms/Vectorize
Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D82056
2020-08-27 12:02:20 -07:00
Christopher Tetreault
c849a98c1b [SVE] Remove calls to VectorType::getNumElements from IR
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D81500
2020-08-27 11:16:10 -07:00
Matt Arsenault
9b9f0675a4 GlobalISel: Use & operator on KnownBits
Avoid repeating for zero and one
2020-08-27 14:07:18 -04:00
Matt Arsenault
b3488037c4 GlobalISel: Implement known bits for G_MERGE_VALUES 2020-08-27 14:07:18 -04:00
Mikhail Maltsev
f7e914e2c5 [ARM][BFloat16] Change types of some Arm and AArch64 bf16 intrinsics
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from an
implementation lacking the bf16 IR type).

The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.

This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86146
2020-08-27 18:43:16 +01:00
Craig Topper
5106495e6e [X86] Don't call hasFnAttribute and getFnAttribute for 'prefer-vector-width' and 'min-legal-vector-width' in getSubtargetImpl
We only need to call getFnAttribute and then check if the Attribute
is None or not.
2020-08-27 10:40:20 -07:00
Owen Anderson
36eea6621c Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
Differential Revision: D70800
2020-08-27 17:29:41 +00:00
Teresa Johnson
362b3e8fee [HeapProf] Fix bot failures from instrumentation pass
Fix bot failure from 7ed8124d46f94601d5f1364becee9cee8538265e:
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/8533

Since we are always using dynamic shadow,
insertDynamicShadowAtFunctionEntry should always return true for
modifying the function.
2020-08-27 10:21:19 -07:00
Aditya Nandakumar
90acb0696f [GISel] Add new GISel combiners for G_SELECT
https://reviews.llvm.org/D83833

The patch adds two new GICombinerRules for G_SELECT. The rules include:
combining selects with undef comparisons into their first selectee value,
and combining away selects with constant comparisons. The patch additionally
adds a new combiner test for the AArch64 target to test these new G_SELECT
combiner rules and the existing select_same_val combiner rule.

Patch by  mkitzan
2020-08-27 09:40:15 -07:00
Simon Moll
2afc6e83ea [sda][nfc] clang-formatting 2020-08-27 18:27:44 +02:00
Shinji Okumura
1054ad3409 [Attributor] Add a phase flag to Attributor
Add a new flag that indicates which stage in the process we are in.
This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635)

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D86678
2020-08-28 01:16:38 +09:00
Aditya Nandakumar
46055bcec5 [GISel]: Fix one more CSE Non determinism
https://reviews.llvm.org/D86676

Sometimes we can have the following code

 x:gpr(s32) = G_OP

Say we build G_OP2 to the same x and then delete the previous instruction. Using something like

 Register X = ...;
 auto NewMIB = CSEBuilder.buildOp2(X, ... args);

Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (ie it doesn't consider register bank/register class along with type). Unify the profiling by refactoring and calling the common method.

This was found by turning on CSEInfo::verify at the end of each of our GISel passes, which turns inconsistent state/non-determinism in CSEing into crashes; that usually indicates missing calls to the Observer on mutations (the most common case). Here non-determinism usually means failing to CSE sometimes, almost never producing incorrect code.
This patch also adds this verification at the end of the combiners.
2020-08-27 09:06:21 -07:00
Lucas Prates
4265553ed8 [CodeGen] Properly propagating Calling Convention information when lowering vector arguments
When joining the legal parts of vector arguments into their original value
during the lowering of formal arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual part. The same did not happen when lowering calls, causing a
mismatch.

This patch fixes the issue by properly propagating the Calling
Convention details.

This fixes Bugzilla #47001.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D86715
2020-08-27 17:01:10 +01:00
Teresa Johnson
bce40acc54 [HeapProf] Clang and LLVM support for heap profiling instrumentation
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html

Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).

This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.

The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.

Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.

Differential Revision: https://reviews.llvm.org/D85948
2020-08-27 08:50:35 -07:00
Roman Lebedev
2088bfe3c4 [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block
Apparently, we don't do this in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but we do in NewGVN and SimplifyCFG of all places.

While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.

So i would think this is pretty uncontroversial.

On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE                             | 0         | 23779     |  23779 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942328   | 7942392   |     64 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 273069192 | 273084704 |  15512 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18539     |    127 |    0.69% |    0.69% |
| early-cse.NumCSE                                   | 2183283   | 2183227   |    -56 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 542090    |  -8015 |   -1.46% |    1.46% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640264   | 3664769   |  24505 |    0.67% |    0.67% |
| instcombine.NumDeadInst                            | 1778193   | 1783183   |   4990 |    0.28% |    0.28% |
| instcount.NumCallInst                              | 1758401   | 1758799   |    398 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330533    |    -24 |   -0.01% |    0.01% |
| instcount.TotalInsts                               | 8831952   | 8832286   |    334 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019808   | 999607    | -20201 |   -1.98% |    1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.

That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..

I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.

Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.
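
For the same-BB case the order-insensitive comparison amounts to comparing the multisets of (incoming block, incoming value) pairs; a standalone sketch with opaque identifiers, not the actual isIdenticalToWhenDefined() code:

```
#include <algorithm>
#include <utility>
#include <vector>

// Opaque identifiers standing in for (incoming basic block, incoming value).
using Incoming = std::pair<const void *, const void *>;

bool samePhiUpToOperandOrder(std::vector<Incoming> A, std::vector<Incoming> B) {
  if (A.size() != B.size())
    return false;
  std::sort(A.begin(), A.end());
  std::sort(B.begin(), B.end());
  return A == B; // same incoming pairs, regardless of operand order
}
```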

As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
2020-08-27 18:47:04 +03:00
Alexandre Ganea
1c64f56c35 [Support] On Windows, add optional support for {rpmalloc|snmalloc|mimalloc}
This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree.

To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc.

When enabled, the allocator will be embedded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future.

When enabled, this changes the memory stack from:
  new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc
to:
  new/delete -> {rpmalloc|snmalloc|mimalloc} -> VirtualAlloc

The goal of this patch is to bypass the application's global heap - which is thread-safe thus inducing locking - and instead take advantage of a modern lock-free, thread cache, allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range 5-10% depending on the configuration.

Differential Revision: https://reviews.llvm.org/D71786
2020-08-27 11:09:46 -04:00
diggerlin
5d038d06f5 Revert "[AIX][XCOFF] emit symbol visibility for xcoff object file."
This reverts commit a0818689213234d5a078641432d10eccccf61a13.

Based on Hubert Tong's comment: https://reviews.llvm.org/D84265#inline-799085
2020-08-27 11:07:58 -04:00
Benjamin Kramer
1af5e41f38 [Hexagon] Fold another layer of single-use variable into assert. NFCI. 2020-08-27 16:52:34 +02:00
Benjamin Kramer
f7ce567fa8 [Hexagon] Fold single-use variable into assert. NFCI. 2020-08-27 16:44:22 +02:00
Matt Arsenault
6d03e1f153 AMDGPU: Hoist subtarget lookup 2020-08-27 10:27:56 -04:00
Krzysztof Parzyszek
fd8ba7e648 [Hexagon] Widen short vector stores to HVX vectors using masked stores
Also invent a flag -hexagon-hvx-widen=N to set the minimum threshold
for widening short vectors to HVX vectors.
2020-08-27 09:25:08 -05:00
Florian Hahn
cdab0a3e82 [SimplifyLibCalls] Remove over-eager early return in strlen optzns.
Currently we bail out early for strlen calls with a GEP operand, if none
of the GEP specific optimizations fire. But there could be later
optimizations that still apply, which we currently miss out on.

An example is that we do not apply the following optimization
   strlen(x) == 0 --> *x == 0

Unless I am missing something, there seems to be no reason for bailing
out early there.

Fixes PR47149.
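
For reference, the missed fold itself is simple; a tiny standalone illustration (not compiler code):

```
#include <cstring>
#include <iostream>

// strlen(p) == 0 is equivalent to *p == '\0', even when p is computed via a
// GEP-like expression such as s + offset.
bool isEmptySlow(const char *P) { return std::strlen(P) == 0; } // walks the string
bool isEmptyFast(const char *P) { return *P == '\0'; }          // single load

int main() {
  const char *S = "hello";
  std::cout << isEmptySlow(S + 5) << isEmptyFast(S + 5) << '\n'; // 11
  std::cout << isEmptySlow(S) << isEmptyFast(S) << '\n';         // 00
}
```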

Reviewed By: lebedev.ri, xbolva00

Differential Revision: https://reviews.llvm.org/D85886
2020-08-27 15:19:45 +01:00
Pavel Labath
e258c2d84b [cmake] Make gtest include directories a part of the library interface
This applies the same fix that D84748 did for macro definitions.
Appropriate include path is now automatically set for all libraries
which link against gtest targets, which avoids the need to set
include_directories in various parts of the project.

Differential Revision: https://reviews.llvm.org/D86616
2020-08-27 15:35:57 +02:00
serge-sans-paille
65b0561260 Fix OpenMP deduplicateRuntimeCalls return status
Differential Revision: https://reviews.llvm.org/D86705
2020-08-27 15:01:04 +02:00
serge-sans-paille
7d0233352d Fix Attributor return status
Differential Revision: https://reviews.llvm.org/D86703
2020-08-27 15:01:04 +02:00
Jay Foad
a697cc2007 [AMDGPU] Remove unused variable introduced in r251860 2020-08-27 13:28:32 +01:00
Drew Wock
2a5c21f1b1 [FPEnv] Allow fneg + strict_fadd -> strict_fsub in DAGCombiner
This is the first of a set of DAGCombiner changes enabling strictfp
optimizations. I want to test the waters with this to make sure changes
like these are acceptable for the strictfp case. This particular change
should preserve exception ordering and result precision perfectly, and
many other possible changes appear to be able to as well.

Copied from regular fadd combines but modified to preserve ordering via
the chain, this change allows strict_fadd x, (fneg y) to become
strict_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x.

Differential Revision: https://reviews.llvm.org/D85548
2020-08-27 08:17:01 -04:00
Florian Hahn
c351843892 [DSE,MemorySSA] Remove short-cut to check if all paths are covered.
The post-order number early continue does not work in some cases, e.g.
if a path from EarlierAccess to an exit includes a node that dominates
EarlierAccess in a cycle.

The short-cut only has very minor impact on compile-time, so it seems
straight-forward to remove it for now:

http://llvm-compile-time-tracker.com/compare.php?from=062412e79fcfedf2cf004433e42036b0333e3f83&to=d7386016a77ce1387bdbbf360f1de157faea9d31&stat=instructions

Fixes PR47285.
2020-08-27 12:42:40 +01:00
OCHyams
5a27fa8d7e Revert "[DWARF] Add cuttoff guarding quadratic validThroughout behaviour"
This reverts commit b9d977b0ca60c54f11615ca9d144c9f08b29fd85.

This cutoff is no longer required. The commit 34ffa7fc501 (D86153) introduces a
performance improvement which was tested against the motivating case for this
patch.

Discussed in differential revision: https://reviews.llvm.org/D86153
2020-08-27 11:52:30 +01:00
OCHyams
3e2347db56 [DwarfDebug] Improve validThroughout performance (4/4)
Almost NFC (see end).

The backwards scan in validThroughout significantly contributed to compile time
for a pathological case, causing the 'X86 Assembly Printer' pass to account for
roughly 70% of the run time. This patch guards the loop against running
unnecessarily, bringing the pass contribution down to 4%.

Almost NFC: There is a hack in validThroughout which promotes single constant
value DBG_VALUEs in the prologue to be live throughout the function. We're more
likely to hit this code path with this patch applied. Similarly to the parent
patches there is a small coverage change reported in the order of 10s of bytes.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D86153
2020-08-27 11:52:30 +01:00
OCHyams
4d679a7eae [DwarfDebug] Improve multi-BB single location detection in validThroughout (3/4)
With the changes introduced in D86151 we can now check for single locations
which span multiple blocks for inlined scopes and blocks.

D86151 introduced the InstructionOrdering parameter, replacing a scan through
MBB instructions. The functionality to compare instruction positions across
blocks was added there, and this patch just removes the exit checks that were
previously (but no longer) required.

CTMark shows a geomean binary size reduction of 2.2% for RelWithDebInfo builds.
llvm-locstats (using D85636) shows a very small variable location coverage
change in 5 of 10 binaries, but just like in D86151 it is only in the order of
10s of bytes.

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D86152
2020-08-27 11:52:29 +01:00
OCHyams
463930d1cb [DwarfDebug] Improve single location detection in validThroughout (2/4)
With this patch we're now accounting for two more cases which should be
considered 'valid throughout': First, where RangeEnd is ScopeEnd. Second, where
RangeEnd comes before ScopeEnd when including meta instructions, but are both
preceded by the same non-meta instruction.

CTMark shows a geomean binary size reduction of 1.5% for RelWithDebInfo builds.
`llvm-locstats` (using D85636) shows a very small variable location coverage
change in 2 of 10 binaries, but it is in the order of 10s of bytes which lines
up with my expectations.

I've added a test which checks both of these new cases. The first check in the
test isn't strictly necessary for this patch. But I'm not sure that it is
explicitly tested anywhere else, and is useful for the final patch in the
series.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D86151
2020-08-27 11:52:29 +01:00
OCHyams
adfe449a17 [NFC][DebugInfo] Create InstructionOrdering helper class (1/4)
Group the map and methods used to query instruction ordering for trimVarLocs
(D82129) into a class. This will make it easier to reuse the functionality
upcoming patches.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D86150
2020-08-27 11:52:29 +01:00
Mikhail Maltsev
63df72fb38 [AArch64] Optimize instruction selection for certain vector shuffles
This patch adds code to recognize vector shuffles which can be
represented as VDUP (splat) of a vector lane with of a different
(wider) type than the original vector lane type.

For example:
    shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
is essentially:
    shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0>

Such patterns are generated by the SelectionDAG machinery in some cases
(see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double
bitcasts from shuffles" part).

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D86225
2020-08-27 11:06:49 +01:00
Paul Walker
a71aa114da [SVE] Fall back to default expansion when lowering SIGN_EXTEND_INREG from a non-byte based source.
Differential Revision: https://reviews.llvm.org/D86394
2020-08-27 10:57:37 +01:00
Sander de Smalen
33c91e76d2 [AArch64][SVE] Add missing debug info for ACLE types.
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64bit granules in an SVE vector.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D86101
2020-08-27 10:56:42 +01:00
Alex Richardson
1928b323b7 [RISC-V] fmv.s/fmv.d should be as cheap as a move
Since the canonical floating-point move is fsgnj rd, rs, rs, we should
handle this case in RISCVInstrInfo::isAsCheapAsAMove().

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D86518
2020-08-27 10:32:23 +01:00
Alex Richardson
aa5340b34a [RISC-V] Mark C_MV as a move instruction
Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D86517
2020-08-27 10:32:23 +01:00
Alex Richardson
7a907ec504 [RISC-V] ADDI/ORI/XORI x, 0 should be as cheap as a move
The isTriviallyRematerializable hook is only called for instructions that are
tagged as isAsCheapAsAMove. Since ADDI 0 is used for "mv" it should definitely
be marked with "isAsCheapAsAMove". This change avoids one stack spill in most of
the atomic-rmw.ll tests functions. It also avoids stack spills in two of our
out-of-tree CHERI tests.
ORI/XORI with zero may or may not be the same as a move micro-architecturally,
but since we are already doing it for register == x0, we might as well
do the same if the immediate is zero.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D86480
2020-08-27 10:32:22 +01:00
Vitaly Buka
afd0abc1f8 [ValueTracking] Support select in findAllocaForValue 2020-08-27 02:13:52 -07:00
Florian Hahn
b3984cec53 [DSE,MemorySSA] Traverse use-def chain without MemSSA Walker.
For DSE with MemorySSA it is beneficial to manually traverse the
defining access, instead of using a MemorySSA walker, so we can
better control the number of steps together with other limits and
also weed out invalid/unprofitable paths early on.

This patch requires a follow-up patch to be most effective, which I will
share soon after putting this patch up.

This temporarily XFAIL's the limit tests, because we now explore more
MemoryDefs that may not alias/clobber the killing def. This will be
improved/fixed by the follow-up patch.

This patch also renames some `Dom*` variables to `Earlier*`, because the
dominance relation is not really used/important here and potentially
confusing.

This patch allows us to aggressively cut down compile time, geomean
-O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores
removed. Subsequent patches will increase the number of removed stores
again, while keeping compile-time in check.

http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D86486
2020-08-27 10:02:02 +01:00
Sjoerd Meijer
38fc834d2c Revert "[Verifier] Additional check for intrinsic get.active.lane.mask"
This reverts commit 8d5f64c4edbc190a5a8790157fa1d99cfac34016.

Thanks to Eli Friedman for pointing out that this check is not appropriate here;
this check will be moved to the Lint pass.
2020-08-27 09:27:05 +01:00
Piotr Sobczak
0a08f60d7f [AMDGPU] Preserve vcc_lo when shrinking V_CNDMASK
There is no justification for changing vcc_lo to vcc
when shrinking V_CNDMASK, and such a change could
later confuse live variable analysis.

Make sure the original register is preserved.

Differential Revision: https://reviews.llvm.org/D86541
2020-08-27 10:22:50 +02:00
Shinji Okumura
cc1db81009 [Attributor] Add flag for undef value to the state of AAPotentialValues
Currently, an undef value is reduced to 0 when it is added to a set of potential values.
This patch introduces a flag for undef values. By this, for example, we can merge two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85592
2020-08-27 16:30:29 +09:00
Sam Parker
e6053bfbc3 [ARM] Enable outliner at -Oz for M-class
Enable default outlining when the function has the minsize attribute
and we're targeting an m-class core.

Differential Revision: https://reviews.llvm.org/D82951
2020-08-27 08:02:56 +01:00
Martin Storsjö
edc5aafa50 Revert "Reapply D70800: Fix AArch64 AAPCS frame record chain"
This reverts commit 9936455204fd6ab72715cc9d67385ddc93e072ed.

That commit caused failed assertions e.g. like this:

$ cat alloca.c
a;
b() {
  float c;
  d();
  a = __builtin_alloca(d);
  c = e();
  f(a);
  return c;
}
$ clang -target aarch64-linux-gnu -c alloca.c -O2
clang: ../lib/Target/AArch64/AArch64InstrInfo.cpp:3446: void
llvm::emitFrameOffset(llvm::MachineBasicBlock&,
llvm::MachineBasicBlock::iterator, const llvm::DebugLoc&, unsigned int,
unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo*,
llvm::MachineInstr::MIFlag, bool, bool, bool*):
Assertion `(DestReg != AArch64::SP || Bytes % 16 == 0) &&
"SP increment/decrement not 16-byte aligned"' failed.
2020-08-27 09:39:56 +03:00
luxufan
8dd24026c0 [RISCV] add the MC layer support of riscv vector Zvamo extension
Implements assembly and disassembly support for the RISCV Vector
extension zvamo instructions, based on the 0.9 spec version.

Reviewed by HsiangKai

Differential Revision: https://reviews.llvm.org/D85069
2020-08-27 14:11:38 +08:00
Sam Parker
f65ec601b0 [ARM] Make MachineVerifier more strict about terminators
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.

Differential Revision: https://reviews.llvm.org/D40061
2020-08-27 07:10:20 +01:00
Amy Kwan
daa2ee7b26 [PowerPC] Implement Vector Multiply High/Divide Extended Builtins in LLVM/Clang
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.

Differential Revision: https://reviews.llvm.org/D82609
2020-08-26 23:14:34 -05:00
Matt Arsenault
5f7f259e00 GlobalISel: IRTranslate minimum of pointer sizes on memcpy
I forgot to squash this with 0b7f6cc71a72a85f8a0cbee836a7a8e31876951a
2020-08-26 20:10:00 -04:00
Matt Arsenault
a03ce058a1 GlobalISel: Add generic instructions for memory intrinsics
AArch64, X86 and Mips currently consume these directly and custom
lower them to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
2020-08-26 20:08:45 -04:00
Lang Hames
02bae0a8ff [ORC][JITLink] Switch to unique ownership for EHFrameRegistrars.
This will make stateful registrars (e.g. a future TargetProcessControl based
registrar) easier to deal with.
2020-08-26 16:59:45 -07:00
Arthur Eubanks
d74ec65308 [ConstProp] Remove ConstantPropagation
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.

Currently no users outside of unit tests.

Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp

Reviewed By: lattner, nikic

Differential Revision: https://reviews.llvm.org/D85159
2020-08-26 15:51:30 -07:00
Craig Topper
47ba73f241 [X86] Change pentium4 tuning settings and scheduler model back to their values before D83913.
Clang now defaults to -march=pentium4 -mtune=generic so we don't
need modern tune settings on pentium4.
2020-08-26 15:38:12 -07:00
Alina Sbirlea
9e04c74adc Use properlyDominates in RDFLiveness when sorting on dominance.
Summary:
When looking for all reaching definitions, we sort basic blocks on dominance; using properlyDominates() for the sort handles the case A == B.
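
The underlying requirement is the usual std::sort contract: the comparator must be a strict weak ordering, so comp(A, A) must be false. A minimal analogy with plain integers (not the RDFLiveness code):

```
#include <algorithm>
#include <vector>

int main() {
  std::vector<int> V{3, 1, 3, 2};
  // Fine: '<' is strict, so comp(A, A) is false.
  std::sort(V.begin(), V.end(), [](int A, int B) { return A < B; });
  // Undefined behavior: '<=' returns true for equal elements -- the analogue
  // of sorting with a reflexive dominates() instead of properlyDominates().
  // std::sort(V.begin(), V.end(), [](int A, int B) { return A <= B; });
}
```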

Authored by: pranavb

Differential Revision: https://reviews.llvm.org/D86661
2020-08-26 15:16:40 -07:00
Ahmed Bougacha
dee1317373 [AArch64] Use CCAssignFnForReturn helper in more spots. NFC.
It was added for GISel, but SDAG could use it too!
2020-08-26 14:39:11 -07:00
Nikita Popov
d8d374b2b3 [InstSimplify] Fold min/max intrinsic based on icmp of operands
This is a reboot of D84655, now performing the inner icmp
simplification query without undef folds.

It should be possible to handle the current foldMinMaxSharedOp()
fold based on this, by moving the logic into icmp of min/max instead,
making it more general. We can't drop the folds for constant operands,
because those also allow undef, which we exclude here.

The tests use assumes for exhaustive coverage, and have a few
more examples of misc folds we get based on icmp simplification.

Differential Revision: https://reviews.llvm.org/D85929
2020-08-26 22:02:57 +02:00
Muhammad Asif Manzoor
9eaab3b90f [AArch64][SVE] Add lowering for llvm fceil
Add the functionality to lower fceil for passthru variant

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D84548
2020-08-26 15:59:44 -04:00
Owen Anderson
a90ccf474c Reapply D70800: Fix AArch64 AAPCS frame record chain
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.

Patch By: logan
2020-08-26 19:38:38 +00:00
Sanjay Patel
03d2b97d01 [DAGCombiner] allow store merging non-i8 truncated ops
We have a gap in our store merging capabilities for shift+truncate
patterns as discussed in:
https://llvm.org/PR46662

I generalized the code/comments for this function in earlier commits,
so we only need to ease the type restriction and adjust the address/endian
checking to make this work.

AArch64 lets us switch endian to make sure that patterns are matched
either way.

Differential Revision: https://reviews.llvm.org/D86420
2020-08-26 15:23:08 -04:00
Aleksandr Platonov
218a7995fe [Support][Windows] Fix incorrect GetFinalPathNameByHandleW() return value check in realPathFromHandle()
`GetFinalPathNameByHandleW(,,N,)` returns:
- `< N` on success (this value does not include the size of the terminating null character)
- `>= N` if buffer is too small (this value includes the size of the terminating null character)

So, when `N == Buffer.capacity() - 1`, we need to resize the buffer if the return value is > `Buffer.capacity() - 2`.
Also, we can set `N` to `Buffer.capacity()`.

Thus, without this patch `realPathFromHandle()` returns an unfilled buffer when the length of the final path of the file is equal to `Buffer.capacity()` or `Buffer.capacity() - 1`.
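
A hedged sketch of the corrected sizing logic (Windows-only, error handling simplified, function name hypothetical; not the actual realPathFromHandle() implementation):

```
#include <string>
#include <windows.h>

std::wstring finalPathFromHandle(HANDLE H) {
  std::wstring Buffer(MAX_PATH, L'\0');
  for (;;) {
    DWORD Len = GetFinalPathNameByHandleW(H, &Buffer[0],
                                          static_cast<DWORD>(Buffer.size()),
                                          FILE_NAME_NORMALIZED);
    if (Len == 0)
      return std::wstring(); // API failure
    if (Len < Buffer.size()) { // success: Len excludes the terminating null
      Buffer.resize(Len);
      return Buffer;
    }
    Buffer.resize(Len); // too small: Len includes the terminating null
  }
}
```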

Reviewed By: andrewng, amccarth

Differential Revision: https://reviews.llvm.org/D86564
2020-08-26 22:11:44 +03:00
Arthur Eubanks
4c381d496d [InstSimplify] Simplify to vector constants when possible
InstSimplify should do all transformations that ConstProp does, but
one thing that ConstProp does that InstSimplify wouldn't is inline
vector instructions that are constants, e.g. into a ret.

Previously vector instructions wouldn't be inlined in InstSimplify
because llvm::Simplify*Instruction() would return nullptr for specific
instructions, such as vector instructions that were actually constants,
if it couldn't simplify them.

This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and
SimplifyShuffleVectorInst to return a vector constant when possible.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85946
2020-08-26 11:40:36 -07:00
Francesco Petrogalli
267cbf225e [MC][SVE] Fix data operand for instruction alias of st1d.
The version of `st1d` that operates with vector plus immediate
addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for
rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was
generating `<Zn>.s` instead of `<Zn>.d`.

Differential Revision: https://reviews.llvm.org/D86633
2020-08-26 18:22:17 +00:00