This patch changes errors in `SmallVector::grow` that are independent of
memory capacity to be reported using report_fatal_error or
std::length_error instead of report_bad_alloc_error, which falsely signals
an OOM.
It also cleans up a few related things:
- makes report_bad_alloc_error to print the failure reason passed
to it.
- fixes the documentation to indicate that report_bad_alloc_error
calls `abort()` not "an assertion"
- uses a consistent name for the size/capacity argument to `grow`
and `grow_pod`
Reviewed By: mehdi_amini, MaskRay
Differential Revision: https://reviews.llvm.org/D86892
For ThinLTO importing we don't need to import all the fields of the DICompileUnit, such as enums, macros, retained types lists. The importation of those fields were previously disabled by setting their value map entries to nullptr. Unfortunately a metadata node can be shared by multiple metadata operands. Setting the map entry to nullptr might result in not importing other metadata unexpectedly. The issue is fixed by explicitly setting the original DICompileUnit fields (still a copy of the source module metadata) to null.
Reviewed By: wenlei, dblaikie
Differential Revision: https://reviews.llvm.org/D86675
Quite a while ago, we legalized these nodes as we added custom
handling for reciprocal estimates in the back end. We have since
moved to target-independent combines but neglected to turn off
legalization. As a result, we can now get selection failures on
non-VSX subtargets as evidenced in the listed PR.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47373
This patch implements the builtins for Vector Multiply Builtins (vmulxxd family of instructions), and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83955
Summary of changes:
- Changed parser to eliminate generation of excessive error messages;
- Corrected lit tests to match all expected error messages;
- Corrected lit tests to guard against unwanted extra messages (added option "--implicit-check-not=error:");
- Added missing checks and fixed some typos in tests.
See bug 46907: https://bugs.llvm.org/show_bug.cgi?id=46907
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D86940
Add support for line continuations (the "backslash operator") in MASM by modifying the Parser's Lex method.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D83347
If the PSHUFBs have no other uses, then we can force the unselected elements to zero to OR them instead, avoiding both an extra mask load and a costly variable blend.
Eventually we should try to bring this into shuffle combining, once we can more easily convert between shuffles + select patterns.
In IPSCCP when a function is optimized to return undef, it should clear the returned attribute for all its input arguments
and its corresponding call sites.
The bug is exposed when the value of an input argument of the function is assigned to a physical register and
because of the argument having a returned attribute, the value of this physical register will continue to be used
as the function return value right after the call instruction returns, even if the value that this register holds may
be clobbered during the function call. This potentially results in incorrect values being used afterwards.
Reviewed By: jdoerfert, fhahn
Differential Revision: https://reviews.llvm.org/D84220
After computing dependence, we check if it is safe to hoist by
identifying if it clobbers any liveIns in the sibling block (NullSucc).
This check is moved to its own function which will be used in the
soon-to-be modified dependence checking algorithm for implicit null
checks pass.
Tests-Run: lit tests on X86/implicit-*
Separated out some checks in isSuitableMemoryOp and added comments
explaining why some of those checks are done.
Tests-Run:X86 implicit null checks tests.
When marking a global variable constant, and simplifying users using
CleanupConstantGlobalUsers(), the pass could incorrectly return false if
there were still some uses left, and no further optimizations was done.
This was caught using the check introduced by D80916.
This fixes PR46749.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D85837
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)
We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.
Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).
Differential Revision: https://reviews.llvm.org/D86726
This is an enhancement to D81766 to allow loading the minimum target
vector type into an IR vector with a different number of elements.
In one of the motivating tests from PR16739, SLP creates <2 x float>
load ops mixed with <4 x float> insert ops, so we want to handle that
pattern in addition to potential oversized vectors created by the
vectorizers.
For now, we are assuming the insert/extract subvector with undef is
free because there is no exact corresponding TTI modeling for that.
Differential Revision: https://reviews.llvm.org/D86160
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:
extract_subvec(binop(insert_subvec(a), insert_subvec(b)))
are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them the result is an
infinite legalise->combine->legalise loop.
This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used it's common for
the affected code paths to be visited pre and post legalisation,
so the extra parameter keeps the code tidy.
Differential Revision: https://reviews.llvm.org/D86450
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.
Differential Revision: https://reviews.llvm.org/D86938
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.
For e.g.
svint32_t bar(svint32_t);
svint32_t foo(svint32_t x, bool *err) {
try { bar(x); } catch (...) { *err = true; }
return x;
}
`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84737
As stated in section 6.1.1.2, DWARFv5, p. 142,
| The last entry for each name is followed by a zero byte that
| terminates the list. There may be gaps between the lists.
The patch changes emitting a 4-byte zero value to a 1-byte one, which
effectively removes the gap between entry lists, and thus saves
approximately 3 bytes per name; the calculation is not exact because
the total size of the table is aligned to 4.
Differential Revision: https://reviews.llvm.org/D86927
The member is not in use; the unit length for the table is emitted as
a difference between two labels. Moreover, the type of the member might
be misleading, because for DWARF64 the field should be 64 bit long.
Differential Revision: https://reviews.llvm.org/D86912
This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.
We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.
Differential Revision: https://reviews.llvm.org/D66004
This patch makes it possible for AAUB to use information from AANoUndef.
This is the next patch of D86983
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86984
When the associated value is undef, we immediately forced to indicate a pessimistic fixpoint so far.
This patch changes the initialization to check the attribute given in IR at first and to indicate an optimistic fixpoint when it is given.
This change will enable us to catch , for example, the following case in AAUB.
```
call void @foo(i32 noundef undef)
```
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86983
Before upstream a new target called CSKY, make a new triple of that called Triple::csky.
For now, it's a 32-bit little endian target and the detail can be referred at D86269.
This is the split part of D86269, which add a new target called CSKY.
Differential Revision: https://reviews.llvm.org/D86505
If there's no initializer symbol in the current MaterializationResponsibility
then bail out without installing JITLink passes: they're going to be no-ops
anyway.
A think-o in the existing code meant that dependencies were never registered.
This failure could lead to crashes rather than orderly error propagation if
initialization dependencies failed to materialize.
No test case: The bug was discovered in an out-of-tree code and requires
pathalogically misconfigured JIT to generate the original error that lead to
the crash.
This patch adds a helper function DumpStrSection to simplify codes.
Besides, nonprintable chars in debug_str and debug_str.dwo sections
are printed as escaped chars.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86918
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.
Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74494
This patch adds an initial, incomeplete and unsound implementation of
canReplacePointersIfEqual to check if a pointer value A can be replaced
by another pointer value B, that are deemed to be equivalent through
some means (e.g. information from conditions).
Note that is in general not sound to blindly replace pointers based on
equality, for example if they are based on different underlying objects.
LLVM's memory model is not completely settled as of now; see
https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed
discussion.
The initial version of canReplacePointersIfEqual only rejects a very
specific case: replacing a pointer with a constant expression that is
not dereferenceable. Such a replacement is problematic and can be
restricted relatively easily without impacting most code. Using it to
limit replacements in GVN/SCCP/CVP only results in small differences in
7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto.
This patch is supposed to be an initial step to improve the current
situation and the helper should be made stricter in the future. But this
will require careful analysis of the impact on performance.
Reviewed By: aqjune
Differential Revision: https://reviews.llvm.org/D85524
Interleave for small loops that have reductions inside,
which breaks dependencies and expose.
This gives very significant performance improvements for some benchmarks.
Because small loops could be in very hot functions in real applications.
Differential Revision: https://reviews.llvm.org/D81416
Previously if the source match we asserted that the destination
matched. But GPR <-> mask register copies on X86 can violate this
since we use the same K-registers for multiple sizes.
Fixes this ISPC issue https://github.com/ispc/ispc/issues/1851
Differential Revision: https://reviews.llvm.org/D86507
This reverts commit bc9a29b9ee6ade4894252b1470977142c32b4602.
The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
General purpose registers 30 and 31 are handled differently when they are
reserved as the base-pointer and frame-pointer respectively. This fixes the
offset of their fixed-stack objects when there are fpr calle-saved registers.
Differential Revision: https://reviews.llvm.org/D85850
I was recently debugging a similar issue to https://reviews.llvm.org/D86500 only with a large metadata section. Only after I finished debugging it did I discover it was fixed very recently.
My version of the fix was going to alignTo since that uses uint64_t and improves the readability of the code. So I though I would go ahead and share it.
Differential Revision: https://reviews.llvm.org/D86957
This is needed for an upcoming change to how we translate conditional branches
which might generate these.
Differential Revision: https://reviews.llvm.org/D86383
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86942
On -O0, i1 strict_fsetcc will be promoted to i32. We don't handle that
in TD patterns. This patch fills logic in PPCISelDAGToDAG to handle more
cases.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D86595
This patch adds the td instruction definitions of the xvcvspbf16 and xvcvbf16spn
instructions, along with their respective MC tests.
Differential Revision: https://reviews.llvm.org/D86794
This reverts commit e9d9a612084b47fc4277523561d61e675370c854.
This patch was previously revert by 04879086b44348cad600a0a1ccbe1f7776cc3cf9
with the reapplication being done after breaking the assert used to
ensure SP is always 16-byte aligned, which is a requirement of the AAPCS.
For extra context the latest patch caused runtime failures when
building with "-march=armv8-a+sve -mllvm -aarch64-sve-vector-bits-min=256".
Unmerges have the same fundamental problem as G_TRUNC, and G_TRUNC
could be implemented in terms of G_UNMERGE_VALUES. Reducing the number
of elements in unmerge results ends up producing the original unmerge
type profile, so the artifact combiner needs to eliminate the
intermediate illegal registers. This avoids infinite looping in the
legalizer in a future change.
Assuming an unmerge has each result unmerged the same way, this ends
up producing a new unmerge of the source for every definition. I'm not
sure if the artifact combiner should either insert temporary merges
here and erase the original merge, or if the combiner should look at
uses from defs rather than defs from uses for unmerges.
In a few cases this regresses from using 16-bit shifts for 8-bit
values to using 32-bit shifts, but I think these can be legalized
later (the other legalization rules don't try very hard to use 16-bit
shifts either).
Loop Idiom Recognize Pass (LIRP) attempts to transform loops with subscripted arrays
into memcpy/memset function calls. In some particular situation, this transformation
introduces negative impacts. For example: https://bugs.llvm.org/show_bug.cgi?id=47300
This patch will enable users to disable a particular part of the transformation, while
he/she can still enjoy the benefit brought about by the rest of LIRP. The default
behavior stays unchanged: no part of LIRP is disabled by default.
Reviewed By: etiotto (Ettore Tiotto)
Differential Revision: https://reviews.llvm.org/D86262
This relands e9a3d1a401b07cbf7b11695637f1b549782a26cd which was originally
missing linking LLVMSupport into LLMVFileCheck which broke the SHARED_LIBS build.
Original summary:
The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a
good reason for having FileCheck implemented there as it has a very specific use
while LLVMSupport is a dependency of pretty much every LLVM tool there is. In
fact, the only use of FileCheck I could find (outside the FileCheck tool and the
FileCheck unit test) is a single call in GISelMITest.h.
This moves the FileCheck logic to its own LLVMFileCheck library. This way only
FileCheck and the GlobalISelTests now have a dependency on this code.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86344
I have fixed up a number of warnings resulting from TypeSize -> uint64_t
casts and calling getVectorNumElements() on scalable vector types. I
think most of the changes are fairly trivial except for those in
DAGTypeLegalizer::SplitVecRes_MLOAD I've tried to ensure we create
the MachineMemoryOperands in a sensible way for scalable vectors.
I have added a CHECK line to the following test:
CodeGen/AArch64/sve-split-load.ll
that ensures no new warnings are added.
Differential Revision: https://reviews.llvm.org/D86697
Currently it is hard to avoid having LLVM link to the system install of
ncurses, since it uses check_library_exists to find e.g. libtinfo and
not find_library or find_package.
With this change the ncurses lib is found with find_library, which also
considers CMAKE_PREFIX_PATH. This solves an issue for the spack package
manager, where we want to use the zlib installed by spack, and spack
provides the CMAKE_PREFIX_PATH for it.
This is a similar change as https://reviews.llvm.org/D79219, which just
landed in master.
Patch By: haampie
Differential Revision: https://reviews.llvm.org/D85820
MemoryPhis with a single value are correct, but can lead to errors when
updating. Clean up single entry Phis newly added when cloning blocks.
Resolves PR46574.
This patch makes the debug_str section optional. When the debug_str
section exists but doesn't contain anything, yaml2obj will emit a
section header for it.
Reviewed By: grimar
Differential Revision: https://reviews.llvm.org/D86860
Add a reproducer verifier that catches:
- Missing or invalid home directory
- Missing or invalid working directory
- Missing or invalid module/symbol paths
- Missing files from the VFS
The verifier is enabled by default during replay, but can be skipped by
passing --reproducer-no-verify.
Differential revision: https://reviews.llvm.org/D86497
getValuesInOffloadArrays goes through the offload arrays in __tgt_target_data_begin_mapper getting the values stored in them before the call is issued.
call void @__tgt_target_data_begin_mapper(arg0, arg1,
i8** %offload_baseptrs, i8** %offload_ptrs, i64* %offload_sizes,
...)
Diferential Revision: https://reviews.llvm.org/D86300
The 1st try was reverted because I missed an assert that
needed softening.
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.
This requires relaxing asserts in GVN, but that pass
seems to be working otherwise.
NewGVN requires more work because it uses different
code paths for numbering binops and calls.
This patch implements the foldMemoryOperand hook in Thumb1InstrInfo,
allowing tBLXr and a spilled function address to be combined back into a
tBL. This can help with codesize at Oz, especailly in the tinycrypt
library.
Differential Revision: https://reviews.llvm.org/D79785
fabs and fneg share a common transformation:
(fneg (bitconvert x)) -> (bitconvert (xor x sign))
(fabs (bitconvert x)) -> (bitconvert (and x ~sign))
This patch separate the code into a single method.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86862
I tried to fix this in:
rG716e35a0cf53
...but that patch depends on the order that we encounter the
magic "x/sqrt(x)" expression in the combiner's worklist.
This patch should improve that by waiting until we walk the
user list to decide if there's a use to skip.
The AArch64 test reveals another (existing) ordering problem
though - we may try to create an estimate for plain sqrt(x)
before we see that it is part of a 1/sqrt(x) expression.
The actual FileCheck logic seems to be implemented in LLVMSupport. I don't see a
good reason for having FileCheck implemented there as it has a very specific use
while LLVMSupport is a dependency of pretty much every LLVM tool there is. In
fact, the only use of FileCheck I could find (outside the FileCheck tool and the
FileCheck unit test) is a single call in GISelMITest.h.
This moves the FileCheck logic to its own LLVMFileCheck library. This way only
FileCheck and the GlobalISelTests now have a dependency on this code.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86344
For an instruction in the basic block BB, SinkingPass enumerates basic blocks
dominated by BB and BB's successors. For each enumerated basic block,
SinkingPass uses `AllUsesDominatedByBlock` to check whether the basic
block dominates all of the instruction's users. This is inefficient.
Use the nearest common dominator of all users to avoid enumerating the
candidate. The nearest common dominator may be in a parent loop which is
not beneficial. In that case, find the ancestors in the dominator tree.
In the case that the instruction has no user, with this change we will
not perform unnecessary move. This causes some amdgpu test changes.
A stage-2 x86-64 clang is a byte identical with this change.
As discussed in D86798 / rG09652721 , we were potentially
returning a different result for whether an Instruction
is commutable depending on if we call the base class or
derived class method.
This requires relaxing an assert in GVN, but that pass
seems to be working otherwise.
NewGVN requires more work because it uses different
code paths for numbering binops and calls.
Add printf-style precision specifier to pad numbers to a given number of
digits when matching them if the value is smaller than the given
precision. This works on both empty numeric expression (e.g. variable
definition from input) and when matching a numeric expression. The
syntax is as follows:
[[#%.<precision><format specifier>, ...]
where <format specifier> is optional and ... can be a variable
definition or not with an empty expression or not. In the absence of a
precision specifier, a variable definition will accept leading zeros.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D81667
addRuntimeChecks uses SCEVExpander, which relies on the DT/LoopInfo to
be up-to-date. Changing the CFG afterwards may invalidate some inserted
instructions, especially LCSSA phis.
Reorder the code to first update the CFG and then create the runtime
checks. This should not have any impact on the generated code, as we
adjust the CFG and generate runtime checks together.
Fixes PR47343.
This requires adding a missing 'const' to the definition because
the callers are using const args, but there should be no change
in behavior.
The intrinsic method was added with D86798 / rG096527214033
In general, we probably want to try the multi-use reciprocal
transform before sqrt transforms, but x/sqrt(x) is a special-case
because that will always reduce to plain sqrt(x) or an estimate.
The AArch64 tests show that the transform is limited by TLI
hook to patterns where there are 3 or more uses of the divisor.
So this change can result in an extra division compared to
what we had, but that's the intended behvior based on the
current setting of that hook.
As discussed in D86798, it's not clear if the caller code
works with a more liberal definition of "commutative" that
includes intrinsics like min/max. This makes the binop
restriction (current functionality is unchanged) explicit
until the code is audited/tested.
Perfect shuffle instruction (vdealvdd/vshuffvdd) work on vector
pairs. When given a single input vector, half of it first needs
to be transposed into the other vector before the generated
shuffles can take effect. Also the first transpose needs to be
undone at the end (this last step was missing).
The problem with module slice has been addressed in D86319
Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
`getReplacementValues()` to a worklist-based approach so we can explore
all relevant BBs.
Differential Revision: https://reviews.llvm.org/D85544
Summary:
The module slice describes which functions we can analyze and transform
while working on an SCC as part of the Attributor-CGSCC pass. So far we
simply restricted it to the SCC.
Reviewers: jdoerfert
Differential Revision: https://reviews.llvm.org/D86319
This is the next patch of D86842
When we check `noundef` attribute violation at callsites, we do not have to require `nonnull` in the following two cases.
1. An argument is known to be simplified to undef
2. An argument is known to be dead
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86845
Otherwise their alignment is dependent on the size of the section. If the size
is large than 16, the alignment will be 16.
16 is a bad choice for both .llvmbc and .llvmcmd because the padding between two
contributions from input sections is of a variable size.
A bitstream is actually guaranteed to be 4-byte aligned, but consumers don't
need this property.
DFS and Reverse-DFS linkage orders are used to order execution of
deinitializers and initializers respectively.
This patch replaces uses of special purpose DFS order functions in
MachOPlatform and LLJIT with uses of the new methods.
Even though `noundef` IR attribute might be attached to non-void type values, AANoUndef is mistakenly identified for pointer type values only.
This patch fixes that.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86737
Replace the check for poison-producing instructions in
SimplifyWithOpReplaced() with the generic helper canCreatePoison()
that properly handles poisonous shifts and thus avoids the problem
from PR47322.
This additionally fixes a bug in IIQ.UseInstrInfo=false mode, which
previously could have caused this code to ignore poison flags.
Setting UseInstrInfo=false should reduce the possible optimizations,
not increase them.
This is not a full solution to the problem, as poison could be
introduced more indirectly. This is just a minimal, easy to backport
fix.
Differential Revision: https://reviews.llvm.org/D86834
Move bail out when optimizing for size before runtime check generation.
In that case, we do not use the result of the expansion, the expanded
instruction will be dead and cleaned up later.
By doing the check before expanding the runtime-checks, we can save a
bit of unnecessary work.
This got reverted because a dependency was reverted. It has since
been reapplied, so reapply this as well.
-----
Related to D69686. As noted there, LVI currently behaves differently
for integer and pointer values: For integers, the block value is always
valid inside the basic block, while for pointers it is only valid at
the end of the basic block. I believe the integer behavior is the
correct one, and CVP relies on it via its getConstantRange() uses.
The reason for the special pointer behavior is that LVI checks whether
a pointer is dereferenced in a given basic block and marks it as
non-null in that case. Of course, this information is valid only after
the dereferencing instruction, or in conservative approximation,
at the end of the block.
This patch changes the treatment of dereferencability: Instead of
including it inside the block value, we instead treat it as something
similar to an assume (it essentially is a non-nullness assume) and
incorporate this information in intersectAssumeOrGuardBlockValueConstantRange()
if the context instruction is the terminator of the basic block.
This happens either when determining an edge-value internally in LVI,
or when a terminator was explicitly passed to getValueAt(). The latter
case makes this more powerful than the previous implementation as
a side-effect, and this does actually seem benefitial in practice.
Of course, we do not want to recompute dereferencability on each
intersectAssume call, so we need a new cache for this. The
dereferencability analysis requires walking the entire basic block
and computing underlying objects of all memory operands. This was
previously done separately for each queried pointer value. In the
new implementation (both because this makes the caching simpler,
and because it is faster), I instead only walk the full BB once and
cache all the dereferenced pointers. So the traversal is now performed
only once per BB, instead of once per queried pointer value.
I think the overall model now makes more sense than before, and there
will be no more pitfalls due to differing integer/pointer behavior.
Differential Revision: https://reviews.llvm.org/D69914
EarlyCSE has a mode to verify the invariant that hash equality equals
key equality, but EliminateDuplicatePHINodes() doesn't.
I've verified that this would have caught the stage2-stage3 mismatches
5ec2b757cc7d37ff0d03b36ee863b0962fe78108 revert has fixed,
that were introduced last time in 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9.
This patch fixes AANoUndef manifestation.
We should not manifest noundef for positions that will be changed to undef.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86835
Handling the new min/max intrinsics is the motivation, but it
turns out that we have a bunch of other intrinsics with this
missing bit of analysis too.
The FP min/max tests show that we are intersecting FMF,
so that part should be safe too.
As noted in https://llvm.org/PR46897 , there is a commutative
property specifier for intrinsics, but no corresponding function
attribute, and so apparently no uses of that bit. We may want to
remove that next.
Follow-up patches should wire up the Instruction::isCommutative()
to this IntrinsicInst specialization. That requires updating
callers to be aware of the more general commutative property
(not just binops).
Differential Revision: https://reviews.llvm.org/D86798
The original take 1 was 6102310d814ad73eab60a88b21dd70874f7a056f,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.
However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb40d26b5d3ead3bb8e9e284218e0186
that reverted it:
> It appears to cause compilation non-determinism and caused stage3 mismatches.
Then there was take 2 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9,
which was InstCombine-specific, but it again showed stage2-stage3 differences,
and reverted in bdaa3f86a040b138c58de41d73d35b76fdec1380.
This is quite alarming.
Here, let's try to change how we find existing PHI candidate:
due to the worklist order, and the way PHI nodes are inserted
(it may be inserted as the first one, or maybe not), let's look at *all*
PHI nodes in the block.
Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| asm-printer.EmittedInsts | 7942329 | 7942457 | 128 | 0.00% | 0.00% |
| assembler.ObjectBytes | 254295632 | 254312480 | 16848 | 0.01% | 0.01% |
| correlated-value-propagation.NumPhis | 18412 | 18347 | -65 | -0.35% | 0.35% |
| early-cse.NumCSE | 2183283 | 2183267 | -16 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 541842 | -8263 | -1.50% | 1.50% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640311 | 3644419 | 4108 | 0.11% | 0.11% |
| instcombine.NumDeadInst | 1778204 | 1783205 | 5001 | 0.28% | 0.28% |
| instcombine.NumPHICSEs | 0 | 22490 | 22490 | 0.00% | 0.00% |
| instcombine.NumWorklistIterations | 2023272 | 2024400 | 1128 | 0.06% | 0.06% |
| instcount.NumCallInst | 1758395 | 1758802 | 407 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330545 | -12 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1077138 | 1077220 | 82 | 0.01% | 0.01% |
| instcount.TotalFuncs | 101442 | 101441 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8831946 | 8832606 | 660 | 0.01% | 0.01% |
| simplifycfg.NumHoistCommonCode | 24186 | 24187 | 1 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019813 | 999767 | -20046 | -1.97% | 1.97% |
```
So it fires 22490 times, which is less than ~24k the take 1 did,
but more than what take 2 did (22228 times)
.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).
All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
While the original variant with doing this in InstSimplify (rightfully)
caused questions and ultimately was detected to be a culprit
of stage2-stage3 mismatch, it was expected that
InstCombine-based implementation would be fine.
But apparently it's not, as
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/24095/steps/compare-compilers/logs/stdio
suggests.
Which suggests that somewhere in InstCombine there is a loop
over nondeterministically sorted container, which causes
different worklist ordering.
This reverts commit 3e69871ab5a66fb55913a2a2f5e7f5b42899a4c9.
This ensures that you get the same output regardless if generating
code directly to an object file or if generating assembly and
assembling that.
Add implementations of the EmitARM64WinCFI*() methods in
AArch64TargetAsmStreamer, and fill in one blank in MCAsmStreamer.
Add corresponding directive handlers in AArch64AsmParser and
COFFAsmParser.
Some SEH directive names have been picked to match the prior art
for SEH assembly directives for x86_64, e.g. the spelling of
".seh_startepilogue" matching the preexisting ".seh_endprologue".
For the directives for saving registers, the exact spelling
from the arm64 documentation is picked, e.g. ".seh_save_reg" (to follow
that naming for all the other ones, e.g. ".seh_save_fregp_x"), while
the corresponding one for x86_64 is plain ".seh_savereg" without the
second underscore.
Directives in the epilogues have the same names as in prologues,
e.g. .seh_savereg, even though the registers are restored, not
saved, at that point.
Differential Revision: https://reviews.llvm.org/D86529
This can happen e.g. for code that declare .seh_proc/.seh_endproc
in assembly, or for code that use .seh_handlerdata (which triggers
the unwind info to be emitted before the end of the function).
The TextSection field must be made non-const to be able to use it
with Streamer.SwitchSection().
Differential Revision: https://reviews.llvm.org/D86528
As we've established, if it takes more than two iterations
(one to perform folding and one to ensure that no folding opportunities
remain) per function, then there are worklist management issues.
So it may be interesting to keep track of it.
The original take was 6102310d814ad73eab60a88b21dd70874f7a056f,
which taught InstSimplify to do that, which seemed better at time,
since we got EarlyCSE support for free.
However, it was proven that we can not do that there,
the simplified-to PHI would not be reachable from the original PHI,
and that is not something InstSimplify is allowed to do,
as noted in the commit ed90f15efb40d26b5d3ead3bb8e9e284218e0186
that reverted it :
> It appears to cause compilation non-determinism and caused stage3 mismatches.
However InstCombine already does many different optimizations,
so it should be a safe place to do it here.
Note that we still can't just compare incoming values ranges,
because there is no guarantee that these PHI's we'd simplify to
were already re-visited and sorted.
However coming up with a test is problematic.
Effects on vanilla llvm test-suite + RawSpeed:
```
| statistic name | baseline | proposed | Δ | % | |%| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instcombine.NumPHICSEs | 0 | 22228 | 22228 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942329 | 7942456 | 127 | 0.00% | 0.00% |
| assembler.ObjectBytes | 254295632 | 254313792 | 18160 | 0.01% | 0.01% |
| early-cse.NumCSE | 2183283 | 2183272 | -11 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 541842 | -8263 | -1.50% | 1.50% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640311 | 3666911 | 26600 | 0.73% | 0.73% |
| instcombine.NumDeadInst | 1778204 | 1783318 | 5114 | 0.29% | 0.29% |
| instcount.NumCallInst | 1758395 | 1758804 | 409 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330549 | -8 | 0.00% | 0.00% |
| instcount.TotalBlocks | 1077138 | 1077221 | 83 | 0.01% | 0.01% |
| instcount.TotalFuncs | 101442 | 101441 | -1 | 0.00% | 0.00% |
| instcount.TotalInsts | 8831946 | 8832611 | 665 | 0.01% | 0.01% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019813 | 999740 | -20073 | -1.97% | 1.97% |
```
So it fires ~22k times, which is less than ~24k the take 1 did.
It allows foldAggregateConstructionIntoAggregateReuse() to actually work
after PHI-of-extractvalue folds did their thing. Previously SimplifyCFG
would have done this PHI CSE, of all places. Additionally, allows some
more `invoke`->`call` folds to happen (+110, +2.56%).
All in all, expectedly, this catches less things overall,
but all the motivational cases are still caught, so all good.
A couple of AArch64 tests were failing on Solaris, both sparc and x86:
LLVM :: MC/AArch64/SVE/add-diagnostics.s
LLVM :: MC/AArch64/SVE/cpy-diagnostics.s
LLVM :: MC/AArch64/SVE/cpy.s
LLVM :: MC/AArch64/SVE/dup-diagnostics.s
LLVM :: MC/AArch64/SVE/dup.s
LLVM :: MC/AArch64/SVE/mov-diagnostics.s
LLVM :: MC/AArch64/SVE/mov.s
LLVM :: MC/AArch64/SVE/sqadd-diagnostics.s
LLVM :: MC/AArch64/SVE/sqsub-diagnostics.s
LLVM :: MC/AArch64/SVE/sub-diagnostics.s
LLVM :: MC/AArch64/SVE/subr-diagnostics.s
LLVM :: MC/AArch64/SVE/uqadd-diagnostics.s
LLVM :: MC/AArch64/SVE/uqsub-diagnostics.s
For example, reduced from `MC/AArch64/SVE/add-diagnostics.s`:
add z0.b, z0.b, #0, lsl #8
missed the expected diagnostics
$ ./bin/llvm-mc -triple=aarch64 -show-encoding -mattr=+sve add.s
add.s:1:21: error: immediate must be an integer in range [0, 255] with a shift amount of 0
add z0.b, z0.b, #0, lsl #8
^
The message is `Match_InvalidSVEAddSubImm8`, emitted in the generated
`lib/Target/AArch64/AArch64GenAsmMatcher.inc` for `MCK_SVEAddSubImm8`.
When comparing the call to `::AArch64Operand::isSVEAddSubImm<char>` on both
Linux/x86_64 and Solaris, I find
875 bool IsByte = std::is_same<int8_t, std::make_signed_t<T>>::value;
is `false` on Solaris, unlike Linux.
The problem boils down to the fact that `int8_t` is plain `char` on
Solaris: both the sparc and i386 psABIs have `char` as signed. However,
with
9887 DiagnosticPredicate DP(Operand.isSVEAddSubImm<int8_t>());
in `lib/Target/AArch64/AArch64GenAsmMatcher.inc`, `std::make_signed_t<int8_t>`
above yieds `signed char`, so `std::is_same<int8_t, signed char>` is `false`.
This can easily be fixed by also allowing for `int8_t` here and in a few
similar places.
Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and
`x86_64-pc-linux-gnu`.
Differential Revision: https://reviews.llvm.org/D85225
Rather than calling hasFnAttribute and then calling getFnAttribute
if the attribute exists, its better to just call getFnAttribute and
then check if we got a valid attribute back.
This patch helps make the debug_abbrev_offset field optional. We don't
need to calculate the value of this field in the future.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86614
Current `v:t = zext(setcc x,y,cc)` will be transformed to `select x, y, 1:t, 0:t, cc`. It misses some opportunities if x's type size is less than `t`'s size. This patch enhances the above transformation.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86687
instruction can decrement the reference count, not whether it can alter
it
This prevents the state transition from S_Use to S_CanRelease when doing
a bottom-up traversal and the transition from S_Retain to S_CanRelease
when doing a top-down traversal when the visited instruction can
increment the ref count but cannot decrement it. This allows the ARC
optimizer to remove retain/release pairs which were previously not
removed.
rdar://problem/21793154
The implicit def of the super register would appear to kill any live
uses of components before the spill, and would be deleted by
MachineCopyPropagation. We need to add implicit uses of the super
register, similarly to what copyPhysReg does. VGPR tuples appear to be
correctly handled already. I need to double check the SGPR->memory
path.
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.
So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.
This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.
Differential Revision: https://reviews.llvm.org/D86744
Since doInitialization() in the legacy pass modifies the module, the NPM
pass is a Module pass.
Reviewed By: ahatanak, ychen
Differential Revision: https://reviews.llvm.org/D86178
This patch fixes this crash https://gcc.godbolt.org/z/Ps8d1e
And gives SROA the ability to remove assumes if it allows promoting an alloca to register
Without removing assumes when it can't promote to register.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86570
Skip this for now, to avoid a backend crash in:
UNREACHABLE executed at llvm/lib/Target/ARM/ARMISelLowering.cpp:13412
This should fix PR45824.
Differential Revision: https://reviews.llvm.org/D86784
We introduce a codegen optimization pass which splits functions into hot and cold
parts. This pass leverages the basic block sections feature recently
introduced in LLVM from the Propeller project. The pass targets
functions with profile coverage, identifies cold blocks and moves them
to a separate section. The linker groups all cold blocks across
functions together, decreasing fragmentation and improving icache and
itlb utilization.
We evaluated the Machine Function Splitter pass on clang bootstrap and
SPECInt 2017.
For clang bootstrap we observe a mean 2.33% runtime improvement with a
~32% reduction in itlb and stlb misses. Additionally, L1 icache misses
reduced by 9.5% while L2 instruction misses reduced by 20%.
For SPECInt we report the change in IntRate the C/C++
benchmarks. All benchmarks apart from mcf and x264 improve, on average
by 0.6% with the max for deepsjeng at 1.6%.
Benchmark % Change
500.perlbench_r 0.78
502.gcc_r 0.82
505.mcf_r -0.30
520.omnetpp_r 0.18
523.xalancbmk_r 0.37
525.x264_r -0.46
531.deepsjeng_r 1.61
541.leela_r 0.83
557.xz_r 0.15
Differential Revision: https://reviews.llvm.org/D85368
These arm_mve_vldr_gather_offset_predicated and
arm_mve_vstr_scatter_offset_predicated have some extra parameters
meaning the predicate is at a later operand. If a loop contains _only_
those masked instructions, we would miss transforming the active lane
mask.
Differential Revision: https://reviews.llvm.org/D86791
This patch implements the builtins for Vector Load with Zero and Signed Extend Builtins (lxvr_x for b, h, w, d), and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D82502#inline-797941
There is a subtle problem with new statepoint lowering scheme
when base and pointers are the same (see PR46917 for more context):
%1 = STATEPOINT ... %0, %0(tied-def 0)...
if, for some reason, register allocator desides to put two instances
of %0 into two different objects (registers or spill slots), we may
end up with
$reg3 = STATEPOINT ... $reg2, $reg1(tied-def 0)...
and nothing will prevent later passes to sink uses of $reg2 below
statepoint, which is incorrect.
As a short term solution, always put base pointers on stack during
lowering.
A longer term solution may be to rework MIR statepoint format to
avoid GC pointer duplication in statepoint argument list.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D86712
With gcc 6.3.0, I hit the following compilation error:
../lib/CodeGen/GlobalISel/Combiner.cpp: In member function
‘bool llvm::Combiner::combineMachineInstrs(llvm::MachineFunction&,
llvm::GISelCSEInfo*)’:
../lib/CodeGen/GlobalISel/Combiner.cpp:156:54: error: suggest parentheses
around ‘&&’ within ‘||’ [-Werror=parentheses]
assert(!CSEInfo || !errorToBool(CSEInfo->verify()) &&
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
"CSEInfo is not consistent. Likely missing calls to "
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"observer on mutations");
Fix the code as suggested by the compiler.
This is the follow up patch for https://reviews.llvm.org/D86183 as we miss to delete the node if NegX == NegY, which has use after we create the node.
```
if (NegX && (CostX <= CostY)) {
Cost = std::min(CostX, CostZ);
RemoveDeadNode(NegY);
return DAG.getNode(Opcode, DL, VT, NegX, Y, NegZ, Flags); #<-- NegY is used here if NegY == NegX.
}
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D86689
This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). In addition I've
added some other member functions for more commonly used operations.
Hopefully this makes the class more useful and will reduce the
need for calling getKnownMinValue().
Differential Revision: https://reviews.llvm.org/D86065
The abbrev codes in a new abbrev table should start from 1 (by default),
rather than inherit the value from the code in the previous table.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86545
Original D81646 had check for tied regs in foldPatchpoint().
Due to unfortunate miscommunication with review comments and
adressing some comments post commit, it turned into assertion.
We had an offline talk and agreed that with current implementation
this path is possible, so I'm changing it back to check.
Note that this is workaround until ussues described in PR46917 are
resolved.
Remove the code that tried to look for reduction patterns, since the
vectorizer and isel can now produce predicated arithmetic instructios
within the loop body. This has required some reorganisation and fixes
around live-out and predication checks, as well as looking for cases
where an input/output is initialised to zero.
Differential Revision: https://reviews.llvm.org/D86613
Previously in addTypeForNeon, we would set the operations for bfloat vectors
like other generic types. But as bfloat is a storage-only type a number of
operations shouldn't be set. This patch fixes that.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D85101
This changes getDomMemoryDef to check if a Current is a valid
candidate for elimination before checking for reads. Before the change,
we were spending a lot of compile-time in checking for read accesses for
Current that might not even be removable.
This patch flips the logic, so we skip Current if they cannot be
removed before checking all their uses. This is much more efficient in
practice.
It also adds a more aggressive limit for checking partially overlapping
stores. The main problem with overlapping stores is that we do not know
if they will lead to elimination until seeing all of them. This patch
limits adds a new limit for overlapping store candidates, which keeps
the number of modified overlapping stores roughly the same.
This is another substantial compile-time improvement (while also
increasing the number of stores eliminated). Geomean -O3 -0.67%,
ReleaseThinLTO -0.97%.
http://llvm-compile-time-tracker.com/compare.php?from=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&to=2e630629b43f64b60b282e90f0d96082fde2dacc&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86487
This patch adds support for memcmp in MemoryLocation::getForArgument.
memcmp reads from the first 2 arguments up to the number of bytes of the
third argument.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86725
strspn, strncmp, strcspn, strcasecmp, strncasecmp, memcmp, memchr,
memrchr, memcpy, memmove, memcpy, mempcpy, strchr, strrchr, bcmp
should all only access memory through their arguments.
I broke out strcoll, strcasecmp, strncasecmp because the result
depends on the locale, which might get accessed through memory.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86724
If there's no unwinding opcodes, omit writing the xdata/pdata records.
Previously, this generated truncated xdata records, and llvm-readobj
would error out when trying to print them.
If writing of an xdata record is forced via the .seh_handlerdata
directive, skip it if there's no info to make a sensible unwind
info structure out of, and clearly error out if such info appeared
later in the process.
Differential Revision: https://reviews.llvm.org/D86527
When collecting `i1` values via `findAllDefs`, ignore Constant's
operands, since Constant's operands might not be `i1`.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46923 which causes ICE
```
llvm-project/llvm/lib/IR/Constants.cpp:1924: static llvm::Constant *llvm::ConstantExpr::getZExt(llvm::Constant *, llvm::Type *, bool): Assertion `C->getType()->getScalarSizeInBits() < Ty->getScalarSizeInBits()&& "SrcTy must be smaller than DestTy for ZExt!"' failed.
```
Differential Revision: https://reviews.llvm.org/D85007
The introduction of find_library for ncurses caused more issues than it solved problems. The current open issue is it makes the static build of LLVM fail. It is better to revert for now, and get back to it later.
Revert "[CMake] Fix an issue where get_system_libname creates an empty regex capture on windows"
This reverts commit 1ed1e16ab83f55d85c90ae43a05cbe08a00c20e0.
Revert "Fix msan build"
This reverts commit 34fe9613dda3c7d8665b609136a8c12deb122382.
Revert "[CMake] Always mark terminfo as unavailable on Windows"
This reverts commit 76bf26236f6fd453343666c3cd91de8f74ffd89d.
Revert "[CMake] Fix OCaml build failure because of absolute path in system libs"
This reverts commit 8e4acb82f71ad4effec8895b8fc957189ce95933.
Revert "[CMake] Don't look for terminfo libs when LLVM_ENABLE_TERMINFO=OFF"
This reverts commit 495f91fd33d492941c39424a32cf24bcfe192f35.
Revert "Use find_library for ncurses"
This reverts commit a52173a3e56553d7b795bcf3cdadcf6433117107.
Differential revision: https://reviews.llvm.org/D86521
This reverts commit e53b799779b079a70f600e5cad2ab7267d66b1b7.
Confusingly, this does not simply and the two sets of known bits, but
implements known bits for the and operator.
Even if noundef is deduced for a position, we should not manifest it when the position is dead.
This is because the associated values with dead positions are replaced with undef values by AAIsDead.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86565
It's possible to have a single virtual register def with a subreg
index that would pass the previous check, but it's not possible to
have a subregister def in SSA.
This is in preparation for adding stricter checks for SSA MIR.
For StackLifetime after finding alloca we need to check that
values ponting to the begining of alloca.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D86692
Intrinsic declarations use the default subtarget, but this should be
using the subtarget for the calling function. I haven't been able to
come up with a case where it matters though.
As pointed out in post-commit review, this can legally be called
on instructions that are not inserted into basic blocks,
so don't blindly assume that there is basic block.
If we query an AA with `Attributor::getAAFor` in `AbstractAttribute::manifest`, the AA may be updated.
This patch makes use of the phase flag in Attributor, and handle `getAAFor` behavior according to the flag.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86635
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a rudiment from
implementation lacking bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.
Patch By: logan
Differential Revision: D70800
https://reviews.llvm.org/D83833
Patch adds two new GICombinerRules for G_SELECT. The rules include:
combining selects with undef comparisons into their first selectee value,
and to combine away selects with constant comparisons. Patch additionally
adds a new combiner test for the AArch64 target to test these new G_SELECT
combiner rules and the existing select_same_val combiner rule.
Patch by mkitzan
Add a new flag that indicates which stage in the process we are in.
This flag is introduced for handling behavior of `getAAFor` according to the stage. (discussed in D86635)
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86678
https://reviews.llvm.org/D86676
Sometimes we can have the following code
x:gpr(s32) = G_OP
Say we build G_OP2 to the same x and then delete the previous instruction. Using something like
Register X = ...;
auto NewMIB = CSEBuilder.buildOp2(X, ... args);
Currently there's a mismatch in how NewMIB is profiled and inserted into the CSEMap (ie it doesn't consider register bank/register class along with type).Unify the profiling by refactoring and calling the common method.
This was found by turning on the CSEInfo::verify in at the end of each of our GISel passes which turns inconsistent state/non determinism in CSEing into crashes which likely usually indicates missing calls to Observer on mutations (the most common case). Here non determinism usually means not cseing sometimes, but almost never about producing incorrect code.
Also this patch adds this verification at the end of the combiners as well.
When joining the legal parts of vector arguments into its original value
during the lower of Formal Arguments in SelectionDAGBuilder, the Calling
Convention information was not being propagated for the handling of each
individual parts. The same did not happen when lowering calls, causing a
mismatch.
This patch fixes the issue by properly propagating the Calling
Convention details.
This fixes Bugzilla #47001.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86715
See RFC for background:
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142744.html
Note that the runtime changes will be sent separately (hopefully this
week, need to add some tests).
This patch includes the LLVM pass to instrument memory accesses with
either inline sequences to increment the access count in the shadow
location, or alternatively to call into the runtime. It also changes
calls to memset/memcpy/memmove to the equivalent runtime version.
The pass is modeled on the address sanitizer pass.
The clang changes add the driver option to invoke the new pass, and to
link with the upcoming heap profiling runtime libraries.
Currently there is no attempt to optimize the instrumentation, e.g. to
aggregate updates to the same memory allocation. That will be
implemented as follow on work.
Differential Revision: https://reviews.llvm.org/D85948
Apparently, we don't do this, neither in EarlyCSE, nor in InstSimplify,
nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..
While i could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks, same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what i wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
So i would think this is pretty uncontroversial.
On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name | baseline | proposed | Δ | % | \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE | 0 | 23779 | 23779 | 0.00% | 0.00% |
| asm-printer.EmittedInsts | 7942328 | 7942392 | 64 | 0.00% | 0.00% |
| assembler.ObjectBytes | 273069192 | 273084704 | 15512 | 0.01% | 0.01% |
| correlated-value-propagation.NumPhis | 18412 | 18539 | 127 | 0.69% | 0.69% |
| early-cse.NumCSE | 2183283 | 2183227 | -56 | 0.00% | 0.00% |
| early-cse.NumSimplify | 550105 | 542090 | -8015 | -1.46% | 1.46% |
| instcombine.NumAggregateReconstructionsSimplified | 73 | 4506 | 4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined | 3640264 | 3664769 | 24505 | 0.67% | 0.67% |
| instcombine.NumDeadInst | 1778193 | 1783183 | 4990 | 0.28% | 0.28% |
| instcount.NumCallInst | 1758401 | 1758799 | 398 | 0.02% | 0.02% |
| instcount.NumInvokeInst | 59478 | 59502 | 24 | 0.04% | 0.04% |
| instcount.NumPHIInst | 330557 | 330533 | -24 | -0.01% | 0.01% |
| instcount.TotalInsts | 8831952 | 8832286 | 334 | 0.00% | 0.00% |
| simplifycfg.NumInvokes | 4300 | 4410 | 110 | 2.56% | 2.56% |
| simplifycfg.NumSimpl | 1019808 | 999607 | -20201 | -1.98% | 1.98% |
```
I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.
That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG of all places..
I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node,
so if we just naively compare them, we may false-negatively say that
the nodes are not equal when the only difference is operand order,
which is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.
Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.
As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).
This patch optionally replaces the CRT allocator (i.e., malloc and free) with rpmalloc (mixed public domain licence/MIT licence) or snmalloc (MIT licence) or mimalloc (MIT licence). Please note that the source code for these allocators must be available outside of LLVM's tree.
To enable, use `cmake ... -DLLVM_INTEGRATED_CRT_ALLOC=D:/git/rpmalloc -DLLVM_USE_CRT_RELEASE=MT` where `D:/git/rpmalloc` has already been git clone'd from `https://github.com/mjansson/rpmalloc`. The same applies to snmalloc and mimalloc.
When enabled, the allocator will be embeded (statically linked) into the LLVM tools & libraries. This currently only works with the static CRT (/MT), although using the dynamic CRT (/MD) could potentially work as well in the future.
When enabled, this changes the memory stack from:
new/delete -> MS VC++ CRT malloc/free -> HeapAlloc -> VirtualAlloc
to:
new/delete -> {rpmalloc|snmalloc|mimalloc} -> VirtualAlloc
The goal of this patch is to bypass the application's global heap - which is thread-safe thus inducing locking - and instead take advantage of a modern lock-free, thread cache, allocator. On a 6-core Xeon Skylake we observe a 2.5x decrease in execution time when linking a large scale application with LLD and ThinLTO (12 min 20 sec -> 5 min 34 sec), when all hardware threads are being used (using LLD's flag /opt:lldltojobs=all). On a dual 36-core Xeon Skylake with all hardware threads used, we observe a 24x decrease in execution time (1 h 2 min -> 2 min 38 sec) when linking a large application with LLD and ThinLTO. Clang build times also see a decrease in the range 5-10% depending on the configuration.
Differential Revision: https://reviews.llvm.org/D71786
Currently we bail out early for strlen calls with a GEP operand, if none
of the GEP specific optimizations fire. But there could be later
optimizations that still apply, which we currently miss out on.
An example is that we do not apply the following optimization
strlen(x) == 0 --> *x == 0
Unless I am missing something, there seems to be no reason for bailing
out early there.
Fixes PR47149.
Reviewed By: lebedev.ri, xbolva00
Differential Revision: https://reviews.llvm.org/D85886
This applies the same fix that D84748 did for macro definitions.
Appropriate include path is now automatically set for all libraries
which link against gtest targets, which avoids the need to set
include_directories in various parts of the project.
Differential Revision: https://reviews.llvm.org/D86616
This is the first of a set of DAGCombiner changes enabling strictfp
optimizations. I want to test to waters with this to make sure changes
like these are acceptable for the strictfp case- this particular change
should preserve exception ordering and result precision perfectly, and
many other possible changes appear to be able to as well.
Copied from regular fadd combines but modified to preserve ordering via
the chain, this change allows strict_fadd x, (fneg y) to become
struct_fsub x, y and strict_fadd (fneg x), y to become strict_fsub y, x.
Differential Revision: https://reviews.llvm.org/D85548
This reverts commit b9d977b0ca60c54f11615ca9d144c9f08b29fd85.
This cutoff is no longer required. The commit 34ffa7fc501 (D86153) introduces a
performance improvement which was tested against the motivating case for this
patch.
Discussed in differential revision: https://reviews.llvm.org/D86153
Almost NFC (see end).
The backwards scan in validThroughout significantly contributed to compile time
for a pathological case, causing the 'X86 Assembly Printer' pass to account for
roughly 70% of the run time. This patch guards the loop against running
unnecessarily, bringing the pass contribution down to 4%.
Almost NFC: There is a hack in validThroughout which promotes single constant
value DBG_VALUEs in the prologue to be live throughout the function. We're more
likely to hit this code path with this patch applied. Similarly to the parent
patches there is a small coverage change reported in the order of 10s of bytes.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86153
With the changes introduced in D86151 we can now check for single locations
which span multiple blocks for inlined scopes and blocks.
D86151 introduced the InstructionOrdering parameter, replacing a scan through
MBB instructions. The functionality to compare instruction positions across
blocks was add there, and this patch just removes the exit checks that were
previously (but no longer) required.
CTMark shows a geomean binary size reduction of 2.2% for RelWithDebInfo builds.
llvm-locstats (using D85636) shows a very small variable location coverage
change in 5 of 10 binaries, but just like in D86151 it is only in the order of
10s of bytes.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D86152
With this patch we're now accounting for two more cases which should be
considered 'valid throughout': First, where RangeEnd is ScopeEnd. Second, where
RangeEnd comes before ScopeEnd when including meta instructions, but are both
preceded by the same non-meta instruction.
CTMark shows a geomean binary size reduction of 1.5% for RelWithDebInfo builds.
`llvm-locstats` (using D85636) shows a very small variable location coverage
change in 2 of 10 binaries, but it is in the order of 10s of bytes which lines
up with my expectations.
I've added a test which checks both of these new cases. The first check in the
test isn't strictly necessary for this patch. But I'm not sure that it is
explicitly tested anywhere else, and is useful for the final patch in the
series.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86151
Group the map and methods used to query instruction ordering for trimVarLocs
(D82129) into a class. This will make it easier to reuse the functionality
upcoming patches.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86150
This patch adds code to recognize vector shuffles which can be
represented as VDUP (splat) of a vector lane with of a different
(wider) type than the original vector lane type.
For example:
shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
is essentially:
shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 0, i32 0>
Such patterns are generated by the SelectionDAG machinery in some cases
(see DAGCombiner::visitBITCAST in DAGCombiner.cpp, the "Remove double
bitcasts from shuffles" part).
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86225
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64bit granules in an SVE vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86101
Since the canonical floatig-point move is fsgnj rd, rs, rs, we should
handle this case in RISCVInstrInfo::isAsCheapAsAMove().
Reviewed By: lenary
Differential Revision: https://reviews.llvm.org/D86518
The isTriviallyRematerializable hook is only called for instructions that are
tagged as isAsCheapAsAMove. Since ADDI 0 is used for "mv" it should definitely
be marked with "isAsCheapAsAMove". This change avoids one stack spill in most of
the atomic-rmw.ll tests functions. It also avoids stack spills in two of our
out-of-tree CHERI tests.
ORI/XORI with zero may or may not be the same as a move micro-architecturally,
but since we are already doing it for register == x0, we might as well
do the same if the immediate is zero.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D86480
For DSE with MemorySSA it is beneficial to manually traverse the
defining access, instead of using a MemorySSA walker, so we can
better control the number of steps together with other limits and
also weed out invalid/unprofitable paths early on.
This patch requires a follow-up patch to be most effective, which I will
share soon after putting this patch up.
This temporarily XFAIL's the limit tests, because we now explore more
MemoryDefs that may not alias/clobber the killing def. This will be
improved/fixed by the follow-up patch.
This patch also renames some `Dom*` variables to `Earlier*`, because the
dominance relation is not really used/important here and potentially
confusing.
This patch allows us to aggressively cut down compile time, geomean
-O3 -0.64%, ReleaseThinLTO -1.65%, at the expense of fewer stores
removed. Subsequent patches will increase the number of removed stores
again, while keeping compile-time in check.
http://llvm-compile-time-tracker.com/compare.php?from=d8e3294118a8c5f3f97688a704d5a05b67646012&to=0a929b6978a068af8ddb02d0d4714a2843dd8ba9&stat=instructions
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86486
This reverts commit 8d5f64c4edbc190a5a8790157fa1d99cfac34016.
Thanks to Eli Friedma for pointing out that this check is not appropiate here,
this check will be moved to the Lint pass.
There is no justification for changing vcc_lo to vcc
when shrinking V_CNDMASK, and such a change could
later confuse live variable analysis.
Make sure the original register is preserved.
Differential Revision: https://reviews.llvm.org/D86541
Currently, an undef value is reduced to 0 when it is added to a set of potential values.
This patch introduces a flag for under values. By this, for example, we can merge two states `{undef}`, `{1}` to `{1}` (because we can reduce the undef to 1).
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85592
Enable default outlining when the function has the minsize attribute
and we're targeting an m-class core.
Differential Revision: https://reviews.llvm.org/D82951
Implements the assemble and disassemble support of RISCV Vector
extension zvamo instructions, base on the 0.9 spec version.
Reviewed by HsiangKai
Differential Revision: https://reviews.llvm.org/D85069
Fix the ARM backend's analyzeBranch so it doesn't ignore predicated
return instructions, and make the MachineVerifier rule more strict.
Differential Revision: https://reviews.llvm.org/D40061
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82609
AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
As discussed in
http://lists.llvm.org/pipermail/llvm-dev/2020-July/143801.html.
Currently no users outside of unit tests.
Replace all instances in tests of -constprop with -instsimplify.
Notable changes in tests:
* vscale.ll - @llvm.sadd.sat.nxv16i8 is evaluated by instsimplify, use a fake intrinsic instead
* InsertElement.ll - insertelement undef is removed by instsimplify in @insertelement_undef
llvm/test/Transforms/ConstProp moved to llvm/test/Transforms/InstSimplify/ConstProp
Reviewed By: lattner, nikic
Differential Revision: https://reviews.llvm.org/D85159
Summary:
When looking for all reaching definitions, we sort basic blocks on dominance. When sorting looking for properlyDominates() handles the case A == B.
Authored by: pranavb
Differential Revision: https://reviews.llvm.org/D86661
This is a reboot of D84655, now performing the inner icmp
simplification query without undef folds.
It should be possible to handle the current foldMinMaxSharedOp()
fold based on this, by moving the logic into icmp of min/max instead,
making it more general. We can't drop the folds for constant operands,
because those also allow undef, which we exclude here.
The tests use assumes for exhaustive coverage, and have a few
more examples of misc folds we get based on icmp simplification.
Differential Revision: https://reviews.llvm.org/D85929
Original Commit Message:
After the commit r368987 (rG643adb55769e) was landed, the frame record (FP and LR register)
may be placed in the middle of a stack frame if a function has both callee-saved
general-purpose registers and floating point registers. This will break the stack unwinders
that simply walk through the frame records (based on the guarantee from AAPCS64
"The Frame Pointer" section). This commit fixes the problem by adding the frame record offset.
Patch By: logan
We have a gap in our store merging capabilities for shift+truncate
patterns as discussed in:
https://llvm.org/PR46662
I generalized the code/comments for this function in earlier commits,
so we only need ease the type restriction and adjust the address/endian
checking to make this work.
AArch64 lets us switch endian to make sure that patterns are matched
either way.
Differential Revision: https://reviews.llvm.org/D86420
`GetFinalPathNameByHandleW(,,N,)` returns:
- `< N` on success (this value does not include the size of the terminating null character)
- `>= N` if buffer is too small (this value includes the size of the terminating null character)
So, when `N == Buffer.capacity() - 1`, we need to resize buffer if return value is > `Buffer.capacity() - 2`.
Also, we can set `N` to `Buffer.capacity()`.
Thus, without this patch `realPathFromHandle()` returns unfilled buffer when length of the final path of the file is equal to `Buffer.capacity()` or `Buffer.capacity() - 1`.
Reviewed By: andrewng, amccarth
Differential Revision: https://reviews.llvm.org/D86564
InstSimplify should do all transformations that ConstProp does, but
one thing that ConstProp does that InstSimplify wouldn't is inline
vector instructions that are constants, e.g. into a ret.
Previously vector instructions wouldn't be inlined in InstSimplify
because llvm::Simplify*Instruction() would return nullptr for specific
instructions, such as vector instructions that were actually constants,
if it couldn't simplify them.
This changes SimplifyInsertElementInst, SimplifyExtractElementInst, and
SimplifyShuffleVectorInst to return a vector constant when possible.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85946
The version of `st1d` that operates with vector plus immediate
addressing mode uses the alias `st1d { <Zn>.d }, <Pg>, [<Za>.d]` for
rendering `st1d { <Zn>.d }, <Pg>, [<Za>.d, #0]`. The disassembler was
generating `<Zn>.s` instead of `<Zn>.d>`.
Differential Revision: https://reviews.llvm.org/D86633