This patch implements the builtins for Vector Multiply Builtins (vmulxxd family of instructions), and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions itnroduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83955
Summary of changes:
- Changed parser to eliminate generation of excessive error messages;
- Corrected lit tests to match all expected error messages;
- Corrected lit tests to guard against unwanted extra messages (added option "--implicit-check-not=error:");
- Added missing checks and fixed some typos in tests.
See bug 46907: https://bugs.llvm.org/show_bug.cgi?id=46907
Reviewers: arsenm, rampitec
Differential Revision: https://reviews.llvm.org/D86940
Add support for line continuations (the "backslash operator") in MASM by modifying the Parser's Lex method.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D83347
If the PSHUFBs have no other uses, then we can force the unselected elements to zero to OR them instead, avoiding both an extra mask load and a costly variable blend.
Eventually we should try to bring this into shuffle combining, once we can more easily convert between shuffles + select patterns.
In IPSCCP when a function is optimized to return undef, it should clear the returned attribute for all its input arguments
and its corresponding call sites.
The bug is exposed when the value of an input argument of the function is assigned to a physical register and
because of the argument having a returned attribute, the value of this physical register will continue to be used
as the function return value right after the call instruction returns, even if the value that this register holds may
be clobbered during the function call. This potentially results in incorrect values being used afterwards.
Reviewed By: jdoerfert, fhahn
Differential Revision: https://reviews.llvm.org/D84220
After computing dependence, we check if it is safe to hoist by
identifying if it clobbers any liveIns in the sibling block (NullSucc).
This check is moved to its own function which will be used in the
soon-to-be modified dependence checking algorithm for implicit null
checks pass.
Tests-Run: lit tests on X86/implicit-*
Separated out some checks in isSuitableMemoryOp and added comments
explaining why some of those checks are done.
Tests-Run:X86 implicit null checks tests.
When marking a global variable constant, and simplifying users using
CleanupConstantGlobalUsers(), the pass could incorrectly return false if
there were still some uses left, and no further optimizations was done.
This was caught using the check introduced by D80916.
This fixes PR46749.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D85837
These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)":
1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)
We already handle more general cases, and we are intentionally not creating extra (and likely expensive)
fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce
X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.
Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).
Differential Revision: https://reviews.llvm.org/D86726
This is an enhancement to D81766 to allow loading the minimum target
vector type into an IR vector with a different number of elements.
In one of the motivating tests from PR16739, SLP creates <2 x float>
load ops mixed with <4 x float> insert ops, so we want to handle that
pattern in addition to potential oversized vectors created by the
vectorizers.
For now, we are assuming the insert/extract subvector with undef is
free because there is no exact corresponding TTI modeling for that.
Differential Revision: https://reviews.llvm.org/D86160
When lowering fixed length vector operations for SVE the subvector
operations are used extensively to marshall data between scalable
and fixed-length vectors. This means that sequences like:
extract_subvec(binop(insert_subvec(a), insert_subvec(b)))
are very common. DAGCombine only checks if the resulting binop is
legal or can be custom lowered when undoing such sequences. When
it's custom lowering that is introducing them the result is an
infinite legalise->combine->legalise loop.
This patch extends the isOperationLegalOr... functions to include
a "LegalOnly" parameter to restrict the check to legal operations
only. Although isOperationLegal could be used it's common for
the affected code paths to be visited pre and post legalisation,
so the extra parameter keeps the code tidy.
Differential Revision: https://reviews.llvm.org/D86450
The addend in a REL32 reloc needs to be adjusted to account for the
offset from the PC value returned by the s_getpc instruction to the
point where the reloc is applied. This was being done correctly for
(GOTPC)REL32_LO but not for (GOTPC)REL32_HI. This will only make a
difference if the target symbol happens to get loaded almost exactly
a multiple of 4G away from the relocated instructions.
Differential Revision: https://reviews.llvm.org/D86938
Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.
For e.g.
svint32_t bar(svint32_t);
svint32_t foo(svint32_t x, bool *err) {
try { bar(x); } catch (...) { *err = true; }
return x;
}
`z0` needs to be spilled before the call to `bar(x)` and reloaded before
returning from foo, as the exception handler may have clobbered z0.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84737
As stated in section 6.1.1.2, DWARFv5, p. 142,
| The last entry for each name is followed by a zero byte that
| terminates the list. There may be gaps between the lists.
The patch changes emitting a 4-byte zero value to a 1-byte one, which
effectively removes the gap between entry lists, and thus saves
approximately 3 bytes per name; the calculation is not exact because
the total size of the table is aligned to 4.
Differential Revision: https://reviews.llvm.org/D86927
The member is not in use; the unit length for the table is emitted as
a difference between two labels. Moreover, the type of the member might
be misleading, because for DWARF64 the field should be 64 bit long.
Differential Revision: https://reviews.llvm.org/D86912
This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.
We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.
Differential Revision: https://reviews.llvm.org/D66004
This patch makes it possible for AAUB to use information from AANoUndef.
This is the next patch of D86983
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86984
When the associated value is undef, we immediately forced to indicate a pessimistic fixpoint so far.
This patch changes the initialization to check the attribute given in IR at first and to indicate an optimistic fixpoint when it is given.
This change will enable us to catch , for example, the following case in AAUB.
```
call void @foo(i32 noundef undef)
```
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86983
Before upstream a new target called CSKY, make a new triple of that called Triple::csky.
For now, it's a 32-bit little endian target and the detail can be referred at D86269.
This is the split part of D86269, which add a new target called CSKY.
Differential Revision: https://reviews.llvm.org/D86505
If there's no initializer symbol in the current MaterializationResponsibility
then bail out without installing JITLink passes: they're going to be no-ops
anyway.
A think-o in the existing code meant that dependencies were never registered.
This failure could lead to crashes rather than orderly error propagation if
initialization dependencies failed to materialize.
No test case: The bug was discovered in an out-of-tree code and requires
pathalogically misconfigured JIT to generate the original error that lead to
the crash.
This patch adds a helper function DumpStrSection to simplify codes.
Besides, nonprintable chars in debug_str and debug_str.dwo sections
are printed as escaped chars.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D86918
Summary:
Analyses are preserved in MemCpyOptimizer.
Get analyses before running the pass and store the pointers, instead of
using lambdas and getting them every time on demand.
Reviewers: lenary, deadalnix, mehdi_amini, nikic, efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74494
This patch adds an initial, incomeplete and unsound implementation of
canReplacePointersIfEqual to check if a pointer value A can be replaced
by another pointer value B, that are deemed to be equivalent through
some means (e.g. information from conditions).
Note that is in general not sound to blindly replace pointers based on
equality, for example if they are based on different underlying objects.
LLVM's memory model is not completely settled as of now; see
https://bugs.llvm.org/show_bug.cgi?id=34548 for a more detailed
discussion.
The initial version of canReplacePointersIfEqual only rejects a very
specific case: replacing a pointer with a constant expression that is
not dereferenceable. Such a replacement is problematic and can be
restricted relatively easily without impacting most code. Using it to
limit replacements in GVN/SCCP/CVP only results in small differences in
7 programs out of MultiSource/SPEC2000/SPEC2006 on X86 with -O3 -flto.
This patch is supposed to be an initial step to improve the current
situation and the helper should be made stricter in the future. But this
will require careful analysis of the impact on performance.
Reviewed By: aqjune
Differential Revision: https://reviews.llvm.org/D85524
Interleave for small loops that have reductions inside,
which breaks dependencies and expose.
This gives very significant performance improvements for some benchmarks.
Because small loops could be in very hot functions in real applications.
Differential Revision: https://reviews.llvm.org/D81416
Previously if the source match we asserted that the destination
matched. But GPR <-> mask register copies on X86 can violate this
since we use the same K-registers for multiple sizes.
Fixes this ISPC issue https://github.com/ispc/ispc/issues/1851
Differential Revision: https://reviews.llvm.org/D86507
This reverts commit bc9a29b9ee6ade4894252b1470977142c32b4602.
The reasoning that this patch was wrong was itself incorrect
(see discussion on llvm-commits). This patch does seem to be exposing
a latent SVE code generation bug on non-public tests, which should
not block a correctness fix for public, non-SVE use cases.
General purpose registers 30 and 31 are handled differently when they are
reserved as the base-pointer and frame-pointer respectively. This fixes the
offset of their fixed-stack objects when there are fpr calle-saved registers.
Differential Revision: https://reviews.llvm.org/D85850
I was recently debugging a similar issue to https://reviews.llvm.org/D86500 only with a large metadata section. Only after I finished debugging it did I discover it was fixed very recently.
My version of the fix was going to alignTo since that uses uint64_t and improves the readability of the code. So I though I would go ahead and share it.
Differential Revision: https://reviews.llvm.org/D86957
This is needed for an upcoming change to how we translate conditional branches
which might generate these.
Differential Revision: https://reviews.llvm.org/D86383
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86942
On -O0, i1 strict_fsetcc will be promoted to i32. We don't handle that
in TD patterns. This patch fills logic in PPCISelDAGToDAG to handle more
cases.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D86595