The width of the APSInt values should be the width of an element.
getCharByteWidth returns the size of an element in _host_ bytes, which
makes the computed width N times greater than the element width, where
N is the ratio between the target's CHAR_BIT and the host's CHAR_BIT.
This is NFC for in-tree targets because all of them have CHAR_BIT == 8.
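A minimal sketch of the arithmetic (illustrative values, not the actual
Clang code):
```
// Suppose the target has CHAR_BIT == 16 while the host has CHAR_BIT == 8.
constexpr unsigned HostCharBit = 8, TargetCharBit = 16;
constexpr unsigned CharByteWidth = 2; // element size in *host* bytes

// Multiplying host bytes by the target's CHAR_BIT over-widens the value
// by N = TargetCharBit / HostCharBit = 2:
static_assert(CharByteWidth * TargetCharBit == 32, "N times too wide");
static_assert(CharByteWidth * HostCharBit == 16, "the actual element width");
```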
Reviewed By: cor3ntin, aaron.ballman
Differential Revision: https://reviews.llvm.org/D154773
The vendors of the MSVC STL, libstdc++, and libc++ have agreed [1] to
make the C++23 modules std and std.compat available in C++20. This
patch provides the std module; libc++ has not implemented the
std.compat module yet.
[1] https://github.com/microsoft/STL/issues/3945
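A minimal usage sketch, assuming the std module has been built and made
available to the compiler in C++20 mode:
```
// Compiled with -std=c++20 and the installed std module.
import std;

int main() { std::cout << "Hello from the std module\n"; }
```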
Depends on D158357
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D158358
Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL). Since VL is a linear function of LMUL, this means the current lowering is quadratic in both LMUL and VL. To avoid the degenerate case, fall back to the stack if the cost is more than a fixed (linear) threshold.
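A hypothetical sketch of that cost reasoning (the function name and
budget are illustrative, not the actual RISCVISelLowering code):
```
// Each vslide1down is linear in LMUL, and a full build_vector needs VL of
// them, so the slide sequence costs O(VL * LMUL). VL itself scales with
// LMUL, making the sequence quadratic; cap it with a linear budget.
bool shouldFallBackToStack(unsigned VL, unsigned LMUL, unsigned Budget) {
  unsigned SlideSequenceCost = VL * LMUL;   // quadratic, since VL ~ LMUL
  return SlideSequenceCost > Budget * LMUL; // fixed (linear) threshold
}
```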
For context, here are the sifive-x280 llvm-mca results for the current lowering and the stack-based lowering at each LMUL (using e64). This assumes code was compiled for V (i.e. zvl128b).
buildvector_m1_via_stack.mca:Total Cycles: 1904
buildvector_m2_via_stack.mca:Total Cycles: 2104
buildvector_m4_via_stack.mca:Total Cycles: 2504
buildvector_m8_via_stack.mca:Total Cycles: 3304
buildvector_m1_via_vslide1down.mca:Total Cycles: 804
buildvector_m2_via_vslide1down.mca:Total Cycles: 1604
buildvector_m4_via_vslide1down.mca:Total Cycles: 6400
buildvector_m8_via_vslide1down.mca:Total Cycles: 25599
There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, the stack-based scheme seems to cost less at all LMULs, so we can just go with the simpler one.
Arguably, this patch fixes a regression introduced by my D149667: before that change, we'd always fall back to the stack, and thus didn't have the non-linearity.
Differential Revision: https://reviews.llvm.org/D159332
Introduce a simple conversion of a memref.alloc/dealloc pair into an
alloca in the same scope. Expose it as a transform op and a pattern.
Allocas typically lower to stack allocations, as opposed to
alloc/dealloc, which lower to significantly more expensive malloc/free
calls. In addition, this can be combined with allocation hoisting from
loops to further improve performance.
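As a rough analogy in C++ terms (illustrative only; the actual
transformation operates on memref ops):
```
#include <cstdlib>

// Before: a memref.alloc/dealloc pair lowers to malloc/free calls.
void before() {
  float *buf = static_cast<float *>(std::malloc(128 * sizeof(float)));
  // ... use buf within this scope ...
  std::free(buf);
}

// After: memref.alloca lowers to a cheap stack allocation that is
// reclaimed automatically when the scope ends.
void after() {
  float buf[128];
  // ... use buf within this scope ...
  (void)buf;
}
```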
Some flags named "IsLTO" and "IsThinLTO" implied they described
compilation modes, but with Unified LTO this is no longer true. Rename
these to "PrepForXXX" to be less confusing to readers. Also, deleted
"IsThinOrUnifiedLTO" because Unified implies PrepareForThinLTO.
The currently generated ClangCommandLineReference.rst unconditionally
escapes underscores. This leads to odd output on the website
https://clang.llvm.org/docs/ClangCommandLineReference.html
For example:
-fchar8\_t, -fno-char8\_t
Whether an underscore should be escaped depends on the surrounding
state. The escape routine doesn't keep track of that state, and
underscores are currently not used in places where they would need to
be escaped.
Therefore remove the underscore from the list of escaped characters.
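A minimal sketch of the resulting behavior (the real emitter's escape
set and structure differ):
```
#include <string>

// Escape reST special characters; '_' is intentionally no longer in the
// escape set, since it only needs escaping in states the emitter never
// produces.
std::string escapeRST(const std::string &Text) {
  std::string Out;
  for (char C : Text) {
    if (C == '\\' || C == '*')
      Out += '\\';
    Out += C;
  }
  return Out;
}
```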
Linking the driver binary can take a long time, particularly with debug
info. This adds an option, meant for local use, to disable building the
driver.
Differential Revision: https://reviews.llvm.org/D159130
As reported by mgorny at https://reviews.llvm.org/D159115#4638037, the
unsigned long long cast fails on 32-bit systems at least with GCC.
It looks like a pointer provenance/aliasing issue rather than a bug in GCC.
Acked by Vassil Vassilev on the original revision.
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and to consolidate
the various DL fields in subclasses.
Each recipe can have a debug location, and it shouldn't rely on a
reference to the underlying LLVM IR instruction to retain it. See the
various recipes that already had separate DL fields.
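A simplified sketch of the shape of the change (details differ from the
actual VPlan classes):
```
#include "llvm/IR/DebugLoc.h"

// The recipe carries its own DebugLoc instead of reaching through the
// underlying IR instruction to recover it.
class VPRecipeBase /* ... */ {
  llvm::DebugLoc DL; // dedicated debug location, set at construction

public:
  VPRecipeBase(llvm::DebugLoc DL = {}) : DL(DL) {}
  llvm::DebugLoc getDebugLoc() const { return DL; }
};
```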
A few printf config options have been set up using this new config
system along with their baremetal overrides. A follow-up patch will add
generation of doc/config.rst, which will contain the full list of libc
config options and a short description explaining how they affect the
libc.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D159158
Revert the re-application of the Dexter builder removal due to
continued errors on the green dragon LLDB buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/59716/
The cause of the error is unclear, but it looks as though there is some
unexpected non-determinism in the test failures.
This reverts commit 323270451d.
Given <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N),
it is possible to express it simply as unused, d1 = unmerge v1.
This is useful for tackling regressions in arm64-vcvt_f.ll introduced
in https://reviews.llvm.org/D144670.
This patch pairs a promised interface with the object (Op/Attr/Type/Dialect) requesting the promise, for example:
```
declarePromisedInterface<MyAttr, MyInterface>();
```
This allows making fine-grained promises. It also adds a mechanism to query whether an `Op/Attr/Type` has a specific promise, returning true if the promise is present or if an implementation has been added. Finally, it adds a couple of `Attr|TypeConstraints` that can be used in ODS to query whether the promise or an implementation is present.
This patch tries to solve 2 issues:
1. Different entities cannot use the same promise.
```
declarePromisedInterface<MyInterface>();
// Resolves a promise.
MyAttr1::attachInterface<MyInterface>(ctx);
// Doesn't resolve a promise, as the previous attachment removed the promise.
MyAttr2::attachInterface<MyInterface>(ctx);
```
2. It is not possible to query whether a promise has been declared.
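A usage sketch of the new fine-grained form (types and the `ctx`
variable are placeholders):
```
declarePromisedInterface<MyAttr1, MyInterface>();
declarePromisedInterface<MyAttr2, MyInterface>();
// Resolves only MyAttr1's promise; MyAttr2's promise remains outstanding
// and can be queried or resolved independently.
MyAttr1::attachInterface<MyInterface>(ctx);
```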
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D158464
It's very important that the GPU build does not include any system
directories. We currently use `-ffreestanding` to disable a lot of
these search paths, but we can still accidentally pick up system
headers for features `libc` does not provide yet. This patch changes
this to use `-nostdinc` to disable all standard search paths. We then
use the compiler's resource directory to pick up the provided headers
like `stdint.h`.
Differential Revision: https://reviews.llvm.org/D159445
If there are copy instructions between uaddlv and dup that transfer the
value from GPR to FPR, try to remove them by using duplane.
Differential Revision: https://reviews.llvm.org/D159267
Using https://github.com/actions/labeler, this adds a workflow to
automatically label PRs, in the hope of reducing the work needed to
triage new PRs.
"new-prs-labeler.yml" has been seeded, taking inspiration from the
CODEOWNERS file, wherever there was an existing corresponding label on
the issue tracker.
This patch:
* Replaces `andymckay/labeler`, which does not appear to be maintained,
with GitHub's official solution
* Removes the closed-issue workflow, which was disabled a few years ago
and never fixed.
* Adds a few rules to add labels based on the PR title; hopefully that
can make triaging simpler. If that turns out to be useful, we can
consider adding more rules for backends, etc. We could technically also
pattern match the body of the issue, but I'm concerned about trying to
be _too_ clever.
The new system is only triggered when a PR is opened, so manual labels
should not be removed.
A functional-style cast (i.e. a simple-type-specifier or
typename-specifier followed by a parenthesized single expression
[expr.type.conv]) is equivalent to the corresponding C-style cast, so
it makes sense for them to have identical behavior, including warnings.
This also matches GCC: https://godbolt.org/z/b8Ma9Thjb.
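For instance (an illustrative example, not taken from the patch):
```
void f(double d) {
  int a = int(d);  // functional-style cast
  int b = (int)d;  // C-style cast: now diagnosed identically
}
```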
Reviewed By: rnk, aaron.ballman
Differential Revision: https://reviews.llvm.org/D159133
In some cases where the same mask is used for multiple
extending masked loads it can be more efficient to combine
the zero- or sign-extend into the load even if it's not a
legal or custom operation. This leads to splitting up the
extending load into smaller parts, which also requires
splitting the mask. For SVE at least this improves the
performance of the SPEC benchmark x264 slightly on
neoverse-v1 (~0.3%), and at least one other benchmark
improves by around 30%. The uplift for SVE seems to be
due to removing the dependencies (vector unpacks)
introduced between the loads and the vector operations,
which should increase the level of parallelism.
See tests:
CodeGen/AArch64/sve-masked-ldst-sext.ll
CodeGen/AArch64/sve-masked-ldst-zext.ll
Differential Revision: https://reviews.llvm.org/D159191
I've added some missing tests for the following cases:
1. Zero- and sign-extends from unpacked vector types to wide,
illegal types. For example,
%aext = zext <vscale x 4 x i8> %a to <vscale x 4 x i64>
2. Normal loads combined with 1
3. Masked loads combined with 1
Differential Revision: https://reviews.llvm.org/D159192
This always succeeds. While I'm here, document why we check the size
of p0 against the value of VG.
Reviewed By: omjavaid
Differential Revision: https://reviews.llvm.org/D157845
Rename CheckBaseDerived to something more general and call it in
GetPtrField() as well, so we don't crash later in Pointer::toAPValue().
Differential Revision: https://reviews.llvm.org/D149149
This adds some commonly-used instruction aliases from various sources:
- GNU
- SPARCv9 manual
- JPS1 ASR names
Reviewed By: barannikov88
Differential Revision: https://reviews.llvm.org/D157236