Tensor pack operations are optimistically lowered to pad + insert_slice
when the pack operation only pads the input tensor. The existing
lowering emits insert_slice operations which do not meet the
rank-reducibility requirements of insert_slice.
This change updates the logic in linalg::lowerPack to first check the
rank-reducibility requirement. When the requirement is not met, the
lowering will emit the full sequence of pad + expand + transpose.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D159382
The primary motivation is to have a simple mechanism to extract
values from attributes in folders and canonicalization patterns without
having to re-fold constants or write nested conditions over attribute types.
Matching over attributes composes especially well with fold adaptors.
Update folds in Arith and SPIRV dialects to match over attributes, where
applicable.
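As a hedged illustration of the kind of matching this enables (the helper below is hypothetical and not part of the patch), a constant can be matched directly on an attribute, for example one obtained from a fold adaptor's operands:
```
// Minimal sketch, assuming an attribute overload of matchPattern and the
// existing m_Zero matcher; `attr` would typically come from a fold adaptor,
// e.g. adaptor.getRhs(), inside a folder.
#include "mlir/IR/Matchers.h"

static bool isConstantZero(mlir::Attribute attr) {
  // Succeeds when `attr` holds a (possibly splat) integer zero, with no
  // hand-written casts over the concrete attribute types.
  return mlir::matchPattern(attr, mlir::m_Zero());
}
```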
Reviewed By: mehdi_amini, zero9178
Differential Revision: https://reviews.llvm.org/D159437
Support environments where logical types do not necessarily correspond to allowed storage access types.
Also make pattern match failures more descriptive.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D159386
As reported by @uweigand in https://reviews.llvm.org/D158883:
```
The newly added test cases in ffp-model.c fail on SystemZ, making CI red:
https://lab.llvm.org/buildbot/#/builders/94/builds/16280
The root cause seems to be that by default, the SystemZ back-end targets
a machine without SIMD support, and therefore vector return types are
passed via implicit reference according to the ABI
```
To avoid the ABI dependence, this change uses manual stores instead of vector returns.
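For illustration (hypothetical function names, not the actual ffp-model.c test code), the difference between the two shapes is roughly:
```
// Illustrative only. A vector return can be lowered as an implicit-reference
// (sret-style) return on targets without SIMD support, which changes the IR
// the test would have to match.
typedef float v2f __attribute__((vector_size(8)));

v2f returns_vector(v2f a, v2f b) {
  return a + b;                // return-value ABI depends on the target
}

// Storing the result through a pointer keeps the checked IR independent of
// the vector-return ABI.
void stores_vector(v2f *out, v2f a, v2f b) {
  *out = a + b;
}
```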
Fix a clang-format crash (segmentation fault) when formatting code with
nested namespaces.
Fixes #64701.
Differential Revision: https://reviews.llvm.org/D158363
When a handle to an error_category singleton object is used during the
termination phase of a program, the destruction of the error_category
object may have occurred prior to execution of the current destructor
or function registered with atexit, because the singleton object may
have been constructed after the corresponding initialization or call
to atexit. For example, the updated tests from this patch will fail when
using a libc++ built with a compiler that updates the vtable of the
object on destruction.
This patch attempts to avoid the issue by causing the destructor to not
be called in the style of ResourceInitHelper in src/experimental/memory_resource.cpp.
This approach might not work if object lifetime is strictly enforced.
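A minimal sketch of that style, with hypothetical names (the real code lives in libc++'s sources): the category is constructed in raw storage owned by a helper whose destructor deliberately does nothing, so the object is never destroyed and stays usable during termination.
```
#include <new>
#include <string>
#include <system_error>

namespace {

class sketch_category : public std::error_category {
public:
  const char *name() const noexcept override { return "sketch"; }
  std::string message(int) const override { return "sketch error"; }
};

struct category_init_helper {
  alignas(sketch_category) unsigned char buf[sizeof(sketch_category)];
  category_init_helper() { ::new (buf) sketch_category(); }
  ~category_init_helper() {}  // intentionally does not destroy the category
};

category_init_helper helper;

} // namespace

const std::error_category &sketch_generic_category() noexcept {
  // Under strict object-lifetime rules this access would need std::launder,
  // which is the caveat mentioned above.
  return *reinterpret_cast<const sketch_category *>(helper.buf);
}
```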
Differential Revision: https://reviews.llvm.org/D65667
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
The WriteVFWRedOV_From and WriteVFWRedV_From SchedWrite classes already
exist, but no pseudos were using the ordered SchedWrite. This change
makes the VFWREDOSUM pseudo use the ordered VFW SchedWrite.
With HLFIR the lbounds for the ALLOCATABLE result are taken from the
mutable box created for the result, so the non-default lbounds might be
propagated further, causing incorrect results, e.g.:
```
program p
real, allocatable :: p5(:)
allocate(p5, source=real_init())
print *, lbound(p5, 1) ! must print 1, but prints 7
contains
function real_init()
real, allocatable :: real_init(:)
allocate(real_init(7:8))
end function real_init
end program p
```
With FIR lowering the box passed for `source` has explicit lower bound 1
at the call site, but the runtime box initialized by `real_init` call
still has lower bound 7. I am not sure if the runtime box initialized by
`real_init` will ever be accessed in a debugger via Fortran variable
names, but I think that having the right runtime bounds that can be
accessible via examining registers/stack might be good in general. So I
decided to update the runtime bounds at the point of return.
This change fixes the test above for HLFIR.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D156187
Compiling with TLS variables requires -pthread, but if the user omits this
option, the compiler gives no obvious indication during compilation that
-pthread is needed for programs using TLS variables. Instead, the user will
experience a segmentation fault when running such programs.
This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for
local-exec accesses, in order to trigger an error from the linker to indicate
that there is an undefined symbol to __tls_get_addr. Doing so will remind the
user to compile/link with -pthread.
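For illustration (a hypothetical reproducer, not a test from this patch), a program of this shape is enough to hit the problem when built without -pthread:
```
// Hypothetical example: any use of a thread-local variable is affected.
// Previously this linked fine without -pthread and crashed at run time;
// with this change the link fails with an undefined __tls_get_addr,
// pointing the user at the missing -pthread option.
#include <cstdio>

thread_local int counter = 0;

int main() {
  ++counter;
  std::printf("counter = %d\n", counter);
  return 0;
}
```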
Differential Revision: https://reviews.llvm.org/D151335
Support printing of symbols for MachO of `MH_FILESET` type.
This is achieved by extending `dumpSymbolNamesFromObject`
to encompass fileset handling, and by including an offset in the
`MachOObjectFile` class to locate embedded MachO headers.
Differential Revision: https://reviews.llvm.org/D159294
The width of the APSInt values should be the width of an element.
getCharByteWidth returns the size of an element in _host_ bytes, which
makes the width N times greater, where N is the ratio between target's
CHAR_BIT and host's CHAR_BIT.
This is NFC for in-tree targets because all of them have CHAR_BIT == 8.
Reviewed By: cor3ntin, aaron.ballman
Differential Revision: https://reviews.llvm.org/D154773
The vendors of the MSVC STL, libstdc++ and libc++ have agreed [1] to
make the C++23 modules std and std.compat available in C++20. This
provides the std module; libc++ has not implemented the std.compat
module yet.
[1] https://github.com/microsoft/STL/issues/3945
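A minimal consumer, for illustration only (the exact build and import steps depend on how the libc++ std module is built and are not shown here):
```
// Requires a build of the libc++ std module; compiled as C++20.
import std;

int main() {
  std::cout << "Hello from the std module in C++20\n";
}
```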
Depends on D158357
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D158358
Each vslide1down operation is linear in LMUL on common hardware. (For
instance, the sifive-x280 cost model models slides this way.) If we do VL
unique inserts, each with a cost linear in LMUL, the overall cost is
O(VL*LMUL). Since VL is a linear function of LMUL, this means the current
lowering is quadratic in both LMUL and VL. To avoid the degenerate case,
fall back to the stack if the cost is more than a fixed (linear) threshold.
For context, here's the sifive-x280 llvm-mca results for the current lowering and stack based lowering for each LMUL (using e64). Assumes code was compiled for V (i.e. zvl128b).
```
buildvector_m1_via_stack.mca:Total Cycles: 1904
buildvector_m2_via_stack.mca:Total Cycles: 2104
buildvector_m4_via_stack.mca:Total Cycles: 2504
buildvector_m8_via_stack.mca:Total Cycles: 3304
buildvector_m1_via_vslide1down.mca:Total Cycles: 804
buildvector_m2_via_vslide1down.mca:Total Cycles: 1604
buildvector_m4_via_vslide1down.mca:Total Cycles: 6400
buildvector_m8_via_vslide1down.mca:Total Cycles: 25599
```
There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, the stack-based scheme appears cheaper at every LMUL, so we can just go with the simpler approach.
Arguably, this patch is fixing a regression introduced by my D149667: before that change, we'd always fall back to the stack, and thus didn't have the non-linearity.
Differential Revision: https://reviews.llvm.org/D159332
Introduce a simple conversion of a memref.alloc/dealloc pair into an
alloca in the same scope. Expose it as a transform op and a pattern.
Allocas typically lower to stack allocations as opposed to alloc/dealloc
that lower to significantly more expensive malloc/free calls. In
addition, this can be combined with allocation hoisting from loops to
further improve performance.
Some flags named "IsLTO" and "IsThinLTO" implied they described
compilation modes, but with Unified LTO this is no longer true. Rename
these to "PrepForXXX" to be less confusing to readers. Also, deleted
"IsThinOrUnifiedLTO" because Unified implies PrepareForThinLTO.
The currently generated ClangCommandLineReference.rst unconditionally
escapes underscores. This leads to odd output on the website
https://clang.llvm.org/docs/ClangCommandLineReference.html
For example
-fchar8\_t, -fno-char8\_t
Whether an underscore should be escaped depends on the context. Currently
the escape routine doesn't keep track of that context, and underscores are
not used in places where they would need to be escaped. Therefore remove
the underscore from the list of escaped characters.
Linking the driver binary can take a long time, particularly with debug
info. This adds an option, intended for local use, to disable the driver build.
Differential Revision: https://reviews.llvm.org/D159130
As reported by mgorny at https://reviews.llvm.org/D159115#4638037, the
unsigned long long cast fails on 32-bit systems at least with GCC.
It looks like a pointer provenance/aliasing issue rather than a bug in GCC.
Acked by Vassil Vassilev on the original revision.
Add a dedicated debug location to VPRecipeBase to remove another
unneeded use of the underlying LLVM IR instruction and also consolidate
the various DL fields in subclasses.
Each recipe can have a debug location, and it shouldn't rely on a reference
to the underlying LLVM IR instruction to retain it. See the various recipes
that already had separate DL fields.
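A hedged sketch of the direction (hypothetical class name, not the actual VPlan code): the base class owns the debug location, so recipes neither reach back to an underlying IR instruction nor keep their own DL fields.
```
#include "llvm/IR/DebugLoc.h"

// Hypothetical, simplified stand-in for the recipe base class.
class RecipeBaseSketch {
  llvm::DebugLoc DL;

public:
  explicit RecipeBaseSketch(llvm::DebugLoc DL = {}) : DL(DL) {}

  // Subclasses query this instead of the underlying LLVM IR instruction.
  llvm::DebugLoc getDebugLoc() const { return DL; }
};
```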
A few printf config options have been set up using this new config system,
along with their baremetal overrides. A follow-up patch will add generation
of doc/config.rst, which will contain the full list of libc config options
and a short description explaining how they affect the libc.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D159158
Revert the re-application of the Dexter builder removal due to continued
errors on the green dragon LLDB buildbot:
https://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/59716/
The cause of the error is unclear, but it looks as though there is some
unexpected non-determinism in the test failures.
This reverts commit 323270451d.
Given <N x t> d1, unused = unmerge(G_EXT <2*N x t> v1, undef, N), it is
possible to express it simply as unused, d1 = unmerge v1.
This is useful for tackling regressions in arm64-vcvt_f.ll introduced in
https://reviews.llvm.org/D144670.