This removes a comment added in D159312, which warned people to not
re-add a whitespace in the `((void*)0))` expression. After discussions
happened in D159312, it doesn't seem like a permanent solution.
While I'd like to keep the whitespace removed for now, given that at
least it can be a band-aid to some users who use musl and clang's
`stddef.h` at the same time, it seems the usage of them together is not
something that's officially supported, and I should not be implying this
should be the permanent solution by saying so in the comments.
Reviewed By: aaron.ballman, ributzka
Differential Revision: https://reviews.llvm.org/D159383
CurLexer is dereferenced and should not be null in clang::Preprocessor::SkipExcludedConditionalBlock(clang::SourceLocation, clang::SourceLocation, bool, bool, clang::SourceLocation)
This patch adds an assert for NULL value check of pointer CurLexer and splits up all predicates so that, when/if a failure occurs, we'll be able to tell which predicate failed.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D158293
Some error logging in llvm-exegesis under the subprocess executor just
prints a generic failure information rather than any details about the
error as we omit printing the string version of errno. This patch adds
in printing errno at all relevant points in the subprocess executor that
were previously missed.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D157682
When using the subprocess execution mode and dummy perf counters
currently, cryptic errors will be printed out related to not being able
to send the file descriptor through a socket. This patch prints out an
explicit message that this configuration isn't (currently) supported.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D157686
Tensor pack operations are optimistically lowered to pad + insert_slice
when the pack operation only pads the input tensor. The existing
lowering emits insert_slice operations which do not meet the
rank-reducibility requirements of insert_slice.
This change updates the logic in linalg::lowerPack to first check the
rank-reducibility requirement. When the requirement is not met, the
lowering will emit the full sequence of pad + expand + transpose.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D159382
The primary motivation is to we have a simple mechanism to extract
values from attributes in folders and canon patterns without having to
re-fold constants or write nested conditions over attribute types.
Matching over attributes composes especially well with fold adaptors.
Update folds in Arith and SPIRV dialects to match over attributes, where
applicable.
Reviewed By: mehdi_amini, zero9178
Differential Revision: https://reviews.llvm.org/D159437
Support environments where logical types do not necessarily correspond to allowed storage access types.
Also make pattern match failures more descriptive.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D159386
As reported by @uweigand in https://reviews.llvm.org/D158883:
```
The newly added test cases in ffp-model.c fail on SystemZ, making CI red:
https://lab.llvm.org/buildbot/#/builders/94/builds/16280
The root cause seems to be that by default, the SystemZ back-end targets
a machine without SIMD support, and therefore vector return types are
passed via implicit reference according to the ABI
```
this uses manual stores instead of vector returns.
Fixing the clang-format crash with the segmentation fault error when
formatting code with nested namespaces.
Fixes#64701.
Differential Revision: https://reviews.llvm.org/D158363
When a handle to an error_category singleton object is used during the
termination phase of a program, the destruction of the error_category
object may have occurred prior to execution of the current destructor
or function registered with atexit, because the singleton object may
have been constructed after the corresponding initialization or call
to atexit. For example, the updated tests from this patch will fail if
using a libc++ built using a compiler that updates the vtable of the
object on destruction.
This patch attempts to avoid the issue by causing the destructor to not
be called in the style of ResourceInitHelper in src/experimental/memory_resource.cpp.
This approach might not work if object lifetime is strictly enforced.
Differential Revision: https://reviews.llvm.org/D65667
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
WriteVFWRedOV_From and WriteVFWRedV_From SchedWrite classes exist
already, but no pseudos were using the ordered SchedWrite. This change
makes it so that the VFWREDOSUM pseudo used the ordered VFW SchedWrite.
With HLFIR the lbounds for the ALLOCATABLE result are taken from the
mutable box created for the result, so the non-default lbounds might be
propagated further causing incorrect result, e.g.:
```
program p
real, allocatable :: p5(:)
allocate(p5, source=real_init())
print *, lbound(p5, 1) ! must print 1, but prints 7
contains
function real_init()
real, allocatable :: real_init(:)
allocate(real_init(7:8))
end function real_init
end program p
```
With FIR lowering the box passed for `source` has explicit lower bound 1
at the call site, but the runtime box initialized by `real_init` call
still has lower bound 7. I am not sure if the runtime box initialized by
`real_init` will ever be accessed in a debugger via Fortran variable
names, but I think that having the right runtime bounds that can be
accessible via examining registers/stack might be good in general. So I
decided to update the runtime bounds at the point of return.
This change fixes the test above for HLFIR.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D156187
Compiling with TLS variables requires -pthread, but if the user omits this
option, the compiler will not show any obvious indication during compilation
that -pthread is needed for programs using TLS variables. Instead, the user will
experience a segmentation fault when running programs with TLS variables in them
and without specifying -pthread.
This patch aims to generate .extern/.ref references to __tls_get_addr[DS] for
local-exec accesses, in order to trigger an error from the linker to indicate
that there is an undefined symbol to __tls_get_addr. Doing so will remind the
user to compile/link with -pthread.
Differential Revision: https://reviews.llvm.org/D151335
Support printing of symbols for MachO of `MH_FILESET` type.
This is achieved by extending `dumpSymbolNamesFromObject`
to encompass fileset handling, and including an offset in
`MachOObjectFile` class to locate embedded MachO headers.
Differential Revision: https://reviews.llvm.org/D159294
The width of the APSInt values should be the width of an element.
getCharByteWidth returns the size of an element in _host_ bytes, which
makes the width N times greater, where N is the ratio between target's
CHAR_BIT and host's CHAR_BIT.
This is NFC for in-tree targets because all of them have CHAR_BIT == 8.
Reviewed By: cor3ntin, aaron.ballman
Differential Revision: https://reviews.llvm.org/D154773
The vendors of the MSVC STL, libstdc++ and libc++ have agreed [1] to
make the C++23 modules std and std.compat available in C++20. This
provides the std module; libc++ has not implemented the std.compat
module yet.
[1] https://github.com/microsoft/STL/issues/3945
Depends on D158357
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D158358
Each vslide1down operation is linear in LMUL on common hardware. (For instance, the sifive-x280 cost model models slides this way.) If we do a VL unique inserts, each with a cost linear in LMUL, the overall cost is O(VL*LMUL). Since VL is a linear function of LMUL, this means the current lowering is quadradic in both LMUL and VL. To avoid the degenerate case, fallback to the stack if the cost is more than a fixed (linear) threshold.
For context, here's the sifive-x280 llvm-mca results for the current lowering and stack based lowering for each LMUL (using e64). Assumes code was compiled for V (i.e. zvl128b).
buildvector_m1_via_stack.mca:Total Cycles: 1904
buildvector_m2_via_stack.mca:Total Cycles: 2104
buildvector_m4_via_stack.mca:Total Cycles: 2504
buildvector_m8_via_stack.mca:Total Cycles: 3304
buildvector_m1_via_vslide1down.mca:Total Cycles: 804
buildvector_m2_via_vslide1down.mca:Total Cycles: 1604
buildvector_m4_via_vslide1down.mca:Total Cycles: 6400
buildvector_m8_via_vslide1down.mca:Total Cycles: 25599
There are other schemes we could use to cap the cost. The next best is recursive decomposition of the vector into smaller LMULs. That's still quadratic, but with a better constant. However, stack based seems to cost better on all LMULs, so we can just go with the simpler scheme.
Arguably, this patch is fixing a regression introduced with my D149667 as before that change, we'd always fallback to the stack, and thus didn't have the non-linearity.
Differential Revision: https://reviews.llvm.org/D159332
Introduce a simple conversion of a memref.alloc/dealloc pair into an
alloca in the same scope. Expose it as a transform op and a pattern.
Allocas typically lower to stack allocations as opposed to alloc/dealloc
that lower to significantly more expensive malloc/free calls. In
addition, this can be combined with allocation hoisting from loops to
further improve performance.
Some flags named "IsLTO" and "IsThinLTO" implied they described
compilation modes, but with Unified LTO this is no longer true. Rename
these to "PrepForXXX" to be less confusing to readers. Also, deleted
"IsThinOrUnifiedLTO" because Unified implies PrepareForThinLTO.
The current generated ClangCommandLineReference.rst unconditionally
escapes underscores. This leads odd output on the website
https://clang.llvm.org/docs/ClangCommandLineReference.html
For example
-fchar8\_t, -fno-char8\_t
Whether an underscore should be escaped depends on the state. Currently
the escape routine doesn't keep track of the state and currently
underscores are not used in places where they need to be escaped.
Therefore remove the underscore from the list of escaped characters.