This patches implements the auto keyword from the N3007 standard
specification.
This allows deducing the type of the variable like in C++:
```
auto nb = 1;
auto chr = 'A';
auto str = "String";
```
The list of statements which allows the usage of auto:
* Basic variables declarations (int, float, double, char, char*...)
* Macros declaring a variable with the auto type
The list of statements which will not work with the auto keyword:
* auto arrays
* sizeof(), alignas()
* auto parameters, auto return type
* auto as a struct/typedef member
* uninitialized auto variables
* auto in an union
* auto as a enum type specifier
* auto casts
* auto in an compound literals
Differential Revision: https://reviews.llvm.org/D133289
`BufferizableOpInterface::bufferizesToAllocation` is queried when
forming equivalence sets during bufferization. It is not really needed
for ops like `tensor.empty` which do not have tensor operands, but it
should be added for consistency.
This change should have been part of #68080. No test is added because
the return value of this function is irrelevant for ops without tensor
operands. (However, this function acts as a form documentation,
describing the bufferization semantics of the op.)
Add `dump_alias_sets` to `transform.bufferization.one_shot_bufferize`.
This option is useful for debugging. Also improve the verifier to ensure
that `test_analysis_only` is set when other debugging flags are enabled.
Add initial half/bfloat broadcast shuffles test coverage (more to follow)
Fixes#68117 - which was stuck in a loop between getting scalarized insert/extract costs for the shuffle and then trying to convert a bfloat insert into a shuffle again......
This fixes many unit tests when trying to enable IncrementalExtensions
by default for testing purposes.
Differential Revision: https://reviews.llvm.org/D158415
Add vscale range attirbute for the Scalable Vector Extension (SVE) if
provided on the command-line (options in a previous commit)
If no command-line option is provided, if the target-feature of SVE is
specified and the architecture is AArch64, it defualts to 128-2048. in
other words a vscale-min of 1, vscale-max of 16.
A pass is used to add the atribute to all functions. The vectorizer will
use this attribute to generate the SVE instruction to match the range
specified. The attribute is harmless if there is no vectorizable
operations in the function.
* Fixes#67977, a crash in `empty-tensor-elimination`.
* Also improves `linalg.copy` canonicalization.
* Also improves indentation indentation in `mlir-linalg-ods-yaml-gen.cpp`.
Currently this non-trivial calculation is repeated multiple times,
making it hard to reason about when the
`byte_offset`/`member_byte_offset` is being set or not.
This patch simply moves all those instances of the same calculation into
a helper function.
We return an optional to remain an NFC patch. Default initializing the
offset would make sense but requires further analysis and can be done in
a follow-up patch.
The bounds of a c++ array is a _constant-expression_. And in C++ it is
also a constant expression.
But we also support VLAs, ie arrays with non-constant bounds.
We need to take care to handle the case of a consteval function (which
are specified to be only immediately called in non-constant contexts)
that appear in arrays bounds.
This introduces `Sema::isAlwayConstantEvaluatedContext`, and a flag in
ExpressionEvaluationContextRecord, such that immediate functions in
array bounds are always immediately invoked.
Sema had both `isConstantEvaluatedContext` and
`isConstantEvaluated`, so I took the opportunity to cleanup that.
The change in `TimeProfilerTest.cpp` is an unfortunate manifestation of
the problem that #66203 seeks to address.
Fixes#65520
Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.
When performing store to load forwarding, replacing users of the
load may turn an indirect call into one with a known callee, in
which case it might become willreturn, invalidating cached ICF
information. Avoid this by removing users.
This is a bit more aggressive than strictly necessary (e.g. this
shouldn't be necessary when doing load-load CSE), but better safe
than sorry.
Fixes https://github.com/llvm/llvm-project/issues/48805.
This new method repeatedly calls Lex() until end of file is reached
and optionally fills a std::vector of Tokens. Use it in Clang's unit
tests to avoid quite some code duplication.
Differential Revision: https://reviews.llvm.org/D158413
Long tail calls use the following instruction sequence on RISC-V:
```
1: auipc xi, %pcrel_hi(sym)
jalr zero, %pcrel_lo(1b)(xi)
```
Since the second instruction in isolation looks like an indirect branch,
this confused BOLT and most functions containing a long tail call got
marked with "unknown control flow" and didn't get optimized as a
consequence.
This patch fixes this by detecting long tail call sequence in
`analyzeIndirectBranch`. `FixRISCVCallsPass` also had to be updated to
expand long tail calls to `PseudoTAIL` instead of `PseudoCALL`.
Besides this, this patch also fixes a minor issue with compressed tail
calls (`c.jr`) not being detected.
Note that I had to change `BinaryFunction::postProcessIndirectBranches`
slightly: the documentation of `MCPlusBuilder::analyzeIndirectBranch`
mentions that the [`Begin`, `End`) range contains the instructions
immediately preceding `Instruction`. However, in
`postProcessIndirectBranches`, *all* the instructions in the BB where
passed in the range. This made it difficult to find the preceding
instruction so I made sure *only* the preceding instructions are passed.
This PR introduces a new Op called `warpgroup.mma.store` to the NVGPU
dialect of MLIR. The purpose of this operation is to facilitate storing
fragmanted result(s) `nvgpu.warpgroup.accumulator` produced by
`warpgroup.mma` to the given memref.
An example of fragmentated matrix is given here :
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d
The `warpgroup.mma.store` does followings:
1) Takes one or more `nvgpu.warpgroup.accumulator` type (fragmented
results matrix)
2) Calculates indexes per thread in warp-group and stores the data into
give memref.
Here's an example usage:
```
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...
// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD :
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>
to memref<128x128xf32,3>
```
Handle the following relocations related to TLS local-exec and
initial-exec:
- R_RISCV_TLS_GOT_HI20
- R_RISCV_TPREL_HI20
- R_RISCV_TPREL_ADD
- R_RISCV_TPREL_LO12_I
- R_RISCV_TPREL_LO12_S
In addition, GNU ld has a quirk where after TLS le relaxation, two
unofficial relocation types may be emitted:
- R_RISCV_TPREL_I
- R_RISCV_TPREL_S
Since they are unofficial (defined in the reserved range of relocation
types), LLVM does not define them. Hence, I've defined them locally in
BOLT in a private namespace.
If libc++ is available and should be used as the ubsan C++ ABI library,
the check for libc++ might fail if libc++ is a static library, as the
-nodefaultlibs flag inhibits a potential compiler default -lunwind.
Just like the -nodefaultlibs configuration tests for and manually adds a
bunch of compiler default libraries, look for -lunwind too.
This is a reland of #65912.
with an explicit parameter.
We tried to read a pointer to a non-existent `This` APValue when
constant-evaluating an explicit object lambda call operator (the `this`
pointer is never set in explicit object member functions)
Fixes#68070
This PR introduces substantial improvements to the readability and
maintainability of the `nvgpu.warpgroup.mma` Op transformation from
nvgpu->nvvm. This transformation plays a crucial role in GEMM and
manages complex operations such as generating multiple wgmma ops and
iterating their descriptors. The prior code lacked clarity, but this PR
addresses that issue effectively.
**PR does followings:**
**Introduces a helper class:** `WarpgroupGemm` class encapsulates the
necessary functionality, making the code cleaner and more
understandable.
**Detailed Documentation:** Each function within the helper class is
thoroughly documented to provide clear insights into its purpose and
functionality.
While we pretty much always want to pass DT, AC and CxtI, most
places don't care about TLI. Add an overload where this is not
one of the first parameters.
This PR moves some calculation out of `LowerCall` and into
`SystemZXPLINKFrameLowering::processFunctionBeforeFrameFinalized`.
We need to make this change because LowerCall isn't invoked for
functions that don't have function calls, and it is required for some
tooling to work correctly. A function that does not make any calls is
required to allocate 32 bytes for the parameter area required by the
ABI. However, we allocate 64 bytes because this additional space is
utilized by certain tools, like the debugger.
Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
NVGPU dialect is gaining large support for warpgroup level operations,
and their names always starts with `warpgroup....`.
This PR changes name of Op and type from `wgmma.descriptor` to
`warpgroup.descriptor` for sake of consistency.
Fix issue 67529, [clang-tidy: modernize-use-using fails when type is
implicitly forward
declared](https://github.com/llvm/llvm-project/issues/67529)
The problem is that using `Lexer` to get record declaration will lose
the type information when its original type is pointer or reference.
This patch fix this problem by skip adding the tag declaration when it's
only a 'declaration' and not a 'definition'.
Co-authored-by: huqizhi <836744285@qq.com>
Post #68263, the inline advisor printer tries to print SCC Nodes' names,
but if we perform a full pipeline (like O1), there'll be some DCE-ing
happening and the Node pointers kept in the advisor for this (printing)
purpose are dangling. Using the more eager printer post each scc inline
pass is sufficient.
This reverts commit a1e81d2ead.
Revert "Fix test hip-offload-compress-zlib.hip"
This reverts commit ba01ce6066.
Revert due to sanity fail at
https://lab.llvm.org/buildbot/#/builders/5/builds/37188https://lab.llvm.org/buildbot/#/builders/238/builds/5955
/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0xaaaae2d90e7c: note: pointer points here
bc 00 00 00 94 dc 29 9a 89 fb ca 2b 78 9c 8b 8f 77 f6 71 f4 73 8f f7 77 73 f3 f1 77 74 89 77 0a
^
#0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25
#1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7
Will reland after fixing it.
Also fix an issue with sink broadcast across elementwise where
`arith.cmpf` is elementwise, but result type is different. The result
type is not same as the operand type, creating illegal IR.
Similar issue with `vector.fma` which only accepts vector operand types,
while broadcasts can have scalar sources. Sinking broadcast across would
result in an illegal `vector.fma` (with scalar operands).
This is the floating-point analog of SimplifyDemandedBits. If we know
the edge cases are assumed impossible in uses, it's possible to prune
upstream edge case handling.
Start by only using this on returns in functions with nofpclass
returns (where I'm surprised there are no other combines), but this
can be extended to include any other nofpclass use or FPMathOperator
with flags.
Partially addresses issue #64870https://reviews.llvm.org/D158648
As part of an effort to make our codebase ready for the migration from
llvm::support::endianness to std::endian in C++20, this patch renames
llvm::support::endianness to llvm::endianness.
The intent of this patch is to make fully qualified names less
painful. That is, with this patch, we can just say
llvm::endianness::big rather than llvm::support::endianness::big.
I'm not renaming llvm::support::endianness to llvm::endian because we
have a lot of places with "using namespace support;" where it would be
ambiguous whether "endian" refers to llvm::endian or
llvm::support::endian.
This patch defines several helpers for gradual migration:
namespace llvm {
namespace support {
using endianness = llvm::endianness;
constexpr llvm::endianness big = llvm::endianness::big;
constexpr llvm::endianness little = llvm::endianness::little;
constexpr llvm::endianness native = llvm::endianness::native;
While we are at it, this patch changes the enum to "enum class". The
"enum class" prevents implicit conversions from endianness to bool.
I've fixed three such instances of implicit conversions:
95f4b2a7088de2ecc2e7a7517e12ca
Commit 9370271ec5 made HashBuilder an
alias for HashBuilderImpl:
template <class HasherT, support::endianness Endianness>
using HashBuilder = HashBuilderImpl<HasherT, Endianness>;
This patch renames HashBuilderImpl to HashBuilder while removing the
alias above.
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound sanitizer
and the '__builtin_dynamic_object_size' builtin.
This example specifies the that the flexible array member 'array' has the
number of elements allocated for it in 'count':
struct bar;
struct foo {
size_t count;
/* ... */
struct bar *array[] __attribute__((counted_by(count)));
};
This establishes a relationship between 'array' and 'count', specifically
that 'p->array' must have *at least* 'p->count' number of elements available.
It's the user's responsibility to ensure that this relationship is maintained
through changes to the structure.
In the following, the allocated array erroneously has fewer elements than
what's specified by 'p->count'. This would result in an out-of-bounds access not
not being detected:
struct foo *p;
void foo_alloc(size_t count) {
p = malloc(MAX(sizeof(struct foo),
offsetof(struct foo, array[0]) + count *
sizeof(struct bar *)));
p->count = count + 42;
}
The next example updates 'p->count', breaking the relationship requirement that
'p->array' must have at least 'p->count' number of elements available:
struct foo *p;
void foo_alloc(size_t count) {
p = malloc(MAX(sizeof(struct foo),
offsetof(struct foo, array[0]) + count *
sizeof(struct bar *)));
p->count = count + 42;
}
void use_foo(int index) {
p->count += 42;
p->array[index] = 0; /* The sanitizer cannot properly check this access */
}
Reviewed By: nickdesaulniers, aaron.ballman
Differential Revision: https://reviews.llvm.org/D148381