Commit Graph

476866 Commits

Author SHA1 Message Date
Aaron Ballman
96dd50ee83 Fix LLVM Sphinx build 2023-10-05 08:12:58 -04:00
Guillot Tony
5d78b78c85 [C2X] N3007 Type inference for object definitions
This patch implements the auto keyword from the N3007 standard
specification.
This allows deducing the type of the variable like in C++:
```
auto nb = 1;
auto chr = 'A';
auto str = "String";
```
The list of statements that allow the usage of auto:

    * Basic variables declarations (int, float, double, char, char*...)
    * Macros declaring a variable with the auto type

The list of statements which will not work with the auto keyword:

    * auto arrays
    * sizeof(), alignas()
    * auto parameters, auto return type
    * auto as a struct/typedef member
    * uninitialized auto variables
    * auto in a union
    * auto as an enum type specifier
    * auto casts
    * auto in compound literals

Differential Revision: https://reviews.llvm.org/D133289
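
For comparison, here is a minimal C++ sketch of the same three deductions. C++ is used only because `auto` has long behaved this way there, and the helper name `deduced_types_match` is illustrative, not part of the patch. Two deductions differ between the languages: the literal `'A'` has type `int` in C but `char` in C++, and a string literal yields `char *` in C but `const char *` in C++.

```cpp
#include <type_traits>

// Illustrative helper (not part of the patch): reports what `auto` deduces in C++.
inline bool deduced_types_match() {
  auto nb = 1;          // int
  auto chr = 'A';       // char in C++ (the literal 'A' has type int in C)
  auto str = "String";  // const char * in C++ (char * in C)
  return std::is_same<decltype(nb), int>::value &&
         std::is_same<decltype(chr), char>::value &&
         std::is_same<decltype(str), const char *>::value;
}
```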
2023-10-05 08:11:02 -04:00
Matthias Springer
58678d3bcf
[mlir][tensor][bufferize] tensor.empty bufferizes to allocation (#68201)
`BufferizableOpInterface::bufferizesToAllocation` is queried when
forming equivalence sets during bufferization. It is not really needed
for ops like `tensor.empty` which do not have tensor operands, but it
should be added for consistency.

This change should have been part of #68080. No test is added because
the return value of this function is irrelevant for ops without tensor
operands. (However, this function acts as a form of documentation,
describing the bufferization semantics of the op.)
2023-10-05 14:06:00 +02:00
Matthias Springer
5958043e2d
[mlir][bufferization] Add dump_alias_sets option to transform op (#68289)
Add `dump_alias_sets` to `transform.bufferization.one_shot_bufferize`.
This option is useful for debugging. Also improve the verifier to ensure
that `test_analysis_only` is set when other debugging flags are enabled.
2023-10-05 14:05:45 +02:00
Yingwei Zheng
33a194b158
[InstCombine] Add pre-commit tests for #67915. NFC. 2023-10-05 20:01:55 +08:00
Kohei Yamaguchi
777a6e6f10
[mlir][docs] Cleanup documentations [NFC] (#67945)
- Fix missing links
- Fix missing link format
- Move transform::ApplyFuncToLLVMConversionPatternOp into Transform
dialect
- Remove duplicated MemRef's TOC
- Remove duplicated Memref's dma_start/dma_wait docs
2023-10-05 13:33:41 +02:00
Ivan Kosarev
f04aa1f814
[AMDGPU][CodeGen] Fold immediates in src1 operands of V_MAD/MAC/FMA/FMAC. (#68002) 2023-10-05 14:22:29 +03:00
Bogdan Graur
821dfc392a Revert "[X86] Change target of __builtin_ia32_cmp[p|s][s|d] from avx into sse/sse2 (#67410)"
Does not respect `__attribute__((target("avx")))`.

This reverts commit ccd5b8db48.
2023-10-05 10:33:44 +00:00
Simon Pilgrim
baecc9e997 [CostModel][X86] getShuffleCost - add fallback (to half vector) for bfloat vector shuffle costs
Add initial half/bfloat broadcast shuffles test coverage (more to follow)

Fixes #68117 - which was stuck in a loop between getting scalarized insert/extract costs for the shuffle and then trying to convert a bfloat insert into a shuffle again......
2023-10-05 11:12:40 +01:00
Jonas Hahnfeld
abb9eb2778 [Lex] Handle repl_input_end in Preprocessor::LexTokensUntilEOF()
This fixes many unit tests when trying to enable IncrementalExtensions
by default for testing purposes.

Differential Revision: https://reviews.llvm.org/D158415
2023-10-05 12:09:14 +02:00
Mats Petersson
6180964a01
[flang]Pass to add vscale range attribute (#68103)
Add vscale range attribute for the Scalable Vector Extension (SVE) if
provided on the command-line (options in a previous commit)

If no command-line option is provided, but the SVE target feature is
specified and the architecture is AArch64, the range defaults to 128-2048
bits, in other words a vscale-min of 1 and a vscale-max of 16.

A pass is used to add the attribute to all functions. The vectorizer will
use this attribute to generate the SVE instruction to match the range
specified. The attribute is harmless if there are no vectorizable
operations in the function.
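
The default range follows from SVE's 128-bit granule: vscale is the vector length in bits divided by 128. A minimal sketch of that arithmetic, where `VScaleRange` and `vscaleRangeForBits` are hypothetical names for illustration and not flang APIs:

```cpp
#include <cstdint>

// Hypothetical helper type for illustration; not taken from flang.
struct VScaleRange {
  std::uint32_t min;
  std::uint32_t max;
};

// SVE vector lengths are multiples of 128 bits, so vscale = bits / 128.
constexpr VScaleRange vscaleRangeForBits(std::uint32_t minBits,
                                         std::uint32_t maxBits) {
  return VScaleRange{minBits / 128, maxBits / 128};
}
```

Calling `vscaleRangeForBits(128, 2048)` gives a minimum of 1 and a maximum of 16, matching the defaults described above.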
2023-10-05 11:06:00 +01:00
long.chen
5979e1dfb1
[mlir] Fix empty-tensor-elimination around self-copies (#68129)
* Fixes #67977, a crash in `empty-tensor-elimination`.
* Also improves `linalg.copy` canonicalization.
* Also improves indentation indentation in `mlir-linalg-ods-yaml-gen.cpp`.
2023-10-05 12:04:20 +02:00
Michael Buch
3a35ca01fc
[lldb][DWARFASTParserClang][NFCI] Extract DW_AT_data_member_location calculation logic (#68231)
Currently this non-trivial calculation is repeated multiple times,
making it hard to reason about when the
`byte_offset`/`member_byte_offset` is being set or not.

This patch simply moves all those instances of the same calculation into
a helper function.

We return an optional to remain an NFC patch. Default initializing the
offset would make sense but requires further analysis and can be done in
a follow-up patch.
2023-10-05 10:49:42 +01:00
cor3ntin
c72d3a0966
[Clang] Handle consteval expression in array bounds expressions (#66222)
The bound of a C++ array is a _constant-expression_. And in C++ it is
also a constant expression.

But we also support VLAs, i.e. arrays with non-constant bounds.

We need to take care to handle consteval functions (which are
specified to be immediately invoked only in non-constant contexts)
that appear in array bounds.

This introduces `Sema::isAlwaysConstantEvaluatedContext`, and a flag in
ExpressionEvaluationContextRecord, such that immediate functions in
array bounds are always immediately invoked.

Sema had both `isConstantEvaluatedContext` and
`isConstantEvaluated`, so I took the opportunity to clean that up.

The change in `TimeProfilerTest.cpp` is an unfortunate manifestation of
the problem that #66203 seeks to address.

Fixes #65520
2023-10-05 11:36:27 +02:00
Christian Sigg
c64a098ee4
[GVN] Fix after 46aac949bc
replaceUsersOf -> removeUsersOf
2023-10-05 11:31:35 +02:00
tdanyluk
a608830807
[mlir] Speed up FuncToLLVM using a SymbolTable (#68082)
We have a project where this saves 23% of the compilation time.

This means using hashmaps instead of searching in linked lists.
2023-10-05 11:24:52 +02:00
Rin
d3e4702c0f
[AArch64] [LoopVectorize] Use either fixed-width or scalable VF when tail-folding (#67543)
Since the getMaximisedVFForTarget function is called twice, once for fixed-width and once for scalable, it adds no value to always return a fixed-width VF. Instead, when we are tail-folding, we can use either fixed-width or scalable vectors.
2023-10-05 10:24:30 +01:00
Nikita Popov
46aac949bc [GVN] Remove users from ICF when RAUWing loads
When performing store to load forwarding, replacing users of the
load may turn an indirect call into one with a known callee, in
which case it might become willreturn, invalidating cached ICF
information. Avoid this by removing users.

This is a bit more aggressive than strictly necessary (e.g. this
shouldn't be necessary when doing load-load CSE), but better safe
than sorry.

Fixes https://github.com/llvm/llvm-project/issues/48805.
2023-10-05 11:21:33 +02:00
Christian Sigg
59e75b7df2
[mlir][bazel] Sort targets list. 2023-10-05 11:14:12 +02:00
Christian Sigg
2f1c78014f
[mlir][bazel] Fix after d20fbc9007 2023-10-05 11:12:55 +02:00
Guray Ozen
29b33e8397 [bazel] fix typo 2023-10-05 11:08:46 +02:00
Jonas Hahnfeld
3116d60494 [Lex] Introduce Preprocessor::LexTokensUntilEOF()
This new method repeatedly calls Lex() until end of file is reached
and optionally fills a std::vector of Tokens. Use it in Clang's unit
tests to avoid quite some code duplication.

Differential Revision: https://reviews.llvm.org/D158413
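
The loop described above can be sketched with toy types. `Token`, `ToyLexer`, and `lexTokensUntilEOF` below are stand-ins invented for illustration; they do not reproduce Clang's `Preprocessor` interface or its exact signature.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are not Clang's types.
struct Token {
  bool isEOF;
  std::string text;
};

class ToyLexer {
public:
  explicit ToyLexer(std::vector<std::string> words)
      : words_(std::move(words)) {}

  // Produces the next token, or an EOF token once the input is exhausted.
  void Lex(Token &tok) {
    if (pos_ < words_.size())
      tok = Token{false, words_[pos_++]};
    else
      tok = Token{true, ""};
  }

private:
  std::vector<std::string> words_;
  std::size_t pos_ = 0;
};

// Calls Lex() until EOF; if `out` is non-null, collects the non-EOF tokens.
inline void lexTokensUntilEOF(ToyLexer &lexer,
                              std::vector<Token> *out = nullptr) {
  Token tok;
  for (lexer.Lex(tok); !tok.isEOF; lexer.Lex(tok))
    if (out)
      out->push_back(tok);
}
```

Passing a null pointer simply drains the lexer, which is the shape many unit tests need when only the side effects of lexing matter.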
2023-10-05 11:04:07 +02:00
Job Noorman
7fa33773e3
[BOLT][RISCV] Handle long tail calls (#67098)
Long tail calls use the following instruction sequence on RISC-V:

```
1: auipc xi, %pcrel_hi(sym)
jalr zero, %pcrel_lo(1b)(xi)
```

Since the second instruction in isolation looks like an indirect branch,
this confused BOLT and most functions containing a long tail call got
marked with "unknown control flow" and didn't get optimized as a
consequence.

This patch fixes this by detecting the long tail call sequence in
`analyzeIndirectBranch`. `FixRISCVCallsPass` also had to be updated to
expand long tail calls to `PseudoTAIL` instead of `PseudoCALL`.

Besides this, this patch also fixes a minor issue with compressed tail
calls (`c.jr`) not being detected.

Note that I had to change `BinaryFunction::postProcessIndirectBranches`
slightly: the documentation of `MCPlusBuilder::analyzeIndirectBranch`
mentions that the [`Begin`, `End`) range contains the instructions
immediately preceding `Instruction`. However, in
`postProcessIndirectBranches`, *all* the instructions in the BB were
passed in the range. This made it difficult to find the preceding
instruction so I made sure *only* the preceding instructions are passed.
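
As a rough illustration of the pattern match (not BOLT's implementation, which works on `MCInst` objects and not on raw encodings), the sketch below checks two 32-bit RISC-V encodings for `auipc xi, ...` followed by `jalr zero, ...(xi)`. The helper names are hypothetical.

```cpp
#include <cstdint>

// Field extraction for 32-bit RISC-V instruction encodings.
constexpr std::uint32_t rvOpcode(std::uint32_t insn) { return insn & 0x7f; }
constexpr std::uint32_t rvRd(std::uint32_t insn) { return (insn >> 7) & 0x1f; }
constexpr std::uint32_t rvFunct3(std::uint32_t insn) { return (insn >> 12) & 0x7; }
constexpr std::uint32_t rvRs1(std::uint32_t insn) { return (insn >> 15) & 0x1f; }

// Illustrative check: `auipc xi, ...` followed by `jalr zero, ...(xi)`.
constexpr bool isLongTailCall(std::uint32_t first, std::uint32_t second) {
  return rvOpcode(first) == 0x17 &&        // AUIPC
         rvOpcode(second) == 0x67 &&       // JALR
         rvFunct3(second) == 0 &&
         rvRd(second) == 0 &&              // rd = zero: no return address saved
         rvRs1(second) == rvRd(first);     // jumps through the register auipc wrote
}
```

Requiring `rd` to be `zero` is what separates a tail call from an ordinary long call, which would link through `ra`.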
2023-10-05 08:55:30 +00:00
Guray Ozen
d20fbc9007
[MLIR][NVGPU] Introduce nvgpu.warpgroup.mma.store Op for Hopper GPUs (#65441)
This PR introduces a new Op called `warpgroup.mma.store` to the NVGPU
dialect of MLIR. The purpose of this operation is to facilitate storing
fragmented result(s) `nvgpu.warpgroup.accumulator` produced by
`warpgroup.mma` to the given memref.

An example of a fragmented matrix is given here:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d

The `warpgroup.mma.store` does the following:
1) Takes one or more `nvgpu.warpgroup.accumulator` type (fragmented
results matrix)
2) Calculates indexes per thread in warp-group and stores the data into
the given memref.

Here's an example usage:
```
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...

// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD : 
    !nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
    !nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>> 
    to memref<128x128xf32,3>
```
2023-10-05 10:54:13 +02:00
Job Noorman
c7d6d62252
[BOLT][RISCV] Implement TLS le/ie relocations (#67112)
Handle the following relocations related to TLS local-exec and
initial-exec:
- R_RISCV_TLS_GOT_HI20
- R_RISCV_TPREL_HI20
- R_RISCV_TPREL_ADD
- R_RISCV_TPREL_LO12_I
- R_RISCV_TPREL_LO12_S

In addition, GNU ld has a quirk where after TLS le relaxation, two
unofficial relocation types may be emitted:
- R_RISCV_TPREL_I
- R_RISCV_TPREL_S

Since they are unofficial (defined in the reserved range of relocation
types), LLVM does not define them. Hence, I've defined them locally in
BOLT in a private namespace.
2023-10-05 08:53:51 +00:00
Martin Storsjö
7c5e4e5fa3
Reapply [compiler-rt] Check for and use -lunwind when linking with -nodefaultlibs (#66584)
If libc++ is available and should be used as the ubsan C++ ABI library,
the check for libc++ might fail if libc++ is a static library, as the
-nodefaultlibs flag inhibits a potential compiler default -lunwind.

Just like the -nodefaultlibs configuration tests for and manually adds a
bunch of compiler default libraries, look for -lunwind too.

This is a reland of #65912.
2023-10-05 11:41:11 +03:00
Jonas Hahnfeld
26bb22b0c8 Revert "InstCombine: Introduce SimplifyDemandedUseFPClass"
It causes a test failure of clang/test/Headers/__clang_hip_math.hip:
https://lab.llvm.org/buildbot/#/builders/109/builds/75022

This reverts commit 59c6e2e9c1.
2023-10-05 10:26:10 +02:00
Owen Pan
8902f12e61 [clang-format][doc] Update the Linux kernel coding style URL 2023-10-05 01:18:49 -07:00
cor3ntin
49666ec038
[Clang] Fix constant evaluating a captured variable in a lambda (#68090)
with an explicit parameter.

We tried to read a pointer to a non-existent `This` APValue when
constant-evaluating an explicit object lambda call operator (the `this`
pointer is never set in explicit object member functions)

Fixes #68070
2023-10-05 10:17:50 +02:00
Guray Ozen
b74cfc139a
[mlir][nvgpu] Improve nvgpu->nvvm transformation of warpgroup.mma Op (NFC) (#67325)
This PR introduces substantial improvements to the readability and
maintainability of the `nvgpu.warpgroup.mma` Op transformation from
nvgpu->nvvm. This transformation plays a crucial role in GEMM and
manages complex operations such as generating multiple wgmma ops and
iterating their descriptors. The prior code lacked clarity, but this PR
addresses that issue effectively.

**PR does the following:**
**Introduces a helper class:** `WarpgroupGemm` class encapsulates the
necessary functionality, making the code cleaner and more
understandable.

**Detailed Documentation:** Each function within the helper class is
thoroughly documented to provide clear insights into its purpose and
functionality.
2023-10-05 10:16:59 +02:00
Guray Ozen
7eb2b99f16
[mlir] Change the class name of the GenerateWarpgroupDescriptor (#68286) 2023-10-05 10:15:40 +02:00
Nikita Popov
c263639134 [InstSimplify] Add missing const qualifier (NFC)
The context instruction is a "const Instruction *", so that's
what getWithInstruction() should accept.
2023-10-05 10:05:16 +02:00
Nikita Popov
ba149f6e09 [ValueTracking] Add SimplifyQuery ctor without TLI (NFC)
While we pretty much always want to pass DT, AC and CxtI, most
places don't care about TLI. Add an overload where this is not
one of the first parameters.
2023-10-05 09:55:00 +02:00
Timm Bäder
57147bb253 [clang][Interp] Support LambdaThisCaptures
Differential Revision: https://reviews.llvm.org/D154262
2023-10-05 09:46:15 +02:00
Nicolas Vasilache
cc2d9515d0 [mlir][Transform] NFC - Fix missing field in copy constructor 2023-10-05 07:40:35 +00:00
Timm Bäder
4d7f4a7c82 [clang][Interp] Only lazily visit constant globals
Differential Revision: https://reviews.llvm.org/D158516
2023-10-05 09:37:37 +02:00
Yusra Syeda
5c4d35d8cf
[SystemZ][z/OS] Update lowerCall (#68259)
This PR moves some calculation out of `LowerCall` and into
`SystemZXPLINKFrameLowering::processFunctionBeforeFrameFinalized`.
We need to make this change because LowerCall isn't invoked for
functions that don't have function calls, and it is required for some
tooling to work correctly. A function that does not make any calls is
required to allocate 32 bytes for the parameter area required by the
ABI. However, we allocate 64 bytes because this additional space is
utilized by certain tools, like the debugger.

Co-authored-by: Yusra Syeda <yusra.syeda@ibm.com>
2023-10-05 10:32:57 +03:00
Nikita Popov
941c75a530 [ValueTracking] Return ConstantRange instead of setting limits (NFC)
Same as previously done for intrinsics.
2023-10-05 09:24:20 +02:00
Guray Ozen
6dc7717bca
[MLIR][NVGPU] Change name wgmma.descriptor to warpgroup.descriptor (NFC) (#67526)
The NVGPU dialect is gaining broad support for warpgroup-level operations,
and their names always start with `warpgroup....`.

This PR changes the name of the Op and type from `wgmma.descriptor` to
`warpgroup.descriptor` for the sake of consistency.
2023-10-05 09:01:48 +02:00
Timm Baeder
5ef904b5da
[clang][ExprConst] Don't try to evaluate value-dependent DeclRefExprs (#67778)
The Expression here might be value dependent, which makes us run into an
assertion later on. Just bail out early.

Fixes #67690
2023-10-05 08:42:34 +02:00
Qizhi Hu
eef35c287e
[clang-tidy]: Add TagDecl into LastTagDeclRanges in UseUsingCheck only when it is a definition (#67639)
Fix issue 67529, [clang-tidy: modernize-use-using fails when type is
implicitly forward
declared](https://github.com/llvm/llvm-project/issues/67529)
The problem is that using `Lexer` to get the record declaration loses
the type information when its original type is a pointer or reference.
This patch fixes the problem by skipping the tag declaration when it is
only a 'declaration' and not a 'definition'.

Co-authored-by: huqizhi <836744285@qq.com>
2023-10-05 13:49:21 +08:00
Mircea Trofin
a4765c6a02 [mlgo] Fix state-tracking-coro.ll test
Post #68263, the inline advisor printer tries to print SCC Nodes' names,
but if we perform a full pipeline (like O1), there'll be some DCE-ing
happening and the Node pointers kept in the advisor for this (printing)
purpose are dangling. Using the more eager printer post each scc inline
pass is sufficient.
2023-10-04 22:07:44 -07:00
Yaxun (Sam) Liu
c6ed5a6125 Revert "[HIP] Support compressing device binary (#67162)"
This reverts commit a1e81d2ead.

Revert "Fix test hip-offload-compress-zlib.hip"

This reverts commit ba01ce6066.

Revert due to sanitizer failures at

https://lab.llvm.org/buildbot/#/builders/5/builds/37188

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

/b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error: load of misaligned address 0xaaaae2d90e7c for type 'const uint64_t' (aka 'const unsigned long'), which requires 8 byte alignment
0xaaaae2d90e7c: note: pointer points here
  bc 00 00 00 94 dc 29 9a  89 fb ca 2b 78 9c 8b 8f  77 f6 71 f4 73 8f f7 77  73 f3 f1 77 74 89 77 0a
              ^
    #0 0xaaaaba125f70 in clang::CompressedOffloadBundle::decompress(llvm::MemoryBuffer const&, bool) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1012:25
    #1 0xaaaaba126150 in clang::OffloadBundler::ListBundleIDsInFile(llvm::StringRef, clang::OffloadBundlerConfig const&) /b/sanitizer-aarch64-linux-bootstrap-ubsan/build/llvm-project/clang/lib/Driver/OffloadBundler.cpp:1089:7

Will reland after fixing it.
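
The report above is the typical symptom of dereferencing a `uint64_t` pointer into a byte buffer at an offset that is not 8-byte aligned. One common remedy, sketched here as an assumption and not as the fix that was eventually relanded, is to copy the bytes out with `std::memcpy`. The helper name `loadU64` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Reads 8 bytes in host byte order from any address, aligned or not.
// memcpy has no alignment requirement on its source, so this avoids the
// undefined behavior of a misaligned uint64_t load.
inline std::uint64_t loadU64(const unsigned char *p) {
  std::uint64_t value;
  std::memcpy(&value, p, sizeof(value));
  return value;
}
```

Compilers typically lower this to a single load on targets that permit unaligned access, so the safer form rarely costs anything.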
2023-10-05 00:29:42 -04:00
MaheshRavishankar
f28f09dcf0
[mlir][Vector] Add Broadcast -> CastOp reordering to SinkVectorBroadcasting patterns. (#68257)
Also fix an issue with sinking a broadcast across elementwise ops, where
`arith.cmpf` is elementwise but its result type is different. The result
type is not the same as the operand type, creating illegal IR.
Similar issue with `vector.fma` which only accepts vector operand types,
while broadcasts can have scalar sources. Sinking broadcast across would
result in an illegal `vector.fma` (with scalar operands).
2023-10-04 21:27:24 -07:00
Mircea Trofin
1b3fc40586
[mlgo][coro] Assign coro split-ed functions a FunctionLevel (#68263) 2023-10-04 21:20:00 -07:00
Matt Arsenault
59c6e2e9c1 InstCombine: Introduce SimplifyDemandedUseFPClass
This is the floating-point analog of SimplifyDemandedBits. If we know
the edge cases are assumed impossible in uses, it's possible to prune
upstream edge case handling.

Start by only using this on returns in functions with nofpclass
returns (where I'm surprised there are no other combines), but this
can be extended to include any other nofpclass use or FPMathOperator
with flags.

Partially addresses issue #64870

https://reviews.llvm.org/D158648
2023-10-04 21:06:24 -07:00
Kazu Hirata
bbdbcd83e6
[Support] Rename llvm::support::endianness to llvm::endianness (#68174)
As part of an effort to make our codebase ready for the migration from
llvm::support::endianness to std::endian in C++20, this patch renames
llvm::support::endianness to llvm::endianness.

The intent of this patch is to make fully qualified names less
painful.  That is, with this patch, we can just say
llvm::endianness::big rather than llvm::support::endianness::big.

I'm not renaming llvm::support::endianness to llvm::endian because we
have a lot of places with "using namespace support;" where it would be
ambiguous whether "endian" refers to llvm::endian or
llvm::support::endian.

This patch defines several helpers for gradual migration:

  namespace llvm {
  namespace support {
  using endianness = llvm::endianness;
  constexpr llvm::endianness big = llvm::endianness::big;
  constexpr llvm::endianness little = llvm::endianness::little;
  constexpr llvm::endianness native = llvm::endianness::native;
  } // namespace support
  } // namespace llvm

While we are at it, this patch changes the enum to "enum class".  The
"enum class" prevents implicit conversions from endianness to bool.
I've fixed three such instances of implicit conversions:

  95f4b2a708
  8de2ecc2e7
  a7517e12ca
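
To illustrate the "enum class" point, here is a minimal sketch. The `endianness` enum below is a two-value stand-in declared only for this example (the real one lives in LLVM's headers), and `legacy_endianness` and `isBig` are hypothetical names.

```cpp
#include <type_traits>

// Stand-in for illustration; not the LLVM declaration.
enum class endianness { big, little };

// An unscoped enum, for contrast.
enum legacy_endianness { legacy_big, legacy_little };

static_assert(!std::is_convertible<endianness, bool>::value,
              "scoped enums do not convert to bool implicitly");
static_assert(std::is_convertible<legacy_endianness, bool>::value,
              "unscoped enums do convert to bool implicitly");

// With a scoped enum the comparison has to be spelled out.
constexpr bool isBig(endianness e) { return e == endianness::big; }
```

Code such as `if (E)` therefore stops compiling once `E` is a scoped enum, which is how accidental conversions like the three listed above surface.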
2023-10-04 20:34:02 -07:00
Kazu Hirata
f37028c2cc
[Support] Rename HashBuilderImpl to HashBuilder (NFC) (#68173)
Commit 9370271ec5 made HashBuilder an
alias for HashBuilderImpl:

  template <class HasherT, support::endianness Endianness>
  using HashBuilder = HashBuilderImpl<HasherT, Endianness>;

This patch renames HashBuilderImpl to HashBuilder while removing the
alias above.
2023-10-04 20:33:38 -07:00
Maksim Levental
6f44f87011
[mlir][python] Enable py312. (#68009)
Python 3.12 has been released so why not support it.
2023-10-04 20:35:24 -05:00
Bill Wendling
9a954c6935 [Clang] Implement the 'counted_by' attribute
The 'counted_by' attribute is used on flexible array members. The
argument for the attribute is the name of the field member in the same
structure holding the count of elements in the flexible array. This
information can be used to improve the results of the array bound sanitizer
and the '__builtin_dynamic_object_size' builtin.

This example specifies that the flexible array member 'array' has the
number of elements allocated for it in 'count':

  struct bar;
  struct foo {
    size_t count;
     /* ... */
    struct bar *array[] __attribute__((counted_by(count)));
  };

This establishes a relationship between 'array' and 'count', specifically
that 'p->array' must have *at least* 'p->count' number of elements available.
It's the user's responsibility to ensure that this relationship is maintained
through changes to the structure.

In the following, the allocated array erroneously has fewer elements than
what's specified by 'p->count'. This would result in an out-of-bounds access
not being detected:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }

The next example updates 'p->count', breaking the relationship requirement that
'p->array' must have at least 'p->count' number of elements available:

  struct foo *p;

  void foo_alloc(size_t count) {
    p = malloc(MAX(sizeof(struct foo),
                   offsetof(struct foo, array[0]) + count *
                       sizeof(struct bar *)));
    p->count = count + 42;
  }

  void use_foo(int index) {
    p->count += 42;
    p->array[index] = 0; /* The sanitizer cannot properly check this access */
  }

Reviewed By: nickdesaulniers, aaron.ballman

Differential Revision: https://reviews.llvm.org/D148381
2023-10-04 18:26:15 -07:00