Commit Graph

486862 Commits

Author SHA1 Message Date
madanial0
fe4d502524
[flang] fix unsafe memory access using mlir::ValueRange (#78435)
When running the `flang/test/HLFIR/simplify-hlfir-intrinsics.fir` test
case on AIX we encounter issues building op as they are not found in the
mlir context:
```
LLVM ERROR: Building op `arith.subi` but it isn't known in this MLIRContext: the dialect may not be loaded or this operation hasn't been added by the dialect. See also https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
LLVM ERROR: Building op `hlfir.yield_element` but it isn't known in this MLIRContext: the dialect may not be loaded or this operation hasn't been added by the dialect. See also https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
LLVM ERROR: Building op `hlfir.yield_element` but it isn't known in this MLIRContext: the dialect may not be loaded or this operation hasn't been added by the dialect. See also https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
```
The issue is caused by the "Merge disjoint stack slots" pass and the
error is not present if the source is built with `-mllvm
--no-stack-coloring`

Thanks to investigation by @stefanp-ibm we found that "the
initializer_list {inputIndices[1], inputIndices[0]} has a lifetime that
only exists for the range of the constructor for ValueRange. Once we get
to stack coloring we merge the stack slot for that element with another
stack slot and then it gets overwritten which corrupts
transposedIndices"

The changes below prevents the corruption of transposedIndices and
passes the test case.

Co-authored-by: Mark Danial <mark.danial@ibm.com>
2024-01-18 10:17:53 -05:00
Andrei Golubev
296a6842d1
[formatv][FmtAlign] Use fill count of type size_t instead of uint32_t (#78459)
FmtAlign::fill() accepts a uint32_t variable while the usages operate on
size_t values. On some platform / compiler combinations, this ends up
being a narrowing conversion. Fix this by changing the function's signature.
This was first seen on MSVC x86.

Co-authored-by: Orest Chura <orest.chura@intel.com>
2024-01-18 10:16:11 -05:00
Simon Pilgrim
33287e35f2
[X86] Emit verbose (constant) comments before EVEX compression tag (#78585)
This helps ensure the encoding details are next to the EVEX tag

Noticed while preparing to add more constant commenting as part of #73783 and #71078
2024-01-18 15:13:42 +00:00
Jannik Silvanus
bd2430b421
[IR] Allow type change in ValueAsMetadata::handleRAUW (#76969)
`ValueAsMetadata::handleRAUW` is a mechanism to replace all metadata
referring to one value by a different value.

Relax an assert that used to enforce the old and new value to have the
same type.
This seems to be a sanity plausibility assert only, as the
implementation actually supports mismatching types.

This is motivated by a downstream mechanism where we use poison
ValueAsMetadata values to annotate pointee types of opaque pointer
function arguments.

When replacing one type with a different one to work around DXIL vs LLVM
incompatibilities, we need to update type annotations, and handleRAUW is
more efficient than creating new MD nodes.
2024-01-18 16:01:23 +01:00
stephenpeckham
a7f9e92d07
Fix typo (#78587) 2024-01-18 08:57:10 -06:00
Luke Lau
9d6e189ee8 [RISCV] Use regexp to check negative extensions in test. NFC
Everytime an extension is added, this test will need to have the negative
extension appended to multiple CHECK lines where we're overriding the arch.
This is quite time consuming since it needs to be in the right order, so this
replaces the explicit list of negative extensions with a regexp instead.
2024-01-18 21:47:06 +07:00
Dominik Adamski
8930c5a4be
[NFC][OpenMP] Fix typo in CHECK line (#78586)
Typo in test: openmp/libomptarget/test/offloading/fortran/basic-target-parallel-do.f90
2024-01-18 15:40:15 +01:00
Quinn Dawkins
5caab8bbc0
[mlir][transform] Add transform.get_operand op (#78397)
Similar to `transform.get_result`, except it returns a handle to the
operand indicated by a positional specification, same as is defined for
the linalg match ops.

Additionally updates `get_result` to take the same positional specification.
This makes the use case of wanting to get all of the results of an
operation easier by no longer requiring the user to reconstruct the list
of results one-by-one.
2024-01-18 09:33:14 -05:00
cor3ntin
e90e43fb9c
[Clang][NFC] Rename CXXMethodDecl::isPure -> is VirtualPure (#78463)
To avoid any possible confusion with the notion of pure function and the
gnu::pure attribute.
2024-01-18 15:30:58 +01:00
Dominik Adamski
d87a53a960
[NFC][OpenMP][Flang] Add test for OpenMP target parallel do (#77776)
Added test which proves that end-to-end compilation of `omp target
parallel do` costruct is successful for Flang compiler.
2024-01-18 15:26:39 +01:00
Leandro Lupori
07abde2717
[flang][driver] Fix Driver/isysroot.f90 test (#78478)
Check for DEFAULT_SYSROOT, because when it is set -isysroot has no
effect.
2024-01-18 11:21:57 -03:00
Timm Baeder
30d458626d
[clang][Interp] Fix diagnosing non-const variables pre-C++11 (#76718)
In CheckConstant(), consider that in C++98 const variables may not be read at all, and diagnose that accordingly.
2024-01-18 15:15:05 +01:00
Piotr Sobczak
57f6a3f7ea
[AMDGPU] Add global_load_tr for GFX12 (#77772)
Support new amdgcn_global_load_tr instructions for load with transpose.

* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
2024-01-18 15:14:42 +01:00
Vassil Vassilev
1566f1ffc6
[clang-repl] Add a interpreter-specific overload of operator new for C++ (#76218)
This patch brings back the basic support for C by inserting the required
for value printing runtime only when we are in C++ mode. Additionally,
it defines a new overload of operator placement new because we can't
really forward declare it in a library-agnostic way.

Fixes the issue described in llvm/llvm-project#69072.
2024-01-18 16:06:04 +02:00
Guillaume Chatelet
e6a6a90fe7
[libc][NFC] Use the Sign type for DyadicFloat (#78577) 2024-01-18 15:03:35 +01:00
Sergio Afonso
0c76865da9
[Flang][OpenMP][Lower] NFC: Combine two calls to ClauseProcessor::processTODO (#78451)
Just a minimal readability improvement that we overlooked during
refactoring.
2024-01-18 14:01:08 +00:00
Jay Foad
745b193260 [AMDGPU] Regenerate tests for #77892 after #77438 2024-01-18 13:50:59 +00:00
Krzysztof Parzyszek
e5a34f9226
[Flang][OpenMP] Push genEval closer to leaf lowering functions (#77760)
This moves the lowering of the nested evaluations all the way to the
bottom of the call stack. This PR does not attempt to change the leaf
lowering functions beyond placing the call to `genEval` in there.
Whether the nested evaluations should be lowered for any given op
depends on the context in which that op is created, hence a `genNested`
parameter was added.

Contexts in which nested evaluations should not be lowered are during
lowering of composite constructs, such as PARALLEL SECTIONS. This
particular case is considered a block construct tied to the SECTIONS
directive, and the lowering code will first create an empty parallel op,
and then recursively lower the SECTIONS code. Similar situations occur
when lowering most (if not all) compound/composite constructs.

Recursive lowering [4/5]
2024-01-18 07:47:35 -06:00
Jay Foad
0a3a0ea591
[AMDGPU] Update uses of new VOP2 pseudos for GFX12 (#78155)
New pseudos were added for instructions that were natively VOP3 on
GFX11: V_ADD_F64_pseudo, V_MUL_F64_pseudo, V_MIN_NUM_F64, V_MAX_NUM_F64,
V_LSHLREV_B64_pseudo

---------

Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
2024-01-18 13:26:13 +00:00
Vlad Serebrennikov
f4fbbebb5e
[clang] Add test for CWG1807 (#77637)
The test checks that objects in arrays are destructed in reverse order during stack unwinding.

This patch is trying to establish a precedent how codegen tests for C++ defect report test suite should be written. Refer to PR for exact reasoning.
2024-01-18 17:14:25 +04:00
Mariusz Sikora
3e6589f21c
[AMDGPU][GFX12] Add 16 bit atomic fadd instructions (#75917)
- image_atomic_pk_add_f16
- image_atomic_pk_add_bf16
- ds_pk_add_bf16
- ds_pk_add_f16
- ds_pk_add_rtn_bf16
- ds_pk_add_rtn_f16
- flat_atomic_pk_add_f16
- flat_atomic_pk_add_bf16
- global_atomic_pk_add_f16
- global_atomic_pk_add_bf16
- buffer_atomic_pk_add_f16
- buffer_atomic_pk_add_bf16
2024-01-18 14:01:09 +01:00
Mariusz Sikora
28b7e498b6
AMDGPU/GFX12: Add new dot4 fp8/bf8 instructions (#77892)
Endoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).

---------

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
2024-01-18 14:00:27 +01:00
Timm Baeder
18d0a7e4c0
[clang][Interp] Implement ComplexToReal casts (#77294)
Add a new emitComplexReal() helper function and use that for the new
casts as well as the old __real implementation.
2024-01-18 13:55:04 +01:00
Guillaume Chatelet
11ec512f44
[libc][NFC] Introduce a Sign type for FPBits (#78500)
Another patch is needed to cover `DyadicFloat` and `NormalFloat`
constructors.
2024-01-18 13:40:49 +01:00
Yingwei Zheng
9acc404230
[InstCombine] Recognize more rotation patterns (#78107)
InstCombine already handles the pattern `(shl ShVal, (X & (Width - 1)))
| (lshr ShVal, ((-X) & (Width - 1)))`. Under certain circumstances, `X &
(Width - 1)` will be simplified to `X`. Therefore, this patch adds
support for the pattern `(shl ShVal, X) | (lshr ShVal, ((-X) & (Width -
1)))`.

Alive2: https://alive2.llvm.org/ce/z/P7JQ2V
2024-01-18 20:29:53 +08:00
Congcong Cai
64e94438a4
[InstCombine] combine mul(abs(x),abs(y)) to abs(mul(x,y)) (#78395)
Fixes: https://github.com/llvm/llvm-project/issues/78076
Alive2 Proof: https://alive2.llvm.org/ce/z/XEDy0f
2024-01-18 20:12:00 +08:00
paperchalice
a48c1bda74
Revert "[CodeGen] Support start/stop in CodeGenPassBuilder" (#78567)
Reverts llvm/llvm-project#70912. This breaks some bazel tests.
2024-01-18 20:09:53 +08:00
Simon Pilgrim
d12dffacaa [X86] Add X86::getConstantFromPool helper function to replace duplicate implementations.
We had the same helper function in shuffle decode / vector constant code - move this to X86InstrInfo to avoid duplication.
2024-01-18 11:59:46 +00:00
Alexey Lapshin
cf799b3d3b
[DWARFLinker][NFC] Move common code into the base library: IndexedValuesMap. (#77437)
This patch is extracted from #74725.
Both dwarflinkers contain similar classes for indexed values. Move the
code into the DWARFLinkerBase.
2024-01-18 14:29:46 +03:00
Jie Fu
779af9b713 [AMDGPU] Fix -Wunused-variable in SIInsertWaitcnts.cpp (NFC)
llvm-project/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1539:10:
 error: unused variable 'SWaitInst' [-Werror,-Wunused-variable]
    auto SWaitInst =
         ^
1 error generated.
2024-01-18 19:28:48 +08:00
Paul Osmialowski
d5b2e41e20
[OpenMP][omp_lib] Restore compatibility with more restrictive Fortran compilers (#77780)
The most recent changes to `omp_lib.h.var` have re-introduced some
compatibility issues that had to be fixed due to the similar changes in
the past. Namely:

1. D120707 has removed the "use omp_lib_kinds" statement and replaced it
with import

2. D114537 added line continuation to the long lines

This patch introduces the same kind of changes in order to restore
compatibility with some more restrictive Fortran compilers so their
users could still benefit from the LLVM's OpenMP Fortran library.
2024-01-18 11:06:24 +00:00
Utkarsh Saxena
667e58a72e
[coroutines][coro_lifetimebound] Detect lifetime issues with lambda captures (#77066)
### Problem

```cpp
co_task<int> coro() {
    int a = 1;
    auto lamb = [a]() -> co_task<int> {
        co_return a; // 'a' in the lambda object dies after the iniital_suspend in the lambda coroutine.
    }();
    co_return co_await lamb;
}
```
[use-after-free](https://godbolt.org/z/GWPEovWWc)

Lambda captures (even by value) are prone to use-after-free once the
lambda object dies. In the above example, the lambda object appears only
as a temporary in the call expression. It dies after the first
suspension (`initial_suspend`) in the lambda.
On resumption in `co_await lamb`, the lambda accesses `a` which is part
of the already-dead lambda object.

---

### Solution

This problem can be formulated by saying that the `this` parameter of
the lambda call operator is a lifetimebound parameter. The lambda object
argument should therefore live atleast as long as the return object.
That said, this requirement does not hold if the lambda does not have a
capture list. In principle, the coroutine frame still has a reference to
a dead lambda object, but it is easy to see that the object would not be
used in the lambda-coroutine body due to no capture list.

It is safe to use this pattern inside a`co_await` expression due to the
lifetime extension of temporaries. Example:

```cpp
co_task<int> coro() {
    int a = 1;
    int res = co_await [a]() -> co_task<int> { co_return a; }();
    co_return res;
}
```
---
### Background

This came up in the discussion with seastar folks on
[RFC](https://discourse.llvm.org/t/rfc-lifetime-bound-check-for-parameters-of-coroutines/74253/19?u=usx95).
This is a fairly common pattern in continuation-style-passing (CSP)
async programming involving futures and continuations. Document ["Lambda
coroutine
fiasco"](https://github.com/scylladb/seastar/blob/master/doc/lambda-coroutine-fiasco.md)
by Seastar captures the problem.
This pattern makes the migration from CSP-style async programming to
coroutines very bugprone.


Fixes https://github.com/llvm/llvm-project/issues/76995

---------

Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>
2024-01-18 11:56:55 +01:00
Florian Hahn
40d952b874
[CGP] Avoid replacing a free ext with multiple other exts. (#77094)
Replacing a free extension with 2 or more extensions unnecessarily
increases the number of IR instructions without providing any benefits.
It also unnecessarily causes operations to be performed on wider types
than necessary.

In some cases, the extra extensions also pessimize codegen (see
bfis-in-loop.ll).

The changes in arm64-codegen-prepare-extload.ll also show that we avoid
promotions that should only be performed in stress mode.

PR: https://github.com/llvm/llvm-project/pull/77094
2024-01-18 10:48:10 +00:00
Jay Foad
ba52f06f9d
[AMDGPU] CodeGen for GFX12 S_WAIT_* instructions (#77438)
Update SIMemoryLegalizer and SIInsertWaitcnts to use separate wait
instructions per counter (e.g. S_WAIT_LOADCNT) and split VMCNT into
separate LOADCNT, SAMPLECNT and BVHCNT counters.
2024-01-18 10:47:45 +00:00
Jay Foad
9ca36932b5
[AMDGPU] Work around s_getpc_b64 zero extending on GFX12 (#78186) 2024-01-18 10:23:27 +00:00
Jay Foad
4c65787f1e
[AMDGPU] Add GFX12 __builtin_amdgcn_s_sleep_var (#77926) 2024-01-18 10:14:01 +00:00
Jay Foad
c111dc72e9
[AMDGPU] Allow potentially negative flat scratch offsets on GFX12 (#78193)
https://github.com/llvm/llvm-project/pull/70634 has disabled use
of potentially negative scratch offsets, but we can use it on GFX12.

---------

Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
2024-01-18 10:02:40 +00:00
pvanhout
172dbdf931 [AMDGPU][ELF] Reserve 0x4f and 0x50 EFLAGS 2024-01-18 11:01:44 +01:00
Alexey Lapshin
f1fdfe6888
[dsymutil][llvm-dwarfutil] Rename command line options to avoid using vendor names. (#78135)
This patch renames values of dsymutil/llvm-dwarfutil options:

--linker apple -> --linker classic
--linker llvm -> --linker parallel

The purpose to rename options is to avoid using vendor names and to
match with library names. It should be safe to rename options at current
stage as they are not seemed widely used(we may not preserve backward
compatibility).
2024-01-18 12:55:04 +03:00
Kerry McLaughlin
e75720b477
[Clang][SME] Add missing IsStreamingCompatible flag to svget, svcreate & svset (#78430) 2024-01-18 09:51:34 +00:00
Nikita Popov
49e3e75143 [ConstantFold] Clean up binop identity folding
Resolve the two FIXMEs: Perform the binop identitiy fold with
AllowRHSConstant, and remove redundant folds later in the code.
2024-01-18 10:37:48 +01:00
Guillaume Chatelet
bc4f3e31a9
[libc][NFC] Selectively disable GCC warnings (#78462) 2024-01-18 10:36:21 +01:00
Matthew Devereau
51e3d2f73d
[AArch64][SME] Conditionally do smstart/smstop (#77113)
This patch adds conditional enabling/disabling of streaming mode for
functions which have both the aarch64_pstate_sm_compatible and
aarch64_pstate_sm_body attributes.

This combination allows callees to determine if switching streaming mode
is required instead of relying on the caller.
2024-01-18 09:17:23 +00:00
Luke Lau
15b0fabb21
[RISCV] Vectorize phi for loop carried @llvm.vector.reduce.fadd (#78244)
LLVM vector reduction intrinsics return a scalar result, but on RISC-V
vector reduction instructions write the result in the first element of a
vector register. So when a reduction in a loop uses a scalar phi, we end
up with unnecessary scalar moves:

loop:
    vfmv.s.f v10, fa0
    vfredosum.vs v8, v8, v10
    vfmv.f.s fa0, v8

This mainly affects ordered fadd reductions, which has a scalar accumulator
operand.
This tries to vectorize any scalar phis that feed into a fadd reduction
in RISCVCodeGenPrepare, converting:

loop:
%phi = phi <float> [ ..., %entry ], [ %acc, %loop]
%acc = call float @llvm.vector.reduce.fadd.nxv4f32(float %phi, <vscale x 2 x float> %vec)
```

to

loop:
%phi = phi <vscale x 2 x float> [ ..., %entry ], [ %acc.vec, %loop]
%phi.scalar = extractelement <vscale x 2 x float> %phi, i64 0
%acc = call float @llvm.vector.reduce.fadd.nxv4f32(float %x, <vscale x 2 x float> %vec)
%acc.vec = insertelement <vscale x 2 x float> poison, float %acc.next, i64 0

Which eliminates the scalar -> vector -> scalar crossing during
instruction selection.
2024-01-18 16:15:20 +07:00
Chuanqi Xu
085eae6b86 [C++20] [Modules] Allow to merge enums with the same underlying interger types
Close https://github.com/llvm/llvm-project/issues/76638. See the issue
for the context of the change.
2024-01-18 17:09:35 +08:00
LLVM GN Syncbot
9096bcc7c8 [gn build] Port 1d286ad59b 2024-01-18 08:46:34 +00:00
Ivan Kosarev
2a869ced61
[AMDGPU][True16] Support V_FLOOR_F16. (#78446) 2024-01-18 08:43:47 +00:00
jeanPerier
4f62a183d9
[flang] Allow user to define free via BIND(C) (#78428)
A user defining and using free/malloc via BIND(C) would previously cause
flang to crash when generating LLVM IR with error "redefinition of
symbol named 'free'". This was caused by flang codegen not expecting to
find a mlir::func::FuncOp definition of these function and emitting a
new mlir::LLVM::FuncOp that later conflicted when translating the
mlir::func::FuncOp.
2024-01-18 09:37:44 +01:00
Mirko Brkušanin
1d286ad59b
[AMDGPU] Add mark last scratch load pass (#75512) 2024-01-18 09:36:44 +01:00
Paschalis Mpeis
37c87d5689
[LV][AArch64] LoopVectorizer allows scalable frem instructions (#76247)
LoopVectorizer is aware when a target can replace a scalable frem
instruction with a vector library call for a given VF and it returns the
relevant cost. Otherwise, it returns an invalid cost (as previously).

Add test that check costs on AArch64, when there is no vector library
available and when there is (with and without tail-folding).

NOTE: Invoking CostModel directly (not through LV) would still return
invalid costs.
2024-01-18 08:32:53 +00:00