Address review comments in #76709
Add `NoCD8` to class `ITy`, and rewrite the promoted instructions with
`ITy` to avoid unexpected incorrect encoding about `NoCD8`.
Error message
```
*** Bad machine code: Illegal virtual register for instruction ***
- function: test__blsi_u32
- basic block: %bb.0 (0x7a61208)
- instruction: %5:gr32 = MOV32r0 implicit-def $eflags
- operand 0: %5:gr32
Expected a GR32_NOREX2 register, but got a GR32 register
```
Reported by RKSimon in #77433
The failure is b/c compiler emits a MOV32r0 with operand GR32 when
fast-isel is enabled.
```
// X86FastISel.cpp
Register SrcReg = fastEmitInst_(X86::MOV32r0, &X86::GR32RegClass)
```
However, before this patch, compiler only allows GR32_NOREX operand
b/c MOV32r0 is a pseudo instruction. In this patch, we relax the
register class of the operand to GR32 b/c MOV32r0 is always expanded
to XOR32rr, which can use EGPR.
The bug was not introduced by #77433 but caught by it.
In particular, we have internal customers that would like to use nanf
and
scalbnf.
The differences between various entrypoint files can be checked via:
$ comm -3 <(grep libc\.src path/to/entrypoints.txt | sort) \
<(grep libc\.src path/to/other/entrypoints.txt | sort)
These were fixed properly by f1f1875c18.
- Revert "[libc] temporarily set -Wno-shorten-64-to-32 (#77396)"
- Revert "[libc] make off_t 32b for 32b arm (#77350)"
This fixes missing inlined function names when formatting frame and the
`Block` in `SymbolContext` is a lexical block (e.g.
`DW_TAG_lexical_block` in Dwarf).
Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can be mutually linked more or less.
This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list. So,
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy the files in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
Since most of the operations in the `math` dialect don't have
low-precision implementations, add the -math-legalize-to-f32 pass that
goes through and brackets low-precision math funcitons (like `math.sin
%0 : f16`) with `arith.extf` and `arith.truncf`. This preserves the
original semantics of the math operation but allows lowering to proceed.
Versions of this lowering are already implicitly present in some passes,
like ConvertGPUToROCDL. However, because those are implicit rewrites,
they hide the floating-point extension and truncation, preventing anyone
from writing passes that operate on those implitic extf/truncf pairs.
Exposing this legalization explicitly is needed to allow lowening 8-bit
floats on AMD GPUs, as the implementation of extf and truncf on that
platform requires the complex logic found in ArithToAMDGPU, which runs
before the GPU to ROCDL lowering.
Clean-up of the algorithm that assigns MC/DC True/False control-flow
condition IDs when constructing an MC/DC decision region. This patch
creates a common API for setting/getting the condition IDs, making the
binary logical operator visitor functions much cleaner.
This patch also fixes issue
https://github.com/llvm/llvm-project/issues/77873 in which a record's
control flow map can be malformed due to an incorrect calculation of the
True/False condition IDs.
We do the same for the analogous transform in DAGCombine, but this case
was missed in the recent patch which added support for zext nneg.
Sorry for the lack of test coverage. Not sure how to exercise this piece
of logic. It appears to have only minimal impact on LIT tests (only
test/CodeGen/X86/wide-scalar-shift-by-byte-multiple-legalization.ll),
and even then, the changes without it appear uninteresting. Maybe we
should remove this transform instead?
After the removal of the OpenMP early outlining MLIR pass in #67319, the
`EarlyOutliningInterface` stopped doing any useful work. It used to be
necessary to tie the name of the function from which a target region was
outlined to that new function, so it would be used when translating to
LLVM IR in place of the outlined function's name.
This is not necessary anymore, so this patch removes all references to
this interface and uses of the `omp.outline_parent_name` discardable
attribute in tests.
When merging blocks, if the previous block has no any branch instruction
and has one successor, the successor may be SEH landing pad and the
block will always raise exception and nerver fall through to next block.
We can not merge them in such case. isSuccessor should be used to
confirm it can fall through to next block.
The template function call CheckDescriptorEqInt((exitStat.get(), 127) is
deduced to have INT_T equal to std::int32_t instead of std::int64_t, but
the length descriptor points to a 64-byte storage. The comparison does
not work in a big endian.
Co-authored-by: Mark Danial <mark.danial@ibm.com>
When running the `flang/test/HLFIR/simplify-hlfir-intrinsics.fir` test
case on AIX we encounter issues building op as they are not found in the
mlir context:
```
LLVM ERROR: Building op `arith.subi` but it isn't known in this MLIRContext: the dialect may not be loaded or this operation hasn't been added by the dialect. See also https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
LLVM ERROR: Building op `hlfir.yield_element` but it isn't known in this MLIRContext: the dialect may not be loaded or this operation hasn't been added by the dialect. See also https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
LLVM ERROR: Building op `hlfir.yield_element` but it isn't known in this MLIRContext: the dialect may not be loaded or this operation hasn't been added by the dialect. See also https://mlir.llvm.org/getting_started/Faq/#registered-loaded-dependent-whats-up-with-dialects-management
```
The issue is caused by the "Merge disjoint stack slots" pass and the
error is not present if the source is built with `-mllvm
--no-stack-coloring`
Thanks to investigation by @stefanp-ibm we found that "the
initializer_list {inputIndices[1], inputIndices[0]} has a lifetime that
only exists for the range of the constructor for ValueRange. Once we get
to stack coloring we merge the stack slot for that element with another
stack slot and then it gets overwritten which corrupts
transposedIndices"
The changes below prevents the corruption of transposedIndices and
passes the test case.
Co-authored-by: Mark Danial <mark.danial@ibm.com>
FmtAlign::fill() accepts a uint32_t variable while the usages operate on
size_t values. On some platform / compiler combinations, this ends up
being a narrowing conversion. Fix this by changing the function's signature.
This was first seen on MSVC x86.
Co-authored-by: Orest Chura <orest.chura@intel.com>
`ValueAsMetadata::handleRAUW` is a mechanism to replace all metadata
referring to one value by a different value.
Relax an assert that used to enforce the old and new value to have the
same type.
This seems to be a sanity plausibility assert only, as the
implementation actually supports mismatching types.
This is motivated by a downstream mechanism where we use poison
ValueAsMetadata values to annotate pointee types of opaque pointer
function arguments.
When replacing one type with a different one to work around DXIL vs LLVM
incompatibilities, we need to update type annotations, and handleRAUW is
more efficient than creating new MD nodes.
Everytime an extension is added, this test will need to have the negative
extension appended to multiple CHECK lines where we're overriding the arch.
This is quite time consuming since it needs to be in the right order, so this
replaces the explicit list of negative extensions with a regexp instead.
Similar to `transform.get_result`, except it returns a handle to the
operand indicated by a positional specification, same as is defined for
the linalg match ops.
Additionally updates `get_result` to take the same positional specification.
This makes the use case of wanting to get all of the results of an
operation easier by no longer requiring the user to reconstruct the list
of results one-by-one.
Support new amdgcn_global_load_tr instructions for load with transpose.
* MC layer support for GLOBAL_LOAD_TR_B64/GLOBAL_LOAD_TR_B128
* Intrinsic int_amdgcn_global_load_tr
* Clang builtins amdgcn_global_load_tr*
This patch brings back the basic support for C by inserting the required
for value printing runtime only when we are in C++ mode. Additionally,
it defines a new overload of operator placement new because we can't
really forward declare it in a library-agnostic way.
Fixes the issue described in llvm/llvm-project#69072.
This moves the lowering of the nested evaluations all the way to the
bottom of the call stack. This PR does not attempt to change the leaf
lowering functions beyond placing the call to `genEval` in there.
Whether the nested evaluations should be lowered for any given op
depends on the context in which that op is created, hence a `genNested`
parameter was added.
Contexts in which nested evaluations should not be lowered are during
lowering of composite constructs, such as PARALLEL SECTIONS. This
particular case is considered a block construct tied to the SECTIONS
directive, and the lowering code will first create an empty parallel op,
and then recursively lower the SECTIONS code. Similar situations occur
when lowering most (if not all) compound/composite constructs.
Recursive lowering [4/5]
New pseudos were added for instructions that were natively VOP3 on
GFX11: V_ADD_F64_pseudo, V_MUL_F64_pseudo, V_MIN_NUM_F64, V_MAX_NUM_F64,
V_LSHLREV_B64_pseudo
---------
Co-authored-by: Mirko Brkusanin <Mirko.Brkusanin@amd.com>
The test checks that objects in arrays are destructed in reverse order during stack unwinding.
This patch is trying to establish a precedent how codegen tests for C++ defect report test suite should be written. Refer to PR for exact reasoning.
Endoding is VOP3P. Tagged as deep/machine learning instructions. i32
type (v4fp8 or v4bf8 packed in i32) is used for src0 and src1. src0 and
src1 have no src_modifiers. src2 is f32 and has src_modifiers: f32
fneg(neg_lo[2]) and f32 fabs(neg_hi[2]).
---------
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
The most recent changes to `omp_lib.h.var` have re-introduced some
compatibility issues that had to be fixed due to the similar changes in
the past. Namely:
1. D120707 has removed the "use omp_lib_kinds" statement and replaced it
with import
2. D114537 added line continuation to the long lines
This patch introduces the same kind of changes in order to restore
compatibility with some more restrictive Fortran compilers so their
users could still benefit from the LLVM's OpenMP Fortran library.
### Problem
```cpp
co_task<int> coro() {
int a = 1;
auto lamb = [a]() -> co_task<int> {
co_return a; // 'a' in the lambda object dies after the iniital_suspend in the lambda coroutine.
}();
co_return co_await lamb;
}
```
[use-after-free](https://godbolt.org/z/GWPEovWWc)
Lambda captures (even by value) are prone to use-after-free once the
lambda object dies. In the above example, the lambda object appears only
as a temporary in the call expression. It dies after the first
suspension (`initial_suspend`) in the lambda.
On resumption in `co_await lamb`, the lambda accesses `a` which is part
of the already-dead lambda object.
---
### Solution
This problem can be formulated by saying that the `this` parameter of
the lambda call operator is a lifetimebound parameter. The lambda object
argument should therefore live atleast as long as the return object.
That said, this requirement does not hold if the lambda does not have a
capture list. In principle, the coroutine frame still has a reference to
a dead lambda object, but it is easy to see that the object would not be
used in the lambda-coroutine body due to no capture list.
It is safe to use this pattern inside a`co_await` expression due to the
lifetime extension of temporaries. Example:
```cpp
co_task<int> coro() {
int a = 1;
int res = co_await [a]() -> co_task<int> { co_return a; }();
co_return res;
}
```
---
### Background
This came up in the discussion with seastar folks on
[RFC](https://discourse.llvm.org/t/rfc-lifetime-bound-check-for-parameters-of-coroutines/74253/19?u=usx95).
This is a fairly common pattern in continuation-style-passing (CSP)
async programming involving futures and continuations. Document ["Lambda
coroutine
fiasco"](https://github.com/scylladb/seastar/blob/master/doc/lambda-coroutine-fiasco.md)
by Seastar captures the problem.
This pattern makes the migration from CSP-style async programming to
coroutines very bugprone.
Fixes https://github.com/llvm/llvm-project/issues/76995
---------
Co-authored-by: Chuanqi Xu <yedeng.yd@linux.alibaba.com>