SITargetLowering::adjustWritemask calls SelectionDAG::UpdateNodeOperands
to update an EXTRACT_SUBREG node in-place to refer to a new IMAGE_LOAD
instruction, before we delete the old IMAGE_LOAD instruction. But in
UpdateNodeOperands can do CSE on the fly and return a different
EXTRACT_SUBREG node, so the original EXTRACT_SUBREG node would still
exist and would refer to the old deleted IMAGE_LOAD instruction. This
caused errors like:
t31: v3i32,ch = <<Deleted Node!>> # D:1
This target-independent node should have been selected!
UNREACHABLE executed at lib/CodeGen/SelectionDAG/InstrEmitter.cpp:1209!
Fix it by detecting the CSE case and replacing all uses of the original
EXTRACT_SUBREG node with the CSE'd one.
Recommit with a fix for a use-after-free bug in the first version of
this patch (#65340) which was caught by asan.
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
This directly models the flags as part of the recipe, which allows
dropping them using the VPlan infrastructure when required.
It also allows removing the full reference to InductionDescriptor and
limit it to only the opcode.
In the LDDB documentation you have the following sentence:
"The --format (which you can shorten to -f) option accepts a format name."
The link points to the wrong place.
I pointed it to the table that precedes the Type summary section.
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D151668
When we call `getEnvironment`, `BlockToState[BlockId]` for the block can
return null even if CFCtx.isBlockReachable(B) returns true if it is
called from a particular block that is marked unreachable to the block.
We still can try to vectorize the bundle of the instructions, even if
the
repeated number of instruction is non-power-of-2. In this case need to
adjust the cost (calculate the cost only for unique scalar instructions)
and cost of the extracts. Also, when scheduling the bundle need to
schedule only unique scalars to avoid compiler crash because of the
multiple dependencies. Can be safely applied only if all scalars's users
are also vectorized and do not require memory accesses (this one is
a temporarily requirement, can be relaxed later).
---------
Co-authored-by: Alexey Bataev <a.bataev@outlook.com>
On GLibc, FreeBSD and macOS systems nl_catd is a pointer type, and
round-tripping this in a variable of ptrdiff_t is not portable.
In fact such a round-trip yields a non-dereferenceable pointer on
CHERI-enabled architectures such as Arm Morello. There pointers (and
therefore intptr_t) are twice the size of ptrdiff_t, which means casting
to ptrdiff_t strips the high (metadata) bits (as well as a hidden pointer
validity bit).
Since catalog is now guaranteed to be the same size or larger than nl_catd,
we can store all return values safely and the shifting workaround from
commit 0c68ed006d should not be needed
anymore (this is also not portable to CHERI systems on since shifting a
valid pointer right will create a massively out-of-bounds pointer that
may not be representable).
This can be fixed by using intptr_t which should be the same type as
ptrdiff_t on all currently supported architectures.
See also: https://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#2028
Differential Revision: https://reviews.llvm.org/D134420
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
This prevents S_NOP from being rescheduled past other (side-effecting)
instructions, which is useful because it is generally used to introduce
a short delay or to avoid hazards. Currently this only affects MIR tests
because the compiler itself only inserts nops in PostRAHazardRecognizer
which runs after all scheduling.
Should be the same logic, but hopefully easier to read this way. Gets
rid of some superfluous state variables, and uses early returns.
Differential Revision: https://reviews.llvm.org/D112956
According to the LLVM language reference, both volatile memory
operations and atomic operations (except unordered) do not simply read
memory but also perform write operations on arbitrary memory[0][1].
In the case of volatile memory operations, this is the case due to the
read possibly having target specific properties. A common real-world
situation where this happens is reading memory mapped registers on an
MCU for example. Atomic operations are more special. They form a kind of
memory barrier which from the perspective of the optimizer/lang-ref
makes writes from other threads visible in the current thread. Any kind
of synchronization can therefore conservatively be modeled as a
write-effect.
This PR therefore adjusts the side effects of `llvm.load` and
`llvm.store` to add unknown global read and write effects if they are
either atomic or volatile.
Regarding testing: I am not sure how to best test this change for
`llvm.store` and the "globalness" of the effect that isn't just a unit
test checking that the output matches exactly. For the time being, I
added a test making sure that `llvm.load` does not get DCEd in
aforementioned cases.
Related logic in LLVM proper:
3398744a61/llvm/lib/IR/Instruction.cpp (L638-L676)3398744a61/llvm/include/llvm/IR/Instructions.h (L258-L262)
[0] https://llvm.org/docs/LangRef.html#volatile-memory-accesses
[1] https://llvm.org/docs/Atomics.html#monotonic
This patch adds support for the zeroinitializer constant to LLVM
dialect. It's meant to simplify zero-initialization of aggregate types
in MLIR, although it can also be used with non-aggregate types.
The issue #55208 describes a current deficiency of the SLPVectorizer,
namely that it doesn't vectorize code written with lrint, while similar
code written with rint is vectorized. Add a test corresponding to this
issue for the RISC-V target.
The issues link as it was was including PRs, so I've made that
issues only and only those in an open state.
Code review is now on Gtihub so same thing, PRs only, open state,
label lldb.
This allows the lowering of > rank 1 transfer_reads/writes to equivalent
lower-rank ones when the trailing dimension is scalable. The resulting
ops still cannot be completely lowered as they depend on arrays of
scalable vectors being enabled, and a few related fixes (see D158517).
This patch also explicitly disables lowering transfer_reads/writes with
a leading scalable dimension, as more changes would be needed to handle
that correctly and it is unclear if it is required.
Examples of ops that can now be further lowered:
%vec = vector.transfer_read %arg0[%c0, %c0], %cst, %mask
{in_bounds = [true, true]} : memref<3x?xf32>, vector<3x[4]xf32>
vector.transfer_write %vec, %arg0[%c0, %c0], %mask
{in_bounds = [true, true]} : vector<3x[4]xf32>, memref<3x?xf32>
Reviewed By: c-rhodes, awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D158753
Buitlins from AMD's device-libs are compiled without specifying a
target-cpu, which results in builtins without the target-features
attribute set.
Before this patch, when linking this builtins with -mlink-builtin-bitcode
the target-features were not propagated in the incoming builtins.
With this patch, the default target features are propagated
if they are compatible with the target-features in the incoming builtin.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D159206
Adds the following:
* A note that you can use attaching to debug the right lldb-server
process, though there are drawbacks.
* A section on debugging the remote protocol.
* Reducing bugs, including reducing ptrace bugs to remove the need for
LLDB.
I've added a standlone ptrace program to the examples folder because:
* There's no better place to put it.
* Adding it to the page seems like wasting space, and would be harder to
update.
* I link to Eli Bendersky's classic blog on the subject, but we are
safer with our own example as well.
* Eli's example is for 32 bit Intel, AArch64 is more common these days.
* It's easier to show the software breakpoint steps in code than explain
it (though I still do that in the text).
* It was living on my laptop not helping anyone so I think it's good to
have it upstream for others, including future me.
C++20 modules
Previously, we banned the check for input files from C++20 modules since
we thought the BMI from C++20 modules should be a standalone artifact.
However, during the recent experiment with clangd for modules, I find
it is necessary to tell whether or not a BMI is out-of-date by checking the
input files especially for language servers.
So this patch brings a header search option
ForceCheckCXX20ModulesInputFiles to allow the tools (concretly, clangd)
to check the input files from BMI.
This patch changes how common blocks are aggregated and named in
lowering in order to:
* fix one obvious issue where BIND(C) and non BIND(C) with the same
Fortran name were "merged"
* go further and deal with a derivative where the BIND(C) C name matches
the assembly name of a Fortran common block. This is a bit unspecified
IMHO, but gfortran, ifort, and nvfortran "merge" the common block
without complaints as a linker would have done. This required getting
rid of all the common block mangling early in FIR (\_QC) instead of
leaving that to the phase that emits LLVM from FIR because BIND(C)
common blocks did not have mangled names. Care has to be taken to deal
with the underscoring option of flang-new.
See added flang/test/Lower/HLFIR/common-block-bindc-conflicts.f90 for an
illustration.
A field `FilterClassField` is added to `GenericTable` class, which
is an optional bit field of `FilterClass`. If specified, only those
records with this field being true will have corresponding entries
in the table.
D157648 broke the function because it put the blocking wait into a
critical section. This meant that, if m_iohandler_sync was not updated
before entering the function, no amount of waiting would help.
Fix that by restriciting the scope of the critical section to the
iohandler check.
This reverts commit dc3f758ddc.
Lit decided to show me the least interesting part of the
test output, but from what I gather on Mac OS the DWARF
stays in the object files (https://stackoverflow.com/a/12827463).
So either split DWARF options do nothing or they produce
files I don't know the name of that aren't .dwo, so I'm
skipping these tests on Darwin.
A typical test case goes through the format, stability, ObjC, and messUp
tests. If any of theses tests fails, we should skip the remaining tests
for the same test case.