SCEV treats "or disjoint" the same as "add nsw nuw". However, when
expanding, we cannot generally replace an add SCEV node with an "or
disjoint" instruction. Just dropping the poison flag is insufficient in
this case; we would have to actually convert the or into an add.
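The one-way nature of this equivalence can be illustrated with plain integers. The following is only a minimal sketch with hypothetical helper names (nothing here is LLVM API): when two operands share no set bits, `a | b` and `a + b` agree, but operands of an arbitrary add need not be disjoint, so an add cannot in general be emitted as an or.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when the operands have no set bits in common,
// which is the property that "or disjoint" promises.
inline bool haveNoCommonBits(uint32_t a, uint32_t b) {
  return (a & b) == 0;
}

// With disjoint operands no carries occur, so the or equals the sum.
inline uint32_t orOfDisjoint(uint32_t a, uint32_t b) {
  assert(haveNoCommonBits(a, b) && "operands must be disjoint");
  return a | b;
}
```

For instance, `0x8` and `0x3` are disjoint and both operations give `0xB`, whereas `3` and `1` overlap in bit 0, so `3 | 1` is `3` while `3 + 1` is `4`.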
This is a partial fix for #79861.
(cherry picked from commit 5b8e1a6ebf11b6e93bcc96a0d009febe4bb3d7bc)
Close https://github.com/llvm/llvm-project/issues/80570.
In a0b6747804, we skipped ODR checks for decls in GMF. It is then
natural to skip storing the ODR values in the BMI as well.
Generally this is fine as long as the writer and the reader stay
consistent.
However, the use of the preamble in clangd exposes a tricky case.
For example:
```
// test.cpp
module;
// any one of these is enough to crash clangd
// #include <iostream>
// #include <string_view>
// #include <cmath>
// #include <system_error>
// #include <new>
// #include <bit>
// probably many more
// only ok with libc++, not the system provided libstdc++ 13.2.1
// these are ok
export module test;
```
clangd stores the headers as a preamble to speed up parsing, and the
preamble reuses the serialization machinery. (The preamble is usually
referred to as a PCH, but that is not strictly accurate; I've tested
that a real PCH is not affected.) The tricky part is that the preamble
is not a module. It literally serializes and deserializes things. So
before clangd parses the test module above, it serializes the
headers into the preamble. At that point there is no notion of a GMF,
so the ODR bits are stored. However, when clangd actually parses the
file, the decls from the preamble are treated as being in the GMF, so
the ODR bits are skipped, and a mismatch occurs.
To solve the problem, this patch adds another bit for decls to record
whether or not the ODR bits are skipped.
(cherry picked from commit 49775b1dc0cdb3a9d18811f67f268e3b3a381669)
In IR or C code, a shift amount larger than the value size is undefined
behavior. But in practice, backend lowering for shift_parts produces
add/sub of shift amounts, so constant shift amounts might be
negative or larger than the value size, and the result then depends on
the ISA definition.
The PowerPC ISA says that the lowest 7 bits (6 bits for 32-bit
instructions) are taken, and if the highest among them is 1, the result
is zero; otherwise the low 6 bits (or 5 on 32-bit) are used as the shift
amount.
This commit emulates the behavior and avoids array overflow in bit
permutation's value bits calculator.
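The rule quoted above can be sketched for a logical left shift as follows. This is an illustration under stated assumptions, not the code from the patch: `emulateShl` and its parameters are hypothetical names, and only 32- and 64-bit widths are modeled.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the described rule for a logical left shift.
// `bits` is the operand width (32 or 64); `amount` may be a "negative"
// value (a large unsigned number) after shift_parts lowering.
inline uint64_t emulateShl(uint64_t value, uint64_t amount, unsigned bits) {
  assert((bits == 32 || bits == 64) && "only 32- and 64-bit shifts modeled");
  const uint64_t widthMask = bits == 64 ? ~0ULL : 0xFFFFFFFFULL;
  // Only the lowest 7 bits (6 for 32-bit) of the amount are considered.
  const uint64_t considered = amount & (2 * bits - 1);
  // If the highest considered bit is set, the result is zero.
  if (considered & bits)
    return 0;
  // Otherwise the low 6 (or 5) bits are the effective shift amount.
  return (value << (considered & (bits - 1))) & widthMask;
}
```

Because the effective amount is always smaller than the width, anything indexing per-bit arrays by that amount stays in bounds.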
(cherry picked from commit 292d9e869fcfc2ece694848db4022b0b939847e3)
This patch adds assembly file `z_AIX_asm.S` that contains the 32- and
64-bit XCOFF version of microtasking routines and unnamed common block
definitions. This code has been run through the libomp LIT tests and a
user package successfully.
(cherry picked from commit 94100bc2fb1a39dbeb43d18a95176097c53f1324)
Otherwise we will crash since target intrinsics don't have their types
legalized. Let the mgather get legalized first, then do the combine on
the legal type.
Fixes #81088
Co-authored-by: Craig Topper <craig.topper@sifive.com>
(cherry picked from commit 06c89bd59ca2279f76a41e851b7b2df634a6191e)
This patch sets the stack size of worker threads to `2 x
KMP_DEFAULT_STKSIZE` (2 x 4MB) for AIX if the system stack size is too
big. Also defines maximum stack size for 32-bit AIX.
(cherry picked from commit 2de269a641e4ffbb7a44e559c4c0a91bb66df823)
to satisfy the __start___llvm_orderfile reference when linking with
-bexpfull and -fprofile-generate on AIX.
(cherry picked from commit 15cccc55919d27eb2e89379a65f6c7809f679fda)
If, like powi on Windows, the libcall is unavailable we should fall back
to SDAG. Currently we try to generate a call to "".
(cherry picked from commit 47c65cf62d06add9f55a77c9d45390fa3b986fc5)
With the new SystemZ port we noticed that -pie executables generated
from files containing R_390_TLS_IEENT relocations will have unnecessary
relocations in their GOT:
    9e8d8: R_390_TLS_TPOFF *ABS*+0x18
This is caused by the config->isPic condition in addTpOffsetGotEntry:
    static void addTpOffsetGotEntry(Symbol &sym) {
      in.got->addEntry(sym);
      uint64_t off = sym.getGotOffset();
      if (!sym.isPreemptible && !config->isPic) {
        in.got->addConstant({R_TPREL, target->symbolicRel, off, 0, &sym});
        return;
      }
It is correct that we need to retain a TPOFF relocation if the target
symbol is preemptible or if we're building a shared library. But when
building a -pie executable, those values are fixed at link time and
there's no need for any remaining dynamic relocation.
Note that the equivalent MIPS-specific code in MipsGotSection::build
checks for config->shared instead of config->isPic; we should use the
same check here. (Note also that on many other platforms we're not even
using addTpOffsetGotEntry in this case as an IE->LE relaxation is
applied before; we don't have this type of relaxation on SystemZ.)
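
The difference between the two checks can be modeled in isolation. The sketch below uses hypothetical stand-ins (`LinkConfig`, `needsDynamicTpoffOld`, `needsDynamicTpoffNew`) rather than lld's real types, and assumes that isPic is true for both -shared and -pie outputs:

```cpp
#include <cassert>

// Hypothetical stand-in for the linker state consulted by the check.
struct LinkConfig {
  bool shared; // building a shared library
  bool isPic;  // position-independent output; assumed true for -shared and -pie
};

// Check based on isPic: a -pie executable keeps a dynamic TPOFF relocation.
inline bool needsDynamicTpoffOld(bool preemptible, const LinkConfig &c) {
  assert((!c.shared || c.isPic) && "a shared library is always PIC");
  return preemptible || c.isPic;
}

// Check based on shared: only preemptible symbols and shared libraries
// keep the dynamic relocation.
inline bool needsDynamicTpoffNew(bool preemptible, const LinkConfig &c) {
  assert((!c.shared || c.isPic) && "a shared library is always PIC");
  return preemptible || c.shared;
}
```

For a non-preemptible symbol in a -pie link (`shared == false`, `isPic == true`), the first predicate asks for a dynamic relocation and the second does not.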
(cherry picked from commit 6f907733e65d24edad65f763fb14402464bd578b)
This CMakeLists.txt is used to build modules without build system
support. This was removed in d06ae33ec3.
It is used in the documentation on how to use modules.
Made some minor changes to make it work with the std.compat module using
the std module.
Note the CMakeLists.txt in the build dir should be removed once build
system support is generally available.
(cherry picked from commit fc0e9c8315564288f9079a633892abadace534cf)
multiregister node.
If the node spans several registers and the same extractelement
instruction is used in several parts, it may be necessary to keep that
extractelement instruction to avoid a compiler crash.
(cherry picked from commit 6fe21bc1dac883efa0dfa807f327048ae9969b81)
before erasing.
Before trying to erase the extractelement instruction, it is not enough
to check for a single use; we also need to check that it is not used in
several nodes because of the preliminary node reordering.
(cherry picked from commit 48bbd7658710ef1699bf2a6532ff5830230aacc5)
Add review references to all items already mentioned.
Move some items to the right section (from the MinGW section to COFF, as
the implementation is in the COFF linker side, and may be relevant for
non-MinGW cases as well).
Following commit 6611d58f5b ("Relax R_RISCV_ALIGN"), we can relax
R_LARCH_ALIGN in the same way. Reuse `SymbolAnchor`, `RISCVRelaxAux` and
`initSymbolAnchors` to simplify the code. As `riscvFinalizeRelax` is an
arch-specific function, move it into an override of
`TargetInfo::finalizeRelax`, so that LoongArch can override it, too.
The flow for relaxing R_LARCH_ALIGN is almost the same as for RISC-V. The
difference is that LoongArch only has a 4-byte NOP and all executable
instructions are 4-byte aligned, so LoongArch does not need to rewrite
the NOP sequence.
The alignment maxBytesEmit parameter is supported in psABI v2.30.
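To see why no NOP rewriting is needed, consider the arithmetic only. This is a generic sketch with hypothetical helpers; it does not reflect how lld encodes or processes R_LARCH_ALIGN. If the current address is 4-byte aligned and the requested alignment is a power of two of at least 4, the required padding is always a whole number of 4-byte NOPs.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of padding needed to bring `pc` up to the next multiple of `align`.
inline uint64_t paddingBytes(uint64_t pc, uint64_t align) {
  assert(align >= 4 && (align & (align - 1)) == 0 && "power-of-two alignment");
  assert(pc % 4 == 0 && "instructions are 4-byte aligned");
  return (align - pc % align) % align;
}

// Every NOP is 4 bytes, so the padding is always a whole number of NOPs.
inline uint64_t nopCount(uint64_t pc, uint64_t align) {
  return paddingBytes(pc, align) / 4;
}
```

With `pc = 0x1004` and a 16-byte alignment, 12 bytes (three NOPs) remain; the surplus NOPs can simply be dropped.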
(cherry picked from commit 06a728f3feab876f9195738b5774e82dadc0f3a7)
The change is included in the 18.x release. Move the release note to the
release branch and reformat.
(cherry picked from commit b40d5b1b08564d23d5e0769892ebbc32447b2987)
This patch adds full support for linking SystemZ (ELF s390x) object
files. Support should be generally complete:
- All relocation types are supported.
- Full shared library support (DYNAMIC, GOT, PLT, ifunc).
- Relaxation of TLS and GOT relocations where appropriate.
- Platform-specific test cases.
In addition to new platform code and the obvious changes, there were a
few additional changes to common code:
- Add three new RelExpr members (R_GOTPLT_OFF, R_GOTPLT_PC, and
R_PLT_GOTREL) needed to support certain s390x relocations. I chose not
to use a platform-specific name since nothing in the definition of these
relocs is actually platform-specific; it is well possible that other
platforms will need the same.
- A couple of tweaks to TLS relocation handling, as the particular
semantics of the s390x versions differ slightly. See comments in the
code.
This was tested by building and testing >1500 Fedora packages, with only
a handful of failures; as these also have issues when building with LLD
on other architectures, they seem unrelated.
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@redhat.com>
(cherry picked from commit fe3406e349884e4ef61480dd0607f1e237102c74)
The direct lock data structure has bit `0` (the least significant bit)
of the first 32-bit word set to `1` to indicate it is a direct lock. On
the other hand, the first word (in 32-bit mode) or first two words (in
64-bit mode) of an indirect lock are the address of the entry allocated
from the indirect lock table. The runtime checks bit `0` of the first
32-bit word to tell if this is a direct or an indirect lock. This works
fine for 32-bit and 64-bit little-endian targets because there the memory
layout of a 64-bit address is (`low word`, `high word`). However, this causes
problems for big-endian where the memory layout of a 64-bit address is
(`high word`, `low word`). If an address of the indirect lock table
entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it
is treated as a direct lock. This patch defines `struct
kmp_base_tas_lock` with the ordering of the two 32-bit members flipped
for big-endian PPC64 so that when checking/setting tags in member
`poll`, the second word (the low word) is used. This patch also changes
places where `poll` is not already explicitly specified for
checking/setting tags.
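The misclassification can be reproduced with the address from the example. The sketch below is illustrative only: `ByteOrder`, `firstWordOf` and `looksLikeDirectLock` are hypothetical names, the word order is modeled explicitly instead of relying on the host, and table entry addresses are assumed to be at least 2-byte aligned.

```cpp
#include <cassert>
#include <cstdint>

enum class ByteOrder { Little, Big };

// Returns the first 32-bit word, as the runtime would read it, of a 64-bit
// address stored in memory with the given word order.
inline uint32_t firstWordOf(uint64_t address, ByteOrder order) {
  assert((address & 1u) == 0 && "assumed: entries are at least 2-byte aligned");
  const uint32_t low = static_cast<uint32_t>(address);
  const uint32_t high = static_cast<uint32_t>(address >> 32);
  return order == ByteOrder::Little ? low : high;
}

// The tag test described above: bit 0 set means "direct lock".
inline bool looksLikeDirectLock(uint32_t firstWord) {
  return (firstWord & 1u) != 0;
}
```

For `0x110035300`, the little-endian first word is `0x10035300` (bit 0 clear, so indirect), while the big-endian first word is `0x1` (bit 0 set, so wrongly taken for a direct lock).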
(cherry picked from commit ac97562c99c3ae97f063048ccaf08ebdae60ac30)
This patch flips bit-fields in `struct flags` for big-endian in test
cases to be consistent with the definition of the structure in libomp
`kmp.h`.
(cherry picked from commit 7a9b0e4acb3b5ee15f8eb138aad937cfa4763fb8)
These were implemented in the COFF linker in 3923e61b96 and d12b99a431.
This matches the corresponding options in the ELF linker.
(cherry picked from commit d033366bd2189e33343ca93d276b40341dc39770)
Function annotation, as part of llvm.metadata, is for the function
itself and doesn't apply to its corresponding jump table entry, so with
CFI we shouldn't replace function pointer in function annotation with
pointer to its corresponding jump table entry.
(cherry picked from commit c7a0db1e20251f436e3d500eac03bd9be1d88b45)
This optimization tries to optimize bitcasts from `<N x i1>` to iN, but
currently also triggers for `<N x i1>` to `<M x iK>` bitcasts, if custom
lowering has been requested for these for an unrelated reason. Fix this
by explicitly checking that the result type is scalar.
Fixes https://github.com/llvm/llvm-project/issues/81216.
(cherry picked from commit 92d79922051f732560acf3791b543df1e6580689)
This was added in 4b7beab418. When the flag was added implicitly
elsewhere, it was added via llvm/cmake/modules/HandleLLVMOptions.cmake,
where it wasn't added on Windows/Cygwin targets.
This avoids one warning per object file in OpenMP.
(cherry picked from commit 72f04fa0734f8559ad515f507a4a3ce3f461f196)
If it isn't virtual, we may extend the live range of the physical
register past where it is valid. For example, across a call.
Found while trying to enable -riscv-enable-sink-fold which enables some
copy propagation in machine sink that led to ADDIs with physical
register destinations.
(cherry picked from commit feee627974df81e4cbf15537e4c4688aed66b12f)
When parsing the `la` macro, we add a duplicate `$` prefix in
`getOrCreateSymbol`, leading to
`error: Undefined temporary symbol $$yy` for code like:
```
xx:
la $2,$yy
$yy:
nop
```
Remove the duplicate prefix.
In addition, recognize `.L`-prefixed symbols as local for O32.
See: #65020.
---------
Co-authored-by: Fangrui Song <i@maskray.me>
(cherry picked from commit c007fbb19879f9b597b47ae772c53e53cdc65f29)
This enables specifying "za" or "zt0" in the clobber list for inline asm.
This complies with the ACLE SME addition to the asm extension here:
https://github.com/ARM-software/acle/pull/276
(cherry picked from commit d9c20e437fe110fb79b5ca73a52762e5b930b361)
This reverts commit bdc41106ee on the release/18.x branch. This change
was the first in a mini-series, and while I'm not aware of any
particular problem from having it on its own in the branch, it seems
safer to ship with the previous known good state.
Having the test in the header requires including unistd.h on POSIX
platforms. This header has other declarations which may conflict with
code that uses names declared by this header. For example,
code using "int pipe;" would conflict with the function pipe in this
header.
Moving the code to the dylib means std::print would not be available on
Apple backdeployment targets. On POSIX platforms no transcoding is
required, so an implementation that is not Standard-conforming is still
useful and the observable differences are minimal. This was the
behaviour of print before https://github.com/llvm/llvm-project/pull/76293.
Note questions have been raised in LWG4044 "Confusing requirements for
std::print on POSIX platforms", whether or not the isatty check on POSIX
platforms is required. When this LWG issue is resolved the
backdeployment targets could become Standard compliant.
This patch is intended to be backported to the LLVM-18 branch.
Fixes: https://github.com/llvm/llvm-project/issues/79782
(cherry picked from commit 4fb7b3301bfbd439eb3d30d6a36c7cdb26941a0d)
If we have something like G_TRUNC from v2s32 to v2s16, then lowering
this to a concat of two G_TRUNC s32 to s16 followed by G_TRUNC from
v2s16 to v2s8 does not bring us any closer to legality. In fact, the
first part of that is a G_BUILD_VECTOR whose legalization will produce a
new G_TRUNC from v2s32 to v2s16, and both G_TRUNCs will then get
combined to the original, causing a legalization cycle.
Make the lowering condition more precise, by requiring that the original
vector is >128 bits, which I believe is the only case where this
specific splitting approach is useful.
Note that this doesn't actually produce a legal result (the alwaysLegal
is a lie, as before), but it will cause a proper globalisel abort
instead of an infinite legalization loop.
Fixes https://github.com/llvm/llvm-project/issues/81244.
(cherry picked from commit 070848c17c2944afa494d42d3ad42929f3379842)
If llvm-readobj is built with a 32-bit time_t, it can't print such
timestamps correctly.
(cherry picked from commit 0bf4ff29816c0eead99ba576a2df2e3c4d214b1f)
This adds support for marking arbitrary general purpose registers -
except for those with special purpose (G0, I6-I7, O6-O7) - as reserved,
as needed by some software like the Linux kernel.
(cherry picked from commit c2f9885a8aa3a820eefdacccf3fcc6b9d87e3284)
The SOURCE_DATE_EPOCH environment variable can be set in order to get
a reproducible build.
When linking PE/COFF modules with LLD, the timestamp field is set to the
current time, unless either the /timestamp: or /Brepro option is set. If
neither of them is set, check the SOURCE_DATE_EPOCH variable, before
resorting to using the actual current date and time.
See https://reproducible-builds.org/docs/source-date-epoch/ for reference
on the use of this variable.
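The fallback order described above can be sketched as follows. This is not LLD's code: `parseEpoch` and `chooseTimestamp` are hypothetical helpers, /Brepro is not modeled, and a malformed variable is simply ignored here as an assumption (a real linker might report an error instead).

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstdlib>

// Parses a non-negative decimal epoch value that fits the 32-bit PE/COFF
// timestamp field. Returns false for a missing or malformed value.
inline bool parseEpoch(const char *text, uint32_t &out) {
  if (!text || !std::isdigit(static_cast<unsigned char>(*text)))
    return false;
  char *end = nullptr;
  const unsigned long long value = std::strtoull(text, &end, 10);
  if (*end != '\0' || value > UINT32_MAX)
    return false;
  out = static_cast<uint32_t>(value);
  return true;
}

// Hypothetical selection order: an explicit option value wins, then
// SOURCE_DATE_EPOCH, then the current time supplied by the caller.
inline uint32_t chooseTimestamp(bool hasExplicit, uint32_t explicitValue,
                                const char *sourceDateEpoch, uint32_t now) {
  if (hasExplicit)
    return explicitValue;
  uint32_t fromEnv = 0;
  if (parseEpoch(sourceDateEpoch, fromEnv))
    return fromEnv;
  return now;
}
```

A caller would typically pass `std::getenv("SOURCE_DATE_EPOCH")` as the third argument and the current time (for example from `std::time(nullptr)`) as the last one.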
(cherry picked from commit 0df8aed6c30f08ded526038a6bbb4daf113a31c1)
-DLIBCXX_ENABLE_UNICODE=OFF or -D_LIBCPP_HAS_NO_UNICODE doesn't build
without this change.
(cherry picked from commit 30cd1838dc334775f7a29f57b581f2bdda3f0ea1)