This adopts a similar behavior to AArch64 SVE, where bool vectors are
represented as a vector of chars with 1/8 the number of elements. This
ensures the vector always occupies a power of 2 number of bytes.
A consequence of this is that vbool64_t, vbool32_t, and vool16_t can
only be used with a vector length that guarantees at least 8 bits.
This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do
the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended
up taking it from V1 instead of V2.
Calling a `__arm_locally_streaming` function from a function that
is not a streaming-SVE function would lead to incorrect inlining.
The issue didn't surface because the tests were not testing what
they were supposed to test.
(cherry picked from commit 3abf55a68caefd45042c27b73a658c638afbbb8b)
This patch introduces a 'COALESCER_BARRIER' which is a pseudo node that
expands to
a 'nop', but which stops the register allocator from coalescing a COPY
node when
its use/def crosses a SMSTART or SMSTOP instruction.
For example:
%0:fpr64 = COPY killed $d0
undef %2.dsub:zpr = COPY %0 // <- Do not coalesce this COPY
ADJCALLSTACKDOWN 0, 0
MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
$d0 = COPY killed %0
BL @use_f64, csr_aarch64_aapcs
If the COPY would be coalesced, that would lead to:
$d0 = COPY killed %0
being replaced by:
$d0 = COPY killed %2.dsub
which means the whole ZPR reg would be live upto the call, causing the
MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:
str q0, [sp] // 16-byte Folded Spill
smstop sm
ldr z0, [sp] // 16-byte Folded Reload
bl use_f64
which would be incorrect for two reasons:
1. The program may load more data than it has allocated.
2. If there are other SVE objects on the stack, the compiler might use
the
'mul vl' addressing modes to access the spill location.
By disabling the coalescing, we get the desired results:
str d0, [sp, #8] // 8-byte Folded Spill
smstop sm
ldr d0, [sp, #8] // 8-byte Folded Reload
bl use_f64
(cherry picked from commit dd736661826e215ac70ff3a4a4ccd75bda0c5ccd)
On Gentoo, libc++ is indeed in /usr/include/c++/*, but libstdc++ is at
e.g. /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14.
Use '/include/g++' as it should be unique enough. Note that the omission of
a trailing slash is intentional to match g++-*.
See https://github.com/llvm/llvm-project/pull/78534#issuecomment-1904145839.
Reviewed by: mgorny
Closes: https://github.com/llvm/llvm-project/pull/79264
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit e8f882f83acf30d9b4da8846bd26314139660430)
The Zicond extension was ratified in the last few months, with no
changes that affect the LLVM implementation. Although there's surely
more tuning that could be done about when to select Zicond or not, there
are no known correctness issues. Therefore, we should mark support as
non-experimental.
(cherry-picked from commit d833b9d677c9dd0a35a211e2fdfada21ea9a464b)
When testing on gcc, both exitstat and cmdstat must be a kind=4 integer,
e.g. DefaultInt. This patch changes the input arg requirement from
`AnyInt` to `TypePattern{IntType, KindCode::greaterOrEqualToKind, n}`.
The standard stated in 16.9.73
- EXITSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least nine.
- CMDSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least four.
```fortran
program bug
implicit none
integer(kind = 2) :: exitstatvar
integer(kind = 4) :: cmdstatvar
character(len=256) :: msg
character(len=:), allocatable :: command
command='echo hello'
call execute_command_line(command, exitstat=exitstatvar, cmdstat=cmdstatvar)
end program
```
When testing the above program with exitstatvar kind<4, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'exitstat=' has bad type or kind 'INTEGER(2)'
call execute_command_line(command, exitstat=exitstatvar)
```
When testing the above program with exitstatvar kind<2, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'cmdstat=' has bad type or kind 'INTEGER(1)'
call execute_command_line(command, cmdstat=cmdstatvar)
```
Test file for this semantics has been added to `flang/test/Semantic`
Fixes: https://github.com/llvm/llvm-project/issues/77990
(cherry picked from commit 14a15103cc9dbdb3e95c04627e0b96b5e3aa4944)
Fold expressions on Clang are limited to 256 elements. This causes
compilation errors in cases when the amount of elements added exceeds
this limit. Side-step the issue by restoring the original trick that
would use the std::initializer_list. For the record, in our downstream
Clang 16 gives:
mlir/include/mlir/IR/Dialect.h:269:23: fatal error: instantiating fold
expression with 688 arguments exceeded expression nesting limit of 256
(addType<Args>(), ...);
Partially reverts 26d811b3ec.
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit e3a38a75ddc6ff00301ec19a0e2488d00f2cc297)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.
This fixes a 25% performance regression introduced by commit
49b0e6dcc2
when building the SPEC2017 x264 benchmark with PGO, where we
decided the inner loop trip count wasn't high enough to warrant
the (incorrect) high cost of the runtime checks. Also, when
runtime memory checks consist entirely of diff checks these are
likely to be outer loop invariant.
(cherry picked from commit 962fbafecf4730ba84a3b9fd7a662a5c30bb2c7c)
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples of such are:
```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
```
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit 89cd345667a5f8f4c37c621fd8abe8d84e85c050)
Include LLVM_ENABLE_HTTPLIB along with httplib package finding in
LLVMConfig.cmake, as this dependency is needed by LLVMDebuginfod that is
now used by LLDB. Without it, building LLDB standalone fails with:
```
CMake Error at /usr/lib/llvm/19/lib64/cmake/llvm/LLVMExports.cmake:90 (set_target_properties):
The link interface of target "LLVMDebuginfod" contains:
httplib::httplib
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
/usr/lib/llvm/19/lib64/cmake/llvm/LLVMConfig.cmake:357 (include)
cmake/modules/LLDBStandalone.cmake:9 (find_package)
CMakeLists.txt:34 (include)
```
(cherry picked from commit 3c9f34c12450345c6eb524e47cf79664271e4260)
Flags `-fveclib=name` were not passed to LTO flags.
This pass fixes that by converting the `-fveclib` flags to their
relevant names for opt's `-vector-lib=name` flags.
For example:
`-fveclib=SLEEF` would become `-vector-library=sleefgnuabi` and passed
through the `-plugin-opt` flag.
(cherry picked from commit 03cf0e9354e7e56ff794e9efb682ed2971bc91ec)
Fixes a regression from #78041 as reported in the review. The original
patch failed to compare the canonical type, which this adds. A slightly
modified test of the original report is added.
(cherry picked from commit e3ee3762304aa81e4a240500844bfdd003401b36)
The glibc now adds the required minimum ISA level for libc-nonshared.a
(linked on all programs) and this is done with an inline asm along with
.note.gnu.property and .pushsection/.popsection. However, the x86
backend always ends the 'note.gnu.property' section when building with
-fcf-protection, leading to assert failure:
llvm/llvm-project-git/llvm/lib/MC/MCStreamer.cpp:1251: virtual void
llvm::MCStreamer::switchSection(llvm::MCSection*, const llvm::MCExpr*):
Assertion `!Section->hasEnded() && "Section already ended"' failed.
[1]
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86/isa-level.c;h=3f1b269848a52f994275bab6f60dded3ded6b144;hb=HEAD
(cherry picked from commit a58c62fa824fd24d20fa2366e0ec8f241cb321fe)
Fix and add a test case for combining '-ffat-lto-objects -c -emit-llvm'
options and fix a spelling mistake in same test.
(cherry picked from commit f1b1611148fa533fe198fec3fa4ef8139224dc80)
Support
R_RISCV_TLSDESC_HI20/R_RISCV_TLSDESC_LOAD_LO12/R_RISCV_TLSDESC_ADD_LO12/R_RISCV_TLSDESC_CALL.
LOAD_LO12/ADD_LO12/CALL relocations reference a label at the HI20
location, which requires special handling. We save the value of HI20 to
be reused. Two interleaved TLSDESC code sequences, which compilers do
not generate, are unsupported.
For -no-pie/-pie links, TLSDESC to initial-exec or local-exec
optimizations are eligible. Implement the relevant hooks
(R_RELAX_TLS_GD_TO_LE, R_RELAX_TLS_GD_TO_IE): the first two instructions
are converted to NOP while the latter two are converted to a GOT load or
a lui+addi.
The first two instructions, which would be converted to NOP, are removed
instead in the presence of relaxation. Relaxation is eligible as long as
the R_RISCV_TLSDESC_HI20 relocation has a pairing R_RISCV_RELAX,
regardless of whether the following instructions have a R_RISCV_RELAX.
In addition, for the TLSDESC to LE optimization (`lui a0,<hi20>; addi a0,a0,<lo12>`),
`lui` can be removed (i.e. use the short form) if hi20 is 0.
```
// TLSDESC to LE/IE optimization
.Ltlsdesc_hi2:
auipc a4, %tlsdesc_hi(c) # if relax: remove; otherwise, NOP
load a5, %tlsdesc_load_lo(.Ltlsdesc_hi2)(a4) # if relax: remove; otherwise, NOP
addi a0, a4, %tlsdesc_add_lo(.Ltlsdesc_hi2) # if LE && !hi20 {if relax: remove; otherwise, NOP}
jalr t0, 0(a5), %tlsdesc_call(.Ltlsdesc_hi2)
add a0, a0, tp
```
The implementation carefully ensures that an instruction unrelated to
the current TLSDESC code sequence, if immediately follows a removable
instruction (HI20 or LOAD_LO12 OR (LE-specific) ADD_LO12), is not
converted to NOP.
* `riscv64-tlsdesc.s` is inspired by `i386-tlsdesc-gd.s` (https://reviews.llvm.org/D112582).
* `riscv64-tlsdesc-relax.s` tests linker relaxation.
* `riscv-tlsdesc-gd-mixed.s` is inspired by `x86-64-tlsdesc-gd-mixed.s` (https://reviews.llvm.org/D116900).
Link: https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/373
Reviewed By: ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/79239
(cherry picked from commit 1117fdd7c16873eb389e988c6a39ad922bae0fd0)
We convert existed macro fusions to TableGen.
Bacause `Fusion` depend on `Instruction` definitions which is defined
below `RISCVFeatures.td`, so we recommend user to add fusion features
when defining new processor.
(cherry picked from commit 3fdb431b636975f2062b1931158aa4dfce6a3ff1)
These predicates can be used to represent `<`, `<=`, `>`, `>=`.
And a predicate for `in range` is added.
(cherry picked from commit 664a0faac464708fc061d12e5cd492fcbfea979a)
This fixes found non-determinism when `LLVM_REVERSE_ITERATION`
option is `ON`.
Fixes#79420.
Reviewers: ilovepi, MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/79411
(cherry picked from commit 41fe98a6e7e5cdcab4a4e9e0d09339231f480c01)
GCC supports -mtls-dialect= for several architectures to select TLSDESC.
This patch supports the following values
* x86: "gnu". "gnu2" (TLSDESC) is not supported yet.
* RISC-V: "trad" (general dynamic), "desc" (TLSDESC, see #66915)
AArch64 toolchains seem to support TLSDESC from the beginning, and the
general dynamic model has poor support. Nobody seems to use the option
-mtls-dialect= at all, so we don't bother with it.
There also seems very little interest in AArch32's TLSDESC support.
TLSDESC does not change IR, but affects object file generation. Without
a backend option the option is a no-op for in-process ThinLTO.
There seems no motivation to have fine-grained control mixing trad/desc
for TLS, so we just pass -mllvm, and don't bother with a modules flag
metadata or function attribute.
Co-authored-by: Paul Kirth <paulkirth@google.com>
(cherry picked from commit 36b4a9ccd9f7e04010476e6b2a311f2052a4ac20)
This test started to fail when LLVM created the release/18.x branch and
the main branch subsequently had the version number increased from 18 to
19.
I investigated this failure (it was blocking our internal automation)
and discovered that the CHECK statement on line 27 seemed to have the
compiler version number (1800) encoded in octal that it was checking
for. I don't know if this is something that explicitly needs to be
checked, so I am leaving it in, but it should be more flexible so the
test doesn't fail anytime the version number is changed. To accomplish
that, I changed the check for the 4-digit version number to be a regex.
I originally updated this test for the 18->19 transition in
a01195ff5cc3d7fd084743b1f47007645bb385f4. This change makes the CHECK
line more flexible so it doesn't need to be continually updated.
(cherry picked from commit 45f883ed06f39fba7557dfbbff4d10595b45f874)
A SHN_ABS symbol has never been considered for
InputSection::relocateNonAlloc.
Before #74686, the code did made it work in the absence of `-z
dead-reloc-in-nonalloc=`.
There is now a report about such SHN_ABS uses
(https://github.com/llvm/llvm-project/pull/74686#issuecomment-1904101711)
and I think it makes sense for non-SHF_ALLOC to support SHN_ABS, like
SHF_ALLOC sections do.
```
// clang -g
__attribute__((weak)) int symbol;
int *foo() { return &symbol; }
0x00000023: DW_TAG_variable [2] (0x0000000c)
...
DW_AT_location [DW_FORM_exprloc] (DW_OP_addrx 0x0)
```
.debug_addr references `symbol`, which can be redefined by a symbol
assignment or --defsym to become a SHN_ABS symbol.
The problem is that `!sym.getOutputSection()` cannot discern SHN_ABS
from a symbol whose section has been discarded. Since commit
1981b1b6b9, a symbol relative to a
discarded section is changed to `Undefined`, so the `SHN_ABS` check
become trivial.
We currently apply tombstone for a relocation referencing
`SharedSymbol`. This patch does not change the behavior.
(cherry picked from commit 8abf8d124ae346016c56209de7f57b85671d4367)
Reported in 134fcc6278
Incorrect opcode is used b/c there is a `[[fallthrough]]` at line 2386.
(cherry picked from commit 33ecef9812e2c9bfadef035b8e34a949acae2abc)
We should diff against the base branch, not always against `main`. This
allows the BuildKite pre-commit CI to work properly when we target other
branches, such as `release/18.x`.
(cherry picked from commit 3b762891826192ded07286852d326f9c9060f52e)
Close https://github.com/llvm/llvm-project/issues/73023
The direct issue of https://github.com/llvm/llvm-project/issues/73023 is
that we entered a header which is marked as pragma once since the
compiler think it is OK if there is controlling macro.
It doesn't make sense. I feel like it should be sufficient to skip it
after we see the '#pragma once'.
From the context, it looks like the workaround is primarily for
ObjectiveC. So we might need reviewers from OC.
CSR SGPR spilling currently uses the early available physical VGPRs. It
currently imposes a high register pressure while trying to allocate
large VGPR tuples within the default register budget.
This patch changes the spilling strategy by picking the VGPRs in the
reverse order, the highest available VGPR first and later after regalloc
shift them back to the lowest available range. With that, the initial
VGPRs would be available for allocation and possibility
of finding large number of contiguous registers will be more.
Add new pass manager support to `llc`. Users can use
`--passes=pass1,pass2...` to run mir passes, and use `--enable-new-pm`
to run default codegen pipeline.
This patch is taken from [D83612](https://reviews.llvm.org/D83612), the
original author is @yuanfang-chen.
---------
Co-authored-by: Yuanfang Chen <455423+yuanfang-chen@users.noreply.github.com>
f9c2a341b9
causes regressions when we have a slice with integer vector type that is
the same size as the partition, and a ptr load/store slice that is not
the size of the element type.
Ref `vector-promotion.ll:ptrLoadStoreTys`.
Before the patch, we would only consider `<4 x i32>` as a candidate type
for vector promotion, and would find that it is a viable type for all
the slices.
After the patch, we now add `<2 x ptr>` as a candidate type due to slice
with user `store ptr %val0, ptr %obj, align 8` -- and flag that we
`HaveVecPtrTy`. The pre-existing behavior of this flag results in
removing the viable `<4 x i32>` and keeping only the unviable `<2 x
ptr>`, which results in a failure to promote.
The end result is failing to promote an alloca that was previously
promoted -- this does not appear to be the intent of that patch, which
has the goal of increasing promotions by providing more promotion
opportunities.
This PR preserves this behavior via a simple reorganization of the
implemention: try first the slice types with same size as the partition,
then, if there is no promotable type, try the `LoadStoreTys.`