This enables IR expansion for i128 divisions. Vector cases are still
broken because ExpandLargeDivRem doesn't try to handle them.
Fixes: SWDEV-426193
(cherry picked from commit a5d206df792b61a0b6c5ac44343a97696fc6071d)
As of 4d20cfcf4e, `__bit_reference`
contains a template `__fill_n` with a bool `_FillValue` parameter.
Unfortunately there is a relatively widely used piece of scientific
software called NetCDF, which exposes a (C) macro `_FillValue` in its
public headers.
When building the NetCDF C++ bindings, this quickly leads to compilation
errors when the macro interferes with the template in `__bit_reference`.
Rename the parameter to `_FillVal` to avoid the conflict.
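The collision can be reproduced in a few lines. The sketch below is
illustrative only: the `#define` is an assumed stand-in for the macro in
NetCDF's headers, and `fill_word_sketch` is a hypothetical function, not
the actual `__fill_n` from libc++.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-in for the macro exposed by NetCDF's public C headers.
#define _FillValue "_FillValue"

// With the macro visible, a declaration like
//   template <bool _FillValue> void f();
// is preprocessed into `template <bool "_FillValue"> void f();`,
// which does not parse.

// Hypothetical function using the renamed parameter, which the macro
// no longer touches.
template <bool _FillVal>
std::uint32_t fill_word_sketch() {
  return _FillVal ? std::uint32_t{0xFFFFFFFFu} : std::uint32_t{0};
}
```

Calling `fill_word_sketch<true>()` yields an all-ones word and
`fill_word_sketch<false>()` yields zero, regardless of the macro.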
(cherry picked from commit 1ec252298925de50b27930c557ba9de3cc397afe)
We noticed that some feature-test macros were not conditional on
configuration flags like _LIBCPP_HAS_NO_FILESYSTEM. As a result, code
attempting to use FTMs would not work as intended.
This patch adds conditionals for a few feature-test macros, but more
issues may exist.
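As a rough illustration of why this matters to users, the sketch below
keys a code path on a feature-test macro. It assumes `__cpp_lib_filesystem`
is among the affected macros (the message does not list them), and
`last_component_sketch` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// <version> (C++20) is where the library publishes its feature-test macros.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_filesystem)
#  include <filesystem>
// The macro is defined, so <filesystem> is expected to be usable.
inline std::string last_component_sketch(const std::string& path) {
  return std::filesystem::path(path).filename().string();
}
#else
// Fallback for configurations without filesystem support.
inline std::string last_component_sketch(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}
#endif
```

If the macro were defined while the library was configured without
filesystem support, the first branch would be selected and fail to
compile, which is the kind of breakage described above.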
rdar://122020466
(cherry picked from commit f2c84211d2834c73ff874389c6bb47b1c76d391a)
This adds GCC-compatible names for code model selection on 64-bit SPARC
with absolute code.
Testing with a 2-stage build then running codegen tests works okay under
all of the supported code models.
(32-bit target does not have selectable code models)
Reviewed By: @brad0, @MaskRay
(cherry picked from commit b0f0babff22e9c0af74535b05e2c6424392bb24a)
JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.
Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.
Fixes https://github.com/llvm/llvm-project/issues/79175.
(cherry picked from commit 4f32f5d5720fbef06672714a62376f236a36aef5)
This allows caching AA queries both within and across the calls,
and enables us to use a custom AAQI configuration.
(cherry picked from commit 89dae798cc77789a43e9a60173f647dae03a65fe)
Prior to 885d7b759b, the builtins library
contained two chkstk implementations for each of i386 and x86_64, one
that was used in mingw environments, and one unused (with a symbol name
not matching anything that is used anywhere). Some of the functions
additionally had other, also unused, aliases.
After cleaning this up in 885d7b759b, the
unused symbol names were removed.
At the same time, symbol aliases were added for the names as they are
used by MSVC; the functions are functionally equivalent, but have
different names between mingw and MSVC style environments.
By adding a symbol alias (so that one object file contains two different
symbols for the same function), users can run into problems with
duplicate definitions, if they themselves define one of the symbols (for
various reasons), but need to link in the other one.
This happens for Wine, which provides their own definition of
"__chkstk", but when built in mingw mode does need compiler-rt to
provide the mingw specific symbol names; see
https://github.com/mstorsjo/llvm-mingw/issues/397.
To avoid the issue, remove the extra MS style names. They weren't
entirely usable as such for MSVC style environments anyway, as
compiler-rt builtins don't build these object files at all, when built
in MSVC mode; thus, the effort to provide them for MSVC style
environments in 885d7b759b was a
half-hearted step towards that.
If we really do want to provide those functions (as an alternative to
the ones provided by MSVC itself), we should do it in a separate object
file (even if the function implementation is the same), so that users
who have a definition of one of them but need a definition of the other,
won't have conflicts.
Additionally, if we do want to provide them for MSVC, those files
actually should be built when building the builtins in MSVC mode as well
(see compiler-rt/lib/builtins/CMakeLists.txt).
If we do that, there's a risk that an MSVC style build ends up linking
in and preferring our implementation over the one provided by MSVC,
which would be suboptimal. Our implementation always probes the
requested amount of stack, while the MSVC one checks the amount of
allocated stack and only probes as much as really is needed.
In short - this reverts the situation to what it was in the 17.x release
series (except for unused functions that have been removed).
(cherry picked from commit 248aeac1ad2cf4f583490dd1312a5b448d2bb8cc)
The current implementation (D138849) assumes that `Branch`(es) follow
the corresponding `Decision`. This does not hold if the `Branch`(es) are
forwarded to an expanded file ID. As a result, consecutive `Decision`(s)
would be confused by an insufficient number of `Branch`(es).
An `Expansion` will point to `Branch`(es) in other file IDs if the
`Expansion` is included in the range of a `Decision`.
Fixes #77871
---------
Co-authored-by: Alan Phipps <a-phipps@ti.com>
(cherry picked from commit d912f1f0cb49465b08f82fae89ece222404e5640)
To relax record scanning, order the records so that
`Decision < Expansion`; otherwise it could not be determined whether an
`Expansion` belongs to a `Decision` or not.
Relevant to #77871
(cherry picked from commit 438fe1db09b0c20708ea1020519d8073c37feae8)
This is a followup to #76819. After those changes, we can still run into
an assertion failure for a slight variation of the test case: When
fixing up MemoryPhis, we map the incoming access to the access of the
cloned instruction -- which may now no longer exist.
Fix this by reusing the getNewDefiningAccessForClone() helper, which
will look upwards for a new defining access in that case.
(cherry picked from commit a7a1b8b17e264fb0f2d2b4165cf9a7f5094b08b3)
When a function F that has ZA and ZT0 state calls another function G
that only shares ZT0 state with its caller, F will have to save ZA
before the call to G, and restore it afterwards (rather than setting up
a lazy-save).
This is not yet implemented in LLVM and does not result in a
compile-time error either. So instead of silently generating incorrect
code, it's better to emit an error saying this is not yet implemented.
(cherry picked from commit 319f4c03ba2909c7240ac157cc46216bf1518c10)
When we do not enable vector features, we should return the default
value (`TargetTransformInfoImplBase::getRegisterBitWidth`) instead of
zero.
This should fix the LoongArch [buildbot
breakage](https://lab.llvm.org/staging/#/builders/5/builds/486) from
#78943.
(cherry picked from commit 1e9924c1f248bbddcb95d82a59708d617297dad3)
This returns (probably temporarily) array-referring NTTP behavior to
what it was prior to #78041 because ~~I'm fed up~~ I have no time to fix
regressions.
(cherry picked from commit 9bf4e54ef42d907ae7550f36fa518f14fa97af6f)
In a `--defsym y0=0 -T a.lds` link where a.lds contains only INSERT
commands, the `script->sectionCommands` layout may be:
```
orphan sections
SymbolAssignment due to --defsym
sections created by INSERT commands
```
The `OutputDesc` objects are not contiguous in sortInputSections, and
`compareSections` will be called with a SymbolAssignment argument,
leading to an assertion failure.
(cherry picked from commit dee8786f70a3d62b639113343fa36ef55bdbad63)
LAA currently adds memory locations with their original AATags to AST.
However, scoped alias AATags may be valid only within one loop
iteration, while LAA reasons across iterations.
Fix this by determining which alias scopes are defined inside the loop,
and dropping AATags that reference these scopes.
Fixes https://github.com/llvm/llvm-project/issues/79137.
(cherry picked from commit cd7ea4ea657ea41b42fcbd0e6b33faa46608d18e)
__ARM_STATE_ZA and __ARM_STATE_ZT0 are set when the compiler can parse
the "za" and "zt0" strings in the SME attributes.
__ARM_FEATURE_SME and __ARM_FEATURE_SME2 are set when the compiler can
generate code for attributes with "za" and "zt0" state, respectively.
__ARM_FEATURE_LOCALLY_STREAMING is set when the compiler supports the
__arm_locally_streaming attribute.
(cherry picked from commit 9e649518e6038a5b9ea38cfa424468657d3be59e)
Exclude some using-declarations in the module purview when compiling
with `-fno-char8_t`.
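A minimal sketch of the idea, assuming the guard can be expressed with
the standard `__cpp_char8_t` language feature macro (the actual patch may
use a libc++-internal macro instead). The namespace `exported_sketch` is
hypothetical.

```cpp
#include <cassert>
#include <string>

namespace exported_sketch {
using std::string;
#if defined(__cpp_char8_t)
// Only name char8_t-based entities when the language feature is enabled;
// with -fno-char8_t the macro is not defined and this is skipped.
using std::u8string;
#endif
}  // namespace exported_sketch
```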
(cherry picked from commit dc4483659fc51890fdc732acc66a4dcda6e68047)
The masked symbols in SLEEF are incorrectly implemented as calls to
non-masked variants, which only works for functions that do not modify
memory.
For vector variants which modify memory we can only use non-masked
symbols for now.
The SVE ArmPL mappings need to be removed for now as well.
(cherry picked from commit 0f26441cb83c1dea9aef12c748a79e3f38e3230a)
We recently noticed that the unwrap_iter.h file was pushing macros, but
it was pushing them again instead of popping them at the end of the
file. This led to libc++ basically swallowing any custom definition of
these macros in user code:
    #define min HELLO
    #include <algorithm>
    // min is not HELLO anymore, it's not defined
While investigating this issue, I noticed that our push/pop pragmas were
actually entirely wrong too. Indeed, instead of pushing macros like
`move`, we'd push `move(int, int)` in the pragma, which is not a valid
macro name. As a result, we would not actually push macros like `move`
-- instead we'd simply undefine them. This led to the following code not
working:
    #define move HELLO
    #include <algorithm>
    // move is not HELLO anymore
Fixing the pragma push/pop incantations led to a cascade of issues
because we use identifiers like `move` in a large number of places, and
all of these headers would now need to do the push/pop dance.
This patch fixes all these issues. First, it adds a check that we don't
swallow important names like min, max, move or refresh as explained
above. This is done by augmenting the existing
system_reserved_names.gen.py test to also check that the macros are what
we expect after including each header.
Second, it fixes the push/pop pragmas to work properly and adds missing
pragmas to all the files I could detect a failure in via the newly added
test.
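The intended push/pop pattern can be sketched as follows. This is an
illustration under assumptions, not the libc++ headers themselves:
`HELLO_SKETCH`, `header_sketch` and `min_after_header` are hypothetical
names, and the "header" is inlined for brevity.

```cpp
#include <cassert>
#include <string>

// User code defines a macro that collides with a library identifier.
#define min HELLO_SKETCH

// --- start of a hypothetical header ---
// The pragma takes the bare macro name. A string such as "min(int, int)"
// names no macro, so nothing would be saved for the later pop.
#pragma push_macro("min")
#undef min
namespace header_sketch {
inline int min(int a, int b) { return b < a ? b : a; }
inline int smaller(int a, int b) { return min(a, b); }
}  // namespace header_sketch
// Pop (rather than push again) so the user's definition comes back.
#pragma pop_macro("min")
// --- end of the hypothetical header ---

#define SKETCH_STR2(x) #x
#define SKETCH_STR(x) SKETCH_STR2(x)
// Expands `min` at this point in the file and stringizes the result.
inline std::string min_after_header() { return SKETCH_STR(min); }

// Clean up so the macro does not leak past this sketch.
#undef min
```

After the pop, `min` expands to `HELLO_SKETCH` again, while the code
inside the header was free to use `min` as an ordinary identifier.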
rdar://121365472
(cherry picked from commit 7b4622514d232ce5f7110dd8b20d90e81127c467)
The interaction between --defsym and --warn-backrefs was not tested, but
if a --defsym-created reference causes archive member extraction, it seems
reasonable to suppress the diagnostic, which was the behavior before #78944.
(cherry picked from commit 9a1ca245c8bc60b1ca12cd906fb31130801d977e)
This function is used in `jitlink-check` lines in LIT tests. In #78371 I
neglected to swap the initial instruction bytes for systems that store
the constants as big-endian.
(cherry picked from commit 8a5bdd899f3cb57024d92b96c16e805ca9924ac7)
`OpaqueValueExpr` doesn't necessarily contain a source expression.
Particularly, after #78041, it is used to carry the type and the value
kind of a non-type template argument of floating-point type or referring
to a subobject (those are so called `StructuralValue` arguments).
This fixes #79575.
(cherry picked from commit ef67f63fa5f950f4056b5783e92e137342805d74)
The condition for allowing integer complex number support could also
allow neon fixed length complex numbers if +sve2 was specified. This
tightens the condition to only allow integer complex number support for
scalable vectors.
We could generalize this in the future to generate SVE intrinsics for
fixed-length vectors, but for the moment this opts for the simpler fix.
(cherry picked from commit 9520773c46777adbc1d489f831d6c93b8287ca0e)
This adopts a similar behavior to AArch64 SVE, where bool vectors are
represented as a vector of chars with 1/8 the number of elements. This
ensures the vector always occupies a power of 2 number of bytes.
A consequence of this is that vbool64_t, vbool32_t, and vbool16_t can
only be used with a vector length that guarantees at least 8 bits.
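The arithmetic behind that restriction can be sketched as below. It
assumes that `vbool<N>_t` carries VLEN / N mask bits, following the RVV
type naming; the helper names and the VLEN values in the usage note are
hypothetical.

```cpp
#include <cassert>

// Mask bits guaranteed for vbool<N>_t at a given minimum VLEN
// (assumed to be VLEN / N).
constexpr unsigned maskBitsSketch(unsigned minVLen, unsigned n) {
  return minVLen / n;
}

// The type is representable as whole chars only when at least 8 mask
// bits are guaranteed.
constexpr bool fitsInCharsSketch(unsigned minVLen, unsigned n) {
  return maskBitsSketch(minVLen, n) >= 8;
}

// Number of char elements: 1/8 the number of bool elements.
constexpr unsigned charElementsSketch(unsigned minVLen, unsigned n) {
  return maskBitsSketch(minVLen, n) / 8;
}
```

For example, with a minimum VLEN of 128, `vbool16_t` has 8 mask bits and
fits in one char, while `vbool64_t` has only 2 bits and does not.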
This fixes a miscompile from #79072 where we were taking the wrong SrcVec to do
the M1 shuffle. E.g. if the SrcVecIdx was 2 and we had 2 VRegsPerSrc, we ended
up taking it from V1 instead of V2.
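The intended index mapping can be sketched as follows, assuming the
source registers are numbered with all of V1's registers first and then
V2's. `SrcPickSketch` and `pickSourceSketch` are hypothetical names, not
the code touched by this fix.

```cpp
#include <cassert>

struct SrcPickSketch {
  unsigned operand;  // 0 selects V1, 1 selects V2
  unsigned subReg;   // register index within the selected operand
};

// Map a flat source register index to (operand, register within operand).
inline SrcPickSketch pickSourceSketch(unsigned srcVecIdx,
                                      unsigned vregsPerSrc) {
  SrcPickSketch pick;
  pick.operand = srcVecIdx / vregsPerSrc;
  pick.subReg = srcVecIdx % vregsPerSrc;
  return pick;
}
```

With `srcVecIdx == 2` and `vregsPerSrc == 2`, this selects the first
register of V2, matching the example above.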
Calling a `__arm_locally_streaming` function from a function that
is not a streaming-SVE function would lead to incorrect inlining.
The issue didn't surface because the tests were not testing what
they were supposed to test.
(cherry picked from commit 3abf55a68caefd45042c27b73a658c638afbbb8b)
This patch introduces a 'COALESCER_BARRIER', a pseudo node that expands
to a 'nop' but stops the register allocator from coalescing a COPY node
when its use/def crosses an SMSTART or SMSTOP instruction.
For example:
    %0:fpr64 = COPY killed $d0
    undef %2.dsub:zpr = COPY %0 // <- Do not coalesce this COPY
    ADJCALLSTACKDOWN 0, 0
    MSRpstatesvcrImm1 1, 0, csr_aarch64_smstartstop, implicit-def dead $d0
    $d0 = COPY killed %0
    BL @use_f64, csr_aarch64_aapcs
If the COPY would be coalesced, that would lead to:
    $d0 = COPY killed %0
being replaced by:
    $d0 = COPY killed %2.dsub
which means the whole ZPR reg would be live up to the call, causing the
MSRpstatesvcrImm1 (smstop) to spill/reload the ZPR register:
    str q0, [sp] // 16-byte Folded Spill
    smstop sm
    ldr z0, [sp] // 16-byte Folded Reload
    bl use_f64
which would be incorrect for two reasons:
1. The program may load more data than it has allocated.
2. If there are other SVE objects on the stack, the compiler might use
   the 'mul vl' addressing modes to access the spill location.
By disabling the coalescing, we get the desired results:
    str d0, [sp, #8] // 8-byte Folded Spill
    smstop sm
    ldr d0, [sp, #8] // 8-byte Folded Reload
    bl use_f64
(cherry picked from commit dd736661826e215ac70ff3a4a4ccd75bda0c5ccd)
On Gentoo, libc++ is indeed in /usr/include/c++/*, but libstdc++ is at
e.g. /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14.
Use '/include/g++' as it should be unique enough. Note that the omission of
a trailing slash is intentional to match g++-*.
See https://github.com/llvm/llvm-project/pull/78534#issuecomment-1904145839.
Reviewed by: mgorny
Closes: https://github.com/llvm/llvm-project/pull/79264
Signed-off-by: Sam James <sam@gentoo.org>
(cherry picked from commit e8f882f83acf30d9b4da8846bd26314139660430)
The Zicond extension was ratified in the last few months, with no
changes that affect the LLVM implementation. Although there's surely
more tuning that could be done about when to select Zicond or not, there
are no known correctness issues. Therefore, we should mark support as
non-experimental.
(cherry picked from commit d833b9d677c9dd0a35a211e2fdfada21ea9a464b)
When testing on gcc, both exitstat and cmdstat must be a kind=4 integer,
e.g. DefaultInt. This patch changes the input arg requirement from
`AnyInt` to `TypePattern{IntType, KindCode::greaterOrEqualToKind, n}`.
The standard states in 16.9.73:
- EXITSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least nine.
- CMDSTAT (optional) shall be a scalar of type integer with a decimal
exponent range of at least four.
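The two range requirements translate into minimum kinds as sketched
below. This assumes the common convention that an integer kind number
equals its size in bytes, which the standard does not mandate; the
helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Decimal exponent range of a signed integer occupying `bytes` bytes:
// the largest r such that 10^r <= 2^(8*bytes-1) - 1.
inline int decimalRangeSketch(int bytes) {
  std::uint64_t maxValue = (std::uint64_t{1} << (8 * bytes - 1)) - 1;
  int range = 0;
  while (maxValue >= 10) {
    maxValue /= 10;
    ++range;
  }
  return range;
}

// Smallest kind among 1, 2, 4, 8 with at least the required range,
// assuming kind == size in bytes. Returns -1 if none qualifies.
inline int smallestKindSketch(int requiredRange) {
  for (int kind : {1, 2, 4, 8}) {
    if (decimalRangeSketch(kind) >= requiredRange) return kind;
  }
  return -1;
}
```

Under that assumption, a range of at least nine needs kind 4 or larger
for EXITSTAT, and a range of at least four needs kind 2 or larger for
CMDSTAT.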
```fortran
program bug
implicit none
integer(kind = 2) :: exitstatvar
integer(kind = 4) :: cmdstatvar
character(len=256) :: msg
character(len=:), allocatable :: command
command='echo hello'
call execute_command_line(command, exitstat=exitstatvar, cmdstat=cmdstatvar)
end program
```
When testing the above program with exitstatvar kind<4, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'exitstat=' has bad type or kind 'INTEGER(2)'
call execute_command_line(command, exitstat=exitstatvar)
```
When testing the above program with cmdstatvar kind<2, an error would
occur:
```
$ ../build-release/bin/flang-new test.f90
error: Semantic errors in test.f90
./test.f90:8:47: error: Actual argument for 'cmdstat=' has bad type or kind 'INTEGER(1)'
call execute_command_line(command, cmdstat=cmdstatvar)
```
A test file for these semantics has been added to `flang/test/Semantics`.
Fixes: https://github.com/llvm/llvm-project/issues/77990
(cherry picked from commit 14a15103cc9dbdb3e95c04627e0b96b5e3aa4944)
Fold expressions in Clang are limited to 256 elements. This causes
compilation errors when the number of elements added exceeds this
limit. Side-step the issue by restoring the original trick that uses
std::initializer_list. For the record, our downstream Clang 16 gives:
    mlir/include/mlir/IR/Dialect.h:269:23: fatal error: instantiating fold
    expression with 688 arguments exceeded expression nesting limit of 256
    (addType<Args>(), ...);
Partially reverts 26d811b3ec.
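The difference between the two forms can be sketched as below.
`addTypeSketch` and `addTypesSketch` are hypothetical stand-ins that
record `sizeof(T)` instead of registering anything; this is not the MLIR
code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical stand-in for a per-type registration hook.
template <typename T>
void addTypeSketch(std::vector<std::size_t>& registry) {
  registry.push_back(sizeof(T));
}

template <typename... Args>
void addTypesSketch(std::vector<std::size_t>& registry) {
  // Fold-expression form, subject to the nesting limit:
  //   (addTypeSketch<Args>(registry), ...);
  // Pack expansion inside a braced list instead: one flat list of ints,
  // evaluated left to right. The leading 0 keeps the list non-empty.
  (void)std::initializer_list<int>{0, (addTypeSketch<Args>(registry), 0)...};
}
```

Each pack element becomes one entry of a flat braced list rather than
one more level of a nested comma expression, so the nesting limit is not
reached.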
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit e3a38a75ddc6ff00301ec19a0e2488d00f2cc297)
When we generate runtime memory checks for an inner loop it's
possible that these checks are invariant in the outer loop and
so will get hoisted out. In such cases, the effective cost of
the checks should reduce to reflect the outer loop trip count.
This fixes a 25% performance regression introduced by commit
49b0e6dcc2 when building the SPEC2017 x264 benchmark with PGO,
where we decided the inner loop trip count wasn't high enough to
warrant the (incorrect) high cost of the runtime checks. Also,
when runtime memory checks consist entirely of diff checks these
are likely to be outer loop invariant.
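One way to picture the adjustment is the sketch below. It assumes the
cost is simply divided by the outer loop trip count and kept at a
minimum of 1; the function name, the integer cost type and the clamping
are assumptions for illustration, not the vectorizer's actual cost code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: amortize the cost of runtime checks over the
// outer loop's iterations when the checks are outer-loop invariant.
inline std::uint64_t effectiveCheckCostSketch(std::uint64_t checkCost,
                                              std::uint64_t outerTripCount,
                                              bool outerLoopInvariant) {
  if (!outerLoopInvariant || outerTripCount <= 1 || checkCost == 0)
    return checkCost;
  // Hoisted checks run once per outer loop, not once per outer iteration.
  return std::max<std::uint64_t>(checkCost / outerTripCount, 1);
}
```

For instance, checks costing 100 that hoist out of an outer loop with 10
iterations would be charged as 10 against the inner loop.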
(cherry picked from commit 962fbafecf4730ba84a3b9fd7a662a5c30bb2c7c)
GEPArg can only be constructed from int32_t and mlir::Value. Explicitly
cast other types (e.g. unsigned, size_t) to int32_t to avoid narrowing
conversion warnings on MSVC. Some recent examples of such are:
```
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'size_t' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
mlir\lib\Dialect\LLVMIR\Transforms\TypeConsistency.cpp: error C2398:
Element '1': conversion from 'unsigned int' to 'T' requires a narrowing
conversion
with
[
T=mlir::LLVM::GEPArg
]
```
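The shape of the fix can be sketched with a stand-in type.
`GEPArgSketch` and `makeIndicesSketch` are hypothetical; the stand-in
only mirrors the property stated above, that the argument type is
constructible from int32_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: constructible from int32_t only.
struct GEPArgSketch {
  GEPArgSketch(std::int32_t v) : value(v) {}
  std::int32_t value;
};

inline std::vector<GEPArgSketch> makeIndicesSketch(std::size_t fieldIndex) {
  // `{0, fieldIndex}` would convert size_t to int32_t inside a braced
  // list, the pattern behind the C2398 diagnostics quoted above.
  return {0, static_cast<std::int32_t>(fieldIndex)};
}
```

The explicit cast states the intended conversion at the call site, so
the braced initializer no longer contains an implicit narrowing step.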
Co-authored-by: Nikita Kudriavtsev <nikita.kudriavtsev@intel.com>
(cherry picked from commit 89cd345667a5f8f4c37c621fd8abe8d84e85c050)
Include LLVM_ENABLE_HTTPLIB along with httplib package finding in
LLVMConfig.cmake, as this dependency is needed by LLVMDebuginfod that is
now used by LLDB. Without it, building LLDB standalone fails with:
```
CMake Error at /usr/lib/llvm/19/lib64/cmake/llvm/LLVMExports.cmake:90 (set_target_properties):
The link interface of target "LLVMDebuginfod" contains:
httplib::httplib
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
/usr/lib/llvm/19/lib64/cmake/llvm/LLVMConfig.cmake:357 (include)
cmake/modules/LLDBStandalone.cmake:9 (find_package)
CMakeLists.txt:34 (include)
```
(cherry picked from commit 3c9f34c12450345c6eb524e47cf79664271e4260)