This change saves memory by providing the allocator more freedom to
allocate the most
efficient size class by dropping the alignment requirements for
std::string's
pointer from 16 to 8. This changes the output of std::string::max_size,
which makes it ABI breaking.
That said, the discussion concluded that we don't care about this ABI
break. and would like this change enabled universally.
The ABI break isn't one of layout or "class size", but rather the value
of "max_size()" changes, which in turn changes whether `std::bad_alloc`
or `std::length_error` is thrown for large allocations.
This change is the child of PR #68807, which enabled the change behind
an ABI flag.
Currently, the way that recomputeLiveIns works is that it will recompute
the livein registers for that MachineBasicBlock but it matters what
order you call recomputeLiveIn which can result in incorrect register
allocations down the line.
Now we do not recompute the entire CFG but we do ensure that the newly
added MBB do reach convergence. This fixes a register allocation bug
introduced in AArch64 stack probing.
(cherry picked from commit ff4636a4ab00b633c15eb3942c26126ceb2662e6)
Currently when interleaving vector calls with linear arguments,
the Part is ignored and all vector calls use the initial value
from the first lane of the current iteration.
Fix this to extract from the correct part of the linear vector.
(cherry picked from commit d4c01714239e80d21e441c3886749fc56b743f81)
Change AfterPlacementOperator to a boolean and deprecate SBPO_Never,
which meant never inserting a space except when after new/delete.
Fixes#78892.
(cherry picked from commit 908fd09a13b2e89a52282478544f7f70cf0a887f)
The result of umin may be poison and in that case the added constraints
are not be valid in contexts where poison doesn't cause UB. Only queue
facts for min/max intrinsics if the result is guaranteed to not be
poison.
This could be improved in the future, by only adding the fact when
solving conditions using the result value.
Fixes https://github.com/llvm/llvm-project/issues/78621.
(cherry picked from commit 3d91d9613e294b242d853039209b40a0cb7853f2)
With a690e86 we added -mcpu/mtune=native support to handle the Microsoft
Azure Cobalt 100 CPU as a Neoverse N2. This patch adds a CPU alias in
TargetParser to maintain compatibility with GCC.
(cherry picked from commit ae8005ffb6cd18900de8ed5a86f60a4a16975471)
FastISel may create a redundant BGTZ terminal which fallthroughes.
```
BGTZ %2:gpr32, %bb.1, implicit-def $at
bb.1.bb1:
; predecessors: %bb.0
```
The `!I->isBarrier()` check in
MipsAsmPrinter::isBlockOnlyReachableByFallthrough
will incorrectly not print a label, leading to a `Undefined temporary
symbol `
error when we try assembling the output assembly file. See the updated
`Fast-ISel/pr40325.ll` and
https://github.com/rust-lang/rust/issues/108835
In addition, the `SwitchInst` condition is too conservative and prints
many unneeded labels (see the updated tests).
Just use the generic isBlockOnlyReachableByFallthrough, updated by
commit 1995b9fead for SPARC, which also
handles MIPS.
(cherry picked from commit 6b2fd7aed66d592738f26c76caa8fff95e168598)
In https://github.com/llvm/llvm-project/pull/76873 a warning was added
when the macros INFINITY and NAN are used in binary expressions when
-menable-no-nans or -menable-no-infs are used. If the user uses an
option that nullifies these two options, the warning will still be
generated. This patch adds an additional information to the warning
comment to let the user know about this. It also suppresses the warning
when #ifdef INFINITY, #ifdef NAN, #ifdef NAN or #ifndef NAN are used in
the code.
(cherry picked from commit 62c352e13c145b5606ace88ecbe9164ff011b5cf)
A recent comment modified the job to only run on the main branch, but
the formatting was slightly off, causing the job to not run. This patch
fixes the formatting so the job will run as expected.
(cherry picked from commit 4b34558f43121df9b863ff2492f74fb2e65a5af1)
Modifying a cherry-picked patch to fix code formatting issues can be
risky, so we don't typically do this. Therefore, it's not necessary to
run this job on the release branches.
(cherry picked from commit 2193c95e2459887e7e6e4f9f4aacf9252e99858f)
The assertion in #80597 failed when we were trying to compute known bits
of a value in an unreachable BB.
859b09da08/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp (L749-L810)
In this case, `SignBits` is 30 (deduced from instr info), but `Known` is
`10000101010111010011110101000?0?00000000000000000000000000000000`
(deduced from dom cond). Setting high bits of `lshr Known, 1` will lead
to conflict.
This patch masks out high bits of `Known.Zero` to address this problem.
Fixes#80597.
(cherry picked from commit cb8d83a77c25e529f58eba17bb1ec76069a04e90)
Background: BPI stores a collection of edge branch-probabilities, and
also a set of Callback value-handles for the blocks in the
edge-collection. When a block is deleted, BPI's eraseBlock method is
called to clear the edge-collection of references to that block, to
avoid dangling pointers.
However, when move-constructing or assigning a BPI object, the
edge-collection gets moved, but the value-handles are discarded. This
can lead to to stale entries in the edge-collection when blocks are
deleted without the callback -- not normally a problem, but if a new
block is allocated with the same address as an old block, spurious
branch probabilities will be recorded about it. The fix is to transfer
the handles from the source BPI object.
This was exposed by an unrelated debug-info change, it probably just
shifted around allocation orders to expose this. Detected as
nondeterminism and reduced by Zequan Wu:
f1b0a54451 (commitcomment-136737090)
(No test because IMHO testing for a behaviour that varies with memory
allocators is likely futile; I can add the reproducer with a CHECK for
the relevant branch weights if it's desired though)
(cherry picked from commit 604a6c409e8473b212952b8633d92bbdb22a45c9)
The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:
$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
-DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes
Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.
---------
Co-authored-by: Petr Hosek <phosek@google.com>
(cherry picked from commit dd0356d741aefa25ece973d6cc4b55dcb73b84b4)
This enables IR expansion for i128 divisions. The vector case is still
broken because ExpandLargeDivRem doesn't try to handle them.
Fixes: SWDEV-426193
(cherry picked from commit a5d206df792b61a0b6c5ac44343a97696fc6071d)
As of 4d20cfcf4e, `__bit_reference`
contains a template `__fill_n` with a bool `_FillValue` parameter.
Unfortunately there is a relatively widely used piece of scientific
software called NetCDF, which exposes a (C) macro `_FillValue` in its
public headers.
When building the NetCDF C++ bindings, this quickly leads to compilation
errors when the macro interferes with the template in `__bit_reference`.
Rename the parameter to `_FillVal` to avoid the conflict.
(cherry picked from commit 1ec252298925de50b27930c557ba9de3cc397afe)
We noticed that some feature-test macros were not conditional on
configuration flags like _LIBCPP_HAS_NO_FILESYSTEM. As a result, code
attempting to use FTMs would not work as intended.
This patch adds conditionals for a few feature-test macros, but more
issues may exist.
rdar://122020466
(cherry picked from commit f2c84211d2834c73ff874389c6bb47b1c76d391a)
This adds GCC-compatible names for code model selection on 64-bit SPARC
with absolute code.
Testing with a 2-stage build then running codegen tests works okay under
all of the supported code models.
(32-bit target does not have selectable code models)
Reviewed By: @brad0, @MaskRay
(cherry picked from commit b0f0babff22e9c0af74535b05e2c6424392bb24a)
JumpThreading may perform AA queries while the dominator tree is not up
to date, which may result in miscompilations.
Fix this by adding a new AAQI option to disable the use of the dominator
tree in BasicAA.
Fixes https://github.com/llvm/llvm-project/issues/79175.
(cherry picked from commit 4f32f5d5720fbef06672714a62376f236a36aef5)
This allows caching AA queries both within and across the calls,
and enables us to use a custom AAQI configuration.
(cherry picked from commit 89dae798cc77789a43e9a60173f647dae03a65fe)
Prior to 885d7b759b, the builtins library
contained two chkstk implementations for each of i386 and x86_64, one
that was used in mingw environments, and one unused (with a symbol name
not matching anything that is used anywhere). Some of the functions
additionally had other, also unused, aliases.
After cleaning this up in 885d7b759b, the
unused symbol names were removed.
At the same time, symbol aliases were added for the names as they are
used by MSVC; the functions are functionally equivalent, but have
different names between mingw and MSVC style environments.
By adding a symbol alias (so that one object file contains two different
symbols for the same function), users can run into problems with
duplicate definitions, if they themselves define one of the symbols (for
various reasons), but need to link in the other one.
This happens for Wine, which provides their own definition of
"__chkstk", but when built in mingw mode does need compiler-rt to
provide the mingw specific symbol names; see
https://github.com/mstorsjo/llvm-mingw/issues/397.
To avoid the issue, remove the extra MS style names. They weren't
entirely usable as such for MSVC style environments anyway, as
compiler-rt builtins don't build these object files at all, when built
in MSVC mode; thus, the effort to provide them for MSVC style
environments in 885d7b759b was a
half-hearted step towards that.
If we really do want to provide those functions (as an alternative to
the ones provided by MSVC itself), we should do it in a separate object
file (even if the function implementation is the same), so that users
who have a definition of one of them but need a definition of the other,
won't have conflicts.
Additionally, if we do want to provide them for MSVC, those files
actually should be built when building the builtins in MSVC mode as well
(see compiler-rt/lib/builtins/CMakeLists.txt).
If we do that, there's a risk that an MSVC style build ends up linking
in and preferring our implementation over the one provided by MSVC,
which would be suboptimal. Our implementation always probes the
requested amount of stack, while the MSVC one checks the amount of
allocated stack and only probes as much as really is needed.
In short - this reverts the situation to what it was in the 17.x release
series (except for unused functions that have been removed).
(cherry picked from commit 248aeac1ad2cf4f583490dd1312a5b448d2bb8cc)
The current implementation (D138849) assumes `Branch`(es) would follow
after the corresponding `Decision`. It is not true if `Branch`(es) are
forwarded to expanded file ID. As a result, consecutive `Decision`(s)
would be confused with insufficient number of `Branch`(es).
`Expansion` will point `Branch`(es) in other file IDs if `Expansion` is
included in the range of `Decision`.
Fixes#77871
---------
Co-authored-by: Alan Phipps <a-phipps@ti.com>
(cherry picked from commit d912f1f0cb49465b08f82fae89ece222404e5640)
To relax scanning record, tweak order by `Decision < Expansion`, or
`Expansion` could not be distinguished whether it belonged to `Decision`
or not.
Relevant to #77871
(cherry picked from commit 438fe1db09b0c20708ea1020519d8073c37feae8)
This is a followup to #76819. After those changes, we can still run into
an assertion failure for a slight variation of the test case: When
fixing up MemoryPhis, we map the incoming access to the access of the
cloned instruction -- which may now no longer exist.
Fix this by reusing the getNewDefiningAccessForClone() helper, which
will look upwards for a new defining access in that case.
(cherry picked from commit a7a1b8b17e264fb0f2d2b4165cf9a7f5094b08b3)
When a function F has ZA and ZT0 state, calls another function G that
only shares ZT0 state with its caller, F will have to save ZA before
the call to G, and restore it afterwards (rather than setting up a
lazy-sve).
This is not yet implemented in LLVM and does not result in a
compile-time error either. So instead of silently generating incorrect
code, it's better to emit an error saying this is not yet implemented.
(cherry picked from commit 319f4c03ba2909c7240ac157cc46216bf1518c10)
When we do not enable vector features, we should return the default
value (`TargetTransformInfoImplBase::getRegisterBitWidth`) instead of
zero.
This should fix the LoongArch [buildbot
breakage](https://lab.llvm.org/staging/#/builders/5/builds/486) from
#78943.
(cherry picked from commit 1e9924c1f248bbddcb95d82a59708d617297dad3)
This returns (probably temporarily) array-referring NTTP behavior to
which was prior to #78041 because ~~I'm fed up~~ have no time to fix
regressions.
(cherry picked from commit 9bf4e54ef42d907ae7550f36fa518f14fa97af6f)
In a `--defsym y0=0 -T a.lds` link where a.lds contains only INSERT
commands, the `script->sectionCommands` layout may be:
```
orphan sections
SymbolAssignment due to --defsym
sections created by INSERT commands
```
The `OutputDesc` objects are not contiguous in sortInputSections, and
`compareSections` will be called with a SymbolAssignment argument,
leading to an assertion failure.
(cherry picked from commit dee8786f70a3d62b639113343fa36ef55bdbad63)
LAA currently adds memory locations with their original AATags to AST.
However, scoped alias AATags may be valid only within one loop
iteration, while LAA reasons across iterations.
Fix this by determining which alias scopes are defined inside the loop,
and drop AATags that reference these scopes.
Fixes https://github.com/llvm/llvm-project/issues/79137.
(cherry picked from commit cd7ea4ea657ea41b42fcbd0e6b33faa46608d18e)
__ARM_STATE_ZA and __ARM_STATE_ZT0 are set when the compiler can parse
the "za" and "zt0" strings in the SME attributes.
__ARM_FEATURE_SME and __ARM_FEATURE_SME2 are set when the compiler can
generate code for attributes with "za" and "zt0" state, respectively.
__ARM_FEATURE_LOCALLY_STREAMING is set when the compiler supports the
__arm_locally_streaming attribute.
(cherry picked from commit 9e649518e6038a5b9ea38cfa424468657d3be59e)
Exclude some using-declarations in the module purview when compiling
with `-fno-char8_t`.
(cherry picked from commit dc4483659fc51890fdc732acc66a4dcda6e68047)
The masked symbols in SLEEF are incorrectly implemented as calls to non
masked variants, what only works fine for functions which do not modify
memory.
For vector variants which modify memory we can only use a non masked
symbols for now.
The SVE ArmPL mappings need to be removed for now as well.
(cherry picked from commit 0f26441cb83c1dea9aef12c748a79e3f38e3230a)
We recently noticed that the unwrap_iter.h file was pushing macros, but
it was pushing them again instead of popping them at the end of the
file. This led to libc++ basically swallowing any custom definition of
these macros in user code:
#define min HELLO
#include <algorithm>
// min is not HELLO anymore, it's not defined
While investigating this issue, I noticed that our push/pop pragmas were
actually entirely wrong too. Indeed, instead of pushing macros like
`move`, we'd push `move(int, int)` in the pragma, which is not a valid
macro name. As a result, we would not actually push macros like `move`
-- instead we'd simply undefine them. This led to the following code not
working:
#define move HELLO
#include <algorithm>
// move is not HELLO anymore
Fixing the pragma push/pop incantations led to a cascade of issues
because we use identifiers like `move` in a large number of places, and
all of these headers would now need to do the push/pop dance.
This patch fixes all these issues. First, it adds a check that we don't
swallow important names like min, max, move or refresh as explained
above. This is done by augmenting the existing
system_reserved_names.gen.py test to also check that the macros are what
we expect after including each header.
Second, it fixes the push/pop pragmas to work properly and adds missing
pragmas to all the files I could detect a failure in via the newly added
test.
rdar://121365472
(cherry picked from commit 7b4622514d232ce5f7110dd8b20d90e81127c467)
The interaction between --warn-backrefs was not tested, but if
--defsym-created reference causes archive member extraction, it seems
reasonable to suppress the diagnostic, which was the behavior before #78944.
(cherry picked from commit 9a1ca245c8bc60b1ca12cd906fb31130801d977e)
This function is used in `jitlink-check` lines in LIT tests. In #78371 I
missed to swap initial instruction bytes for systems that store the
constants as big-endian.
(cherry picked from commit 8a5bdd899f3cb57024d92b96c16e805ca9924ac7)
`OpaqueValueExpr` doesn't necessarily contain a source expression.
Particularly, after #78041, it is used to carry the type and the value
kind of a non-type template argument of floating-point type or referring
to a subobject (those are so called `StructuralValue` arguments).
This fixes#79575.
(cherry picked from commit ef67f63fa5f950f4056b5783e92e137342805d74)
The condition for allowing integer complex number support could also
allow neon fixed length complex numbers if +sve2 was specified. This
tightens the condition to only allow integer complex number support for
scalable vectors.
We could generalize this in the future to generate SVE intrinsics for
fixed-length vectors, but for the moment this opts for the simpler fix.
(cherry picked from commit 9520773c46777adbc1d489f831d6c93b8287ca0e)