This allows testing the upcoming CMake 3.28 release in the CI. CMake
3.28 will have non-experimental support for C++20 modules, so it is a
better CMake version for the modular builds.
The goal is to remove CMake 3.27 from the CI once the builders work
properly with 3.28.
This patch implements `std::basic_syncbuf` and `std::basic_osyncstream` as specified in paper P0053R7. Both classes are implemented in a single patch: it is far easier to keep track of changes that affect both `std::basic_syncbuf` and `std::basic_osyncstream` when they live in one review.
The patch was originally written by @zoecarver.
Implements
- P0053R7 - C++ Synchronized Buffered Ostream
- LWG-3127 basic_osyncstream::rdbuf needs a const_cast
- LWG-3334 basic_osyncstream move assignment and destruction calls basic_syncbuf::emit() twice
- LWG-3570 basic_osyncstream::emit should be an unformatted output function
- LWG-3867 Should std::basic_osyncstream's move assignment operator be noexcept?
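For illustration (not part of the patch), a minimal use of the new classes, relying only on the standard `<syncstream>` interface specified by P0053R7:
```
#include <iostream>
#include <syncstream>
#include <thread>

// Each osyncstream buffers its output in a basic_syncbuf and transfers
// it atomically to the wrapped stream on destruction (or via emit()),
// so lines from different threads never interleave mid-line.
void worker(int id) {
  std::osyncstream(std::cout) << "worker " << id << " done\n";
}

int main() {
  std::thread t1(worker, 1), t2(worker, 2);
  t1.join();
  t2.join();
}
```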
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D67086
This reverts commit 957efa4ce4.
Original commit message below -- in this follow-up, I've changed
unnecessary inclusions of DebugProgramInstruction.h into forward
declarations (which I hope fixes the clang compile time) and fixed a
memory leak in the DebugInfoTest.cpp IR unittests.
I also tracked a compile-time regression in D154080 (more explanation
there); as a result, some of the changes are hidden behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.
[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info
This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].
The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point aren't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValue and DPMarker objects. The
metadata-uses of DPValues are plumbed into the metadata hierarchy, and a
field is added to class Instruction; all of this is exercised in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.
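As a rough standalone sketch of the shape this describes (the types and
fields here are illustrative simplifications, not the actual LLVM
definitions):
```
#include <vector>

// Illustrative stand-ins for LLVM's debug-info metadata types.
struct Metadata;
struct DILocalVariable;
struct DIExpression;

// Simplified DPValue: one variable-location record, carrying what a
// dbg.value intrinsic call used to express through its operands.
struct DPValue {
  Metadata *Location;          // the IR value the variable currently has
  DILocalVariable *Variable;   // which source variable is described
  DIExpression *Expression;    // how to interpret the location
};

// Simplified DPMarker: owns the debug records that sit immediately
// before one "real" instruction in the block.
struct DPMarker {
  std::vector<DPValue *> StoredDPValues;
};

// class Instruction gains a pointer along these lines, non-null only
// when debug-info records are attached:
//   DPMarker *DbgMarker;
```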
This patch was reviewed in Phab in D153990 and D154080; I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
This commit adds skewed distribution of iterations in the
nonmonotonic:dynamic schedule (static steal) for hybrid systems when
thread affinity is assigned. Currently, it distributes the iterations at
a 60:40 ratio. Consider a loop with dynamic schedule type,
for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware
threads (16 CORE and 4 ATOM cores), 88 iterations will be assigned to
performance cores and 12 iterations will be assigned to efficient cores.
Each CORE thread will process 5 iterations plus extras, and each ATOM
thread will process 3 iterations.
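A quick check of the arithmetic (a standalone sketch; the chunk sizes are
taken from the example above rather than computed the way the runtime
computes them):
```
#include <cstdio>

int main() {
  const int iters = 100;
  const int core_threads = 16, atom_threads = 4;
  const int core_chunk = 5, atom_chunk = 3;  // per-thread base chunks
  // 16*5 + 4*3 = 92 iterations covered by the base chunks ...
  int base = core_threads * core_chunk + atom_threads * atom_chunk;
  // ... and the remaining 8 extras also go to the performance cores.
  int extras = iters - base;
  std::printf("CORE total: %d, ATOM total: %d\n",
              core_threads * core_chunk + extras,  // 88
              atom_threads * atom_chunk);          // 12
  return 0;
}
```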
Differential Revision: https://reviews.llvm.org/D152955
This adds handling for scalable vectors in emitGEPOffset. This was
noticed in some tests that Biplob was creating, so it might be unlikely
to come up much in practice. I've attempted to add test coverage for the
various places emitGEPOffset is called. The vscale intrinsics will
currently emit multiple copies, relying on later CSE to combine them.
This code was incorrectly checking that the CtxI has the required FMF,
but the context instruction need not always be the intrinsic call.
Check that the intrinsic call has the required FMF.
Fixes PR71548.
We can assume that the minimal SVE register is 128-bit when NEON is not
available, and we can lower a shuffle operation with one operand to the
TBL1 SVE instruction.
Update addInfoForInductions to also check if the add-rec is for the
current loop. Otherwise we might add incorrect facts or crash.
Fixes a miscompile & crash introduced by 00396e6a1a.
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
GCC supports building individual functions with backchain using the
__attribute__((target("backchain"))) syntax, and Clang should too.
Clang translates this into the "target-features"="+backchain" attribute,
and the -mbackchain command-line option into the "backchain" attribute.
The backend currently checks only the latter; furthermore, the backchain
target feature is not defined.
Handle backchain like soft-float: define a target feature, convert the
function attribute into it in getSubtargetImpl(), and check for the
target feature instead of the function attribute everywhere. Add an
end-to-end test to the Clang testsuite.
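For reference, a sketch of the per-function usage this enables on SystemZ
(the function name is illustrative):
```
// With this change, Clang accepts the GCC per-function syntax and maps
// it to "target-features"="+backchain", which the backend now checks.
__attribute__((target("backchain")))
void traced(void) {
  // Compiled with a backchain; the rest of the translation unit is
  // unaffected unless -mbackchain is also passed.
}
```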
When hovering over variables larger than 64 bits, with more than 64
active bits, there were assertion failures because Hover tries to
print the value as a 64-bit hex value.
There is already protection that avoids calling printHex if there are
more than 64 significant bits, and we already truncate and print negative
values using only 32 bits when possible. So we can simply truncate
values with more than 64 bits to avoid the assert when using
getZExtValue. The result is that, for example, a negative 128-bit
variable is printed using 64 bits when possible.
There is still no support for printing more than 64 bits. That would
involve more changes, since for example llvm::FormatterNumber is limited
to 64 bits.
This is a second attempt at landing this patch, now with protection
to ensure we use a triple that supports __int128_t.
This PR improves, cleans up, and unifies the verifiers of the
TmaCreateDescriptor and TmaAsyncLoad Ops.
The PR now verifies the following, which wasn't verified before:
- address space
- rank match between descriptor and memref
- element type match between descriptor and memref
- shape match between descriptor and memref
BOLT currently hooks its instrumentation finalization function via
`DT_FINI`. However, this method of calling finalization routines is no
longer supported on newer ABIs like RISC-V; `DT_FINI_ARRAY` is
preferred there.
This patch adds support for hooking into `DT_FINI_ARRAY` instead if the
binary does not have a `DT_FINI` entry. If it does, `DT_FINI` takes
precedence so this patch should not change how the currently supported
instrumentation targets behave.
`DT_FINI_ARRAY` points to an array in memory of `DT_FINI_ARRAYSZ` bytes.
It consists of pointer-length entries that contain the addresses of
finalization functions. However, the addresses are only filled in by the
dynamic linker at load time using relative relocations. This makes
hooking via `DT_FINI_ARRAY` a bit more complicated than via `DT_FINI`.
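For context, a standalone sketch of the `DT_FINI_ARRAY` mechanism being
hooked (illustrative, not BOLT code):
```
#include <cstddef>
#include <cstdint>

using FiniFn = void (*)();

// What the dynamic linker conceptually does at exit: DT_FINI_ARRAY gives
// the array's address, DT_FINI_ARRAYSZ its size in bytes, and relative
// relocations have already filled in the entries at load time.
void runFiniArray(uintptr_t finiArrayAddr, size_t finiArraySizeBytes) {
  auto *entries = reinterpret_cast<FiniFn *>(finiArrayAddr);
  size_t count = finiArraySizeBytes / sizeof(FiniFn);
  for (size_t i = count; i-- > 0;)  // finalizers run in reverse order
    entries[i]();
}
```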
The implementation works as follows:
- While scanning the binary: find the section where `DT_FINI_ARRAY`
points to, read its first dynamic relocation and use its addend to find
the address of the fini function we will use to hook;
- While writing the output file: overwrite the addend of the dynamic
relocation with the address of the runtime library's fini function.
Updating the dynamic relocation required a bit of boilerplate: since
dynamic relocations are stored in a `std::multiset`, which doesn't
support getting mutable references to its items, functions were added to
`BinarySection` to take an existing relocation and insert a new one.
In commit 5a9a02f67b, scalar evolution got support for
computing SCEVs for (ashr(add(shl(x, n), c), m)) constructs. The code
however used APInt::getZExtValue without first checking that the APInt
would fit inside a uint64_t. When for example using 128-bit types, we
ended up with assertion failures (or maybe miscompiles in non-assert
builds).
This patch simply avoids converting from APInt to uint64_t when creating
the truncated constant; we can just truncate the APInt instead.
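A minimal sketch of the failure mode and the fix (a standalone
illustration assuming LLVM's ADT headers, not the patched
ScalarEvolution code):
```
#include "llvm/ADT/APInt.h"
using namespace llvm;

APInt makeTruncatedConstant(const APInt &C, unsigned NarrowBits) {
  // Bad: getZExtValue() asserts when C has more than 64 active bits,
  // e.g. for a 128-bit constant:
  //   uint64_t V = C.getZExtValue();  // assertion failure
  // Good: truncate the APInt directly, which works for any bit width.
  return C.trunc(NarrowBits);
}
```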
Enums without enumerators (empty) are now excluded from analysis, as it's
not possible to properly determine a new narrowed type, and such enums
can be used in different ways, such as strong types.
Closes #71544
The contents are mostly those of SPSR_EL1 as shown in the Arm manual,
with a few adjustments for things Linux says userspace shouldn't concern
itself with.
```
(lldb) register read cpsr
cpsr = 0x80001000
= (N = 1, Z = 0, C = 0, V = 0, SS = 0, IL = 0, ...
```
Some fields are always present, some depend on extensions. I've checked
for those extensions using HWCAP and HWCAP2.
To provide this for core files and live processes, I've added a new class
LinuxArm64RegisterFlags. This is a container for all the registers we'll
want to have fields for, and it handles detecting the fields and updating
register info.
This is used by the native process as follows:
* There is a global LinuxArm64RegisterFlags object.
* The first thread takes a mutex on it, and updates the fields.
* Subsequent threads see that detection is already done, and skip it.
* All threads then update their own copy of the register information
with pointers to the field information contained in the global object.
This means that even though every thread will have the same fields, we
only detect them once and have one copy of the information.
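A minimal sketch of that once-only detection pattern (class and member
names here are illustrative, not the actual LLDB code):
```
#include <cstdint>
#include <mutex>

struct RegisterFlagsDetectorSketch {
  // Called by every thread; only the first caller does the work.
  void DetectFields(uint64_t hwcap, uint64_t hwcap2) {
    std::lock_guard<std::mutex> guard(m_mutex);
    if (m_detected)
      return;  // another thread already detected the fields
    // ... decide which flag fields exist based on hwcap/hwcap2 ...
    m_detected = true;
  }

  std::mutex m_mutex;
  bool m_detected = false;
};
```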
Core files instead have a LinuxArm64RegisterFlags as a member, because
each core file could have different saved capabilities. The logic from
there is the same, but we get HWCAP values from the corefile note.
This handler class is Linux specific right now, but it can easily be
made more generic if needed, for example by using LLVM's FeatureBitset
instead of HWCAPs.
Updating register info is done with string comparison, which isn't
ideal. For CPSR, we do know the register number ahead of time but we do
not for other registers in dynamic register sets. So in the interest of
consistency, I'm going to use string comparison for all registers
including cpsr.
I've added tests with a core file and a live process, checking only
fields that are always present, to account for CPU variance.
Looking through inttoptr / ptrtoint intermixed with GEPs is very
questionable from a provenance perspective. We also don't seem to
have any test coverage that shows this is useful (apart from one
test I added to guard against a crash).
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.
This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.
There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
Operands of `__builtin_amdgcn_update_dpp` need to evaluate to constants
to match the intrinsic requirements.
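For illustration, a device-code sketch (HIP/OpenCL style; the function
name and the specific DPP control value are illustrative):
```
// All operands after the first two are immediates, so they must be
// constant expressions at compile time.
int shift_row(int old_val, int src_val) {
  return __builtin_amdgcn_update_dpp(old_val, src_val,
                                     /*dpp_ctrl=*/0x101,
                                     /*row_mask=*/0xf,
                                     /*bank_mask=*/0xf,
                                     /*bound_ctrl=*/false);
}
```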
Fixes: SWDEV-426822, SWDEV-431138
Authored-by: Pravin Jagtap <Pravin.Jagtap@amd.com>