When the mask bounds of a `vector.constant_mask` exactly equal the shape
of the vector, any transfer op consuming that mask will be unaffected by
it. Drop the mask in such cases.
It seems that some functions (.text.unlikely.xxx) may have zero size,
which
makes some builds with enabled assertions fail. Removing the assertion
and
extending one test to fix the build.
The sorting can process such zero-sized functions so no changes there
are needed
Change the separator in the `uniqueCGIdent` method to `X`. This change
is required to enable OpenMP offloading for the NVPTX target, as dots
are not valid identifiers in PTX and `uniqueCGIdent` is used to mangle
some literals. Follow up patches will change the remainder of `.`
appearances in names to `X` and add support for the NVPTX target.
Enable merging #71439 by removing a definitely-wrong usage of
std::unique_ptr<SmallVectorImpl<char>> as a return value with passing in
a SmallVectorImpl<char>&
Also change the following function to take ArrayRef<char> instead of
const SmalVectorImpl<char>& .
This reverts commit 578a4716f5.
This causes multiple issues. Compile time slowdown due to more path
canonicalization, and weird behavior on Windows.
Will reland under a separate flag `-f[no-]canonical-system-headers` to
match gcc in the future and further limit when it's passed by default.
Fixes#70011.
Function parameters marked with inreg are supposed to be allocated to
SGPRs. However, for compute functions, this is ignored and function
parameters are allocated to VGPRs. This fix modifies CC_AMDGPU_Func in
AMDGPUCallingConv.td to use SGPRs if input arg is marked inreg.
---------
Co-authored-by: Jun Wang <jun.wang7@amd.com>
1. Instead of using individual "boolean" macros, have an "enum" macro
`_LIBCPP_HARDENING_MODE`. This avoids issues with macros being
mutually exclusive and makes overriding the hardening mode within a TU
more straightforward.
2. Rename the safe mode to debug-lite.
This brings the code in line with the RFC:
https://discourse.llvm.org/t/rfc-hardening-in-libc/73925Fixes#65101
In this patch, we create a new ModulePass that mimics the LinkInModules
API from CodeGenAction.cpp, and a new command line option to enable the
pass. As part of the implementation, we needed to refactor the
BackendConsumer class definition into a new separate header (instead of
embedded in CodeGenAction.cpp). With this new pass, we can now re-link
bitcodes supplied via the -mlink-built-in bitcodes as part of the
RunOptimizationPipeline.
With the re-linking pass, we now handle cases where new device library
functions are introduced as part of the optimization pipeline.
Previously, these newly introduced functions (for example a fused sincos
call) would result in a linking error due to a missing function
definition. This new pass can be initiated via:
-mllvm -relink-builtin-bitcode-postop
Also note we intentionally exclude bitcodes supplied via the
-mlink-bitcode-file option from the second linking step
In 8244ff6739, I've introduced an
assertion that incorrectly used BasicBlock::empty(). Some basic blocks
may contain only pseudo instructions and thus BB->empty() will evaluate
to false, while the actual code size will be zero.
The DWARFUnitVector class lives inside of the DWARFContextState. Prior
to this fix a non const reference was being handed out to clients. When
fetching the DWO units, there used to be a "bool Lazy" parameter that
could be passed that would allow the DWARFUnitVector to parse individual
units on the fly. There were two major issues with this approach:
- not thread safe and causes crashes
- the accessor would check if DWARFUnitVector was empty and if not empty
it would return a partially filled in DWARFUnitVector if it was
constructed with "Lazy = true"
This patch fixes the issues by always fully parsing the DWARFUnitVector
when it is requested and only hands out a "const DWARFUnitVector &".
This allows the thread safety mechanism built into the DWARFContext
class to work corrrectly, and avoids the issue where if someone
construct DWARFUnitVector with "Lazy = true", and then calls an API that
partially fills in the DWARFUnitVector with individual entries, and then
someone accesses the DWARFUnitVector, they would get a partial and
incomplete listing of the DWARF units for the DWOs.
I received a couple of nullptr-deref crash reports with no line numbers
in this function. The way the function was written it was a bit
diffucult to keep track of when result_sp could be null, so this patch
simplifies the function to make it more obvious when a nullptr can be
contained in the variable.
The flag seems to be doing practically the same thing for zero cost and
pinned dma. In addition, the register host is not truly the right zero
cost mechanism according to Thomas. So we are simplifying the setup for
now, until we have a better definition for what to implement and test.
https://github.com/llvm/llvm-project/issues/64316
initCacheMaybe() will init all the size class arrays at once and it
doesn't have much work to do even if it supports partial initialization.
This avoids the call to initCacheMaybe in each allocate()/deallocate().
This allows testing the upcoming CMake 3.28 release in the CI. CMake
3.28 will have non-experimental support for C++20 modules. So this would
be a better CMake version for the modular builds.
The goal is to remove CMake 3.27 from the CI when the builder work
properly with 3.28.
This patch implements `std::basic_syncbuf` and `std::basic_osyncstream` as specified in paper p0053r7. ~~For ease of reviewing I am submitting this patch before submitting a patch for `std::basic_osyncstream`. ~~
~~Please note, this patch is not 100% complete. I plan on adding more tests (see comments), specifically I plan on adding tests for multithreading and synchronization.~~
Edit: I decided that it would be far easier for me to keep track of this and make changes that affect both `std::basic_syncbuf` and `std::basic_osyncstream` if both were in one patch.
The patch was originally written by @zoecarver
Implements
- P0053R7 - C++ Synchronized Buffered Ostream
- LWG-3127 basic_osyncstream::rdbuf needs a const_cast
- LWG-3334 basic_osyncstream move assignment and destruction calls basic_syncbuf::emit() twice
- LWG-3570 basic_osyncstream::emit should be an unformatted output function
- LWG-3867 Should std::basic_osyncstream's move assignment operator be noexcept?
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D67086
This reverts commit 957efa4ce4.
Original commit message below -- in this follow up, I've shifted
un-necessary inclusions of DebugProgramInstruction.h into being forward
declarations (fixes clang-compile time I hope), and a memory leak in the
DebugInfoTest.cpp IR unittests.
I also tracked a compile-time regression in D154080, more explanation
there, but the result of which is hiding some of the changes behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.
[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info
This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].
The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues is plumbed into the metadata hierachy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.
This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.
[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
This commit adds skewed distribution of iterations in
nonmonotonic:dynamic schedule (static steal) for hybrid systems when
thread affinity is assigned. Currently, it distributes the iterations at
60:40 ratio. Consider this loop with dynamic schedule type,
for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware
threads (16 CORE and 4 ATOM core), 88 iterations will be assigned to
performance cores and 12 iterations will be assigned to efficient cores.
Each thread with CORE core will process 5 iterations + extras and with
ATOM core will process 3 iterations.
Differential Revision: https://reviews.llvm.org/D152955
This adds scalable handling for scalable vectors in emitGEPOffset. This
was noticed in some tests that Biplob was creating, so might be unlikely
to come up much in practice. I've attempted to add test coverage for
various places EmitGEPOffset is called. The vscale intrinsics will
currently emit multiple copies, relying on later CSE to combine them.
This code was incorrectly checking that the CtxI has required FMF, but
the context instruction need not always be the instrinsic call.
Check that the intrinsic call has the required FMF.
Fixes PR71548.
We can assume that the minimal SVE register is 128-bit, when NEON is not
available. And we can lower the shuffle shuffle operation with one
operand to TBL1 SVE instruction.
Update addInfoForInductions to also check if the add-rec is for the
current loop. Otherwise we might add incorrect facts or crash.
Fixes a miscompile & crash introduced by 00396e6a1a.
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
GCC supports building individual functions with backchain using the
__attribute__((target("backchain"))) syntax, and Clang should too.
Clang translates this into the "target-features"="+backchain" attribute,
and the -mbackchain command-line option into the "backchain" attribute.
The backend currently checks only the latter; furthermore, the backchain
target feature is not defined.
Handle backchain like soft-float. Define a target feature, convert
function attribute into it in getSubtargetImpl(), and check for target
feature instead of function attribute everywhere. Add an end-to-end test
to the Clang testsuite.
When hovering over variables larger than 64 bits, with more than 64
active bits, there were assertion failures since Hover is trying to
print the value as a 64-bit hex value.
There is already protection avoiding to call printHex if there is more
than 64 significant bits. And we already truncate and print negative
values using only 32 bits, when possible. So we can simply truncate
values with more than 64 bits to avoid the assert when using
getZExtValue. The result will be that for example a negative 128 bit
variable is printed using 64 bits, when possible.
There is still no support for printing more than 64 bits. That would
involve more changes since for example llvm::FormatterNumber is limited
to 64 bits.
This is a second attempt at landing this patch. Now with protection
to ensure we use a triple that supports __int128_t.