The test shows a mis-compile where @test gets incorrectly simplified to
unreachable. The test case is reduced from a ThinLTO build of Clang,
with only the relevant pass sequence included.
This patch tries to reduce the size of the debug_loclist section by
replacing the DW_LLE_start_length opcodes currently emitted by dsymutil
in favor of using DW_LLE_base_address + DW_LLE_offset_pair instead.
The DW_LLE_start_length is one AddressSize followed by a ULEB per entry,
whereas, the DW_LLE_base_address + DW_LLE_offset_pair will use one
AddressSize for the base address, and then the DW_LLE_offset_pair is a
pair of ULEBs. This will be more efficient where a loclist fragment has
many entries.
Differential Revision: https://reviews.llvm.org/D153080
Sometimes LLVM generates branch to return instruction, like PR63227.
It is because in function MachineBlockPlacement::canTailDuplicateUnplacedPreds
we avoid duplicating a BB into another already placed BB to prevent destroying
computed layout. But if the successor BB is a return block, duplicating it will
only reduce taken branches without hurt to any other branches.
Differential Revision: https://reviews.llvm.org/D153093
PTX does not have a notion of `unreachable`, which results in emitted basic
blocks having an edge to the next block:
```
block1:
call @does_not_return();
// unreachable
block2:
// ptxas will create a CFG edge from block1 to block2
```
This may result in significant changes to the control flow graph, e.g., when
LLVM moves unreachable blocks to the end of the function. That's a problem
in the context of divergent control flow, as `ptxas` uses the CFG to determine
divergent regions, while some intructions may not be executed divergently.
For example, `bar.sync` is not allowed to be executed divergently on Pascal
or earlier. If we start with the following:
```
entry:
// start of divergent region
@%p0 bra cont;
@%p1 bra unlikely;
...
bra.uni cont;
unlikely:
...
// unreachable
cont:
// end of divergent region
bar.sync 0;
bra.uni exit;
exit:
ret;
```
it is transformed by the branch-folder and block-placement passes to:
```
entry:
// start of divergent region
@%p0 bra cont;
@%p1 bra unlikely;
...
bra.uni cont;
cont:
bar.sync 0;
bra.uni exit;
unlikely:
...
// unreachable
exit:
// end of divergent region
ret;
```
After moving the `unlikely` block to the end of the function, it has an edge
to the `exit` block, which widens the divergent region and makes the `bar.sync`
instruction happen divergently. That causes wrong computations, as we've been
running into for years with Julia code (which emits a lot of `trap` +
`unreachable` code all over the place).
To work around this, add an `exit` instruction before every `unreachable`,
as `ptxas` understands that exit terminates the CFG. Note that `trap` is not
equivalent, and only future versions of `ptxas` will model it like `exit`.
Another alternative would be to emit a branch to the block itself, but emitting
`exit` seems like a cleaner solution to represent `unreachable` to me.
Also note that this may not be sufficient, as it's possible that the block
with unreachable control flow is branched to from different divergent regions,
e.g. after block merging, in which case it may still be the case that `ptxas`
could reconstruct a CFG where divergent regions are merged (I haven't confirmed
this, but also haven't encountered this pattern in the wild yet):
```
entry:
// start of divergent region 1
@%p0 bra cont1;
@%p1 bra unlikely;
bra.uni cont1;
cont1:
// intended end of divergent region 1
bar.sync 0;
// start of divergent region 2
@%p2 bra cont2;
@%p3 bra unlikely;
bra.uni cont2;
cont2:
// intended end of divergent region 2
bra.uni exit;
unlikely:
...
exit;
exit:
// possible end of merged divergent region?
```
I originally tried to avoid the above by cloning paths towards `unreachable` and
splitting the outgoing edges, but that quickly became too complicated. I propose
we go with the simple solution first, also because modern GPUs with more flexible
hardware thread schedulers don't even suffer from this issue.
Finally, although I expect this to fix most of
https://bugs.llvm.org/show_bug.cgi?id=27738, I do still encounter
miscompilations with Julia's unreachable-heavy code when targeting these
older GPUs using an older `ptxas` version (specifically, from CUDA 11.4 or
below). This is likely due to related bugs in `ptxas` which have been fixed
since, as I have filed several reproducers with NVIDIA over the past couple of
years. I'm not inclined to look into fixing those issues over here, and will
instead be recommending our users to upgrade CUDA to 11.5+ when using these GPUs.
Also see:
- https://github.com/JuliaGPU/CUDAnative.jl/issues/4
- https://github.com/JuliaGPU/CUDA.jl/issues/1746
- https://discourse.llvm.org/t/llvm-reordering-blocks-breaks-ptxas-divergence-analysis/71126
Reviewed By: jdoerfert, tra
Differential Revision: https://reviews.llvm.org/D152789
This patch should address the failure of TestStackCoreScriptedProcess
that is happening specifically on x86_64.
It turns out that in 1370a1cb5b97, I changed the way we extract integers
from a `StructuredData::Dictionary` and in order to get a stop info from
the scripted process, we call a method that returns a `SBStructuredData`
containing the stop reason data.
TestStackCoreScriptedProcess` was failing specifically on x86_64 because
the stop info dictionary contains the signal number, that the `Scripted
Thread` was trying to extract as a signed integer where it was actually
parsed as an unsigned integer. That caused `GetValueForKeyAsInteger` to
return the default value parameter, `LLDB_INVALID_SIGNAL_NUMBER`.
This patch address the issue by extracting the signal number with the
appropriate type and re-enables the test.
Differential Revision: https://reviews.llvm.org/D152848
Signed-off-by: Med Ismail Bennani <ismail@bennani.ma>
The COFFAsmParser (to my surprise) didn't support the .pushsection and
.popsection directives. These directives aren't directly useful, however for
frontends that have inline asm support this is really useful. Rust in
particular, has support for inline asm, which can be used together with these
directives to "emulate" features like static generics. This patch adds support
for the two mentioned directives.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D152085
When LLDB needs to access a debug section, it generally calls
SectionList::FindSectionByType with the corresponding type (we have one type for
each DWARF section). However, the missing entries made some sections be
classified as "eSectionTypeOther", which makes all calls to `FindSectionByType`
fail.
With this patch, a check-lldb build with
`-DLLDB_TEST_USER_ARGS=--dwarf-version=5` reports a much lower number of
failures:
Unsupported : 327
Passed : 2423
Expectedly Failed: 16
Unresolved : 2
Failed : 52
This is down from previously 400~ failures.
Differential Revision: https://reviews.llvm.org/D153433
This reverts commit f55fd19b6b565827af5fbf504952dcc35b8b7360.
As noted on the original thread, other uses of LLVM_LIBRARY_OUTPUT_INTDIR are optional. Will make a separate patch that makes this use optional as well.
It seems just replacing the operation was not replacing all of the uses
when the types of the expression before and after this pass differ (due
to differing shape information). Now the shape information is always
kept the same.
This fixes https://github.com/llvm/llvm-project/issues/63399
Differential Revision: https://reviews.llvm.org/D153333
The bytecode writer config was heap-allocated, but was never freed, causing ASAN errors.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D153440
This patch prepares the RPC interface to be installed. We place this in
the existing `llvm-gpu-none` directory as it will also give us access to
the generated `libc` headers for the opcodes.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D153040
This avoids undefs from being expanded to a build vector of zeroes.
As noted by @craig.topper in D153399
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D153411
posix_compat.h uses struct timeval which is defined in <sys/time.h>
but it doesn't include it. On most POSIX platforms like Linux or macOS,
that headers is transitively included by other headers like <sys/stat.h>,
but there are other platforms where this is not the case.
Differential Revision: https://reviews.llvm.org/D153384
A build_vector is the canonical representation rather than multiple
insert_vector_elts.
Unfortunately, this regresses quite a few tests now primarily due to not
having a vmv.s.x special case, but I hope we can improve this with future
patches.
Stress testing in our downstream found an infinite loop in DAG combine.
This patch breaks the infinite loop.
The insert_vector_element chain starts with a fixed vector undef.
Fixed vector undef is currently expanded to a build_vector of 0s
which gets lowered to a vmv.v.i. The insert chain overwrites all
elements so SimplifyDemandedVectorElts turns the vmv.v.i back into
undef and the cycle repeats.
We probably should custom lower fixed vector undef to scalable
vector undef. I think that would also fix the infinite loop, but
I didn't test that.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D153399
They will demonstrate some symbol that --adjust-vma= should not adjust.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D153401
The definition for ISD::EXTRACT_SUBVECTOR says the index must be
aligned to the known minimum elements of the extracted type. We mostly
got away with this but it turns out there are places that depend on this.
For example, this code in getNode for ISD::EXTRACT_SUBVECTOR
```
// EXTRACT_SUBVECTOR of CONCAT_VECTOR can be simplified if the pieces of
// the concat have the same type as the extract.
if (N1.getOpcode() == ISD::CONCAT_VECTORS && N1.getNumOperands() > 0 &&
VT == N1.getOperand(0).getValueType()) {
unsigned Factor = VT.getVectorMinNumElements();
return N1.getOperand(N2C->getZExtValue() / Factor);
}
```
This depends on N2C->getZExtValue() being evenly divisible by Factor.
Reviewed By: luke
Differential Revision: https://reviews.llvm.org/D153380
This reverts commit 9119325a5666e557a19f38a05525578b556c215b.
A buildbot is broken, probably because of this change breaking the
SHARED_LIBS=ON build more.
The instructions in the documentation only mentioned how to include
bindings for clang-format into vim using python2. Add the instructions
for python3 which were already present in the source comments.
Differential Revision: https://reviews.llvm.org/D153338
Change-Id: I25fdbd36f0c7e745061908be8e26f68cb31c7dd5
The behaviors of violating assume instruction or !nonnull metadata is
different. The former is immediate undefined behavior, but the latter is
returning poison value. This patch adds !noundef to trigger immediate
undefined behavior if !nonnull is violated.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D153400
There are multiple places in the code where the type of memory being accessed from an instruction needs to be obtained, including an upcoming patch to improve GEP cost modeling. This deduplicates the logic between them. It's not strictly NFC as EarlyCSE/LoopStrengthReduce may catch more intrinsics now.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D150583
This reverts commit c4fea3905617af89d1ad87319893e250f5b72dd6.
I am reverting this for now until I figure out how to fix
the build bot errors and warnings.
Errors:
llvm-project/lld/ELF/Arch/ARM.cpp:1300:29: error: expected primary-expression before ‘>’ token
osec->writeHeaderTo<ELFT>(++sHdrs);
Warnings:
llvm-project/lld/ELF/Arch/ARM.cpp:1306:31: warning: left operand of comma operator has no effect [-Wunused-value]
Implement XCVmac intrinsics for CV32E40P according to the specification.
This is the first commit of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.
The patch-set aims at upstreaming the extensions on MC. The following will be on CodeGen, and the final patch-set will be on builtins if possible. The implemented version is on [0].
Contributors: @CharKeaney, Serkan Muhcu, @jeremybennett, @lewis-revill, @liaolucy, @simoncook, @xmj
Spec: 62bec66b36/docs/source/instruction_set_extensions.rst
[0] https://github.com/openhwgroup/corev-llvm-project
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D152821
This commit provides linker support for Cortex-M Security Extensions (CMSE).
The specification for this feature can be found in ARM v8-M Security Extensions:
Requirements on Development Tools.
The linker synthesizes a security gateway veneer in a special section;
`.gnu.sgstubs`, when it finds non-local symbols `__acle_se_<entry>` and `<entry>`,
defined relative to the same text section and having the same address. The
address of `<entry>` is retargeted to the starting address of the
linker-synthesized security gateway veneer in section `.gnu.sgstubs`.
In summary, the linker translates input:
```
.text
entry:
__acle_se_entry:
[entry_code]
```
into:
```
.section .gnu.sgstubs
entry:
SG
B.W __acle_se_entry
.text
__acle_se_entry:
[entry_code]
```
If addresses of `__acle_se_<entry>` and `<entry>` are not equal, the linker
considers that `<entry>` already defines a secure gateway veneer so does not
synthesize one.
If `--out-implib=<out.lib>` is specified, the linker writes the list of secure
gateway veneers into a CMSE import library `<out.lib>`. The CMSE import library
will have 3 sections: `.symtab`, `.strtab`, `.shstrtab`. For every secure gateway
veneer <entry> at address `<addr>`, `.symtab` contains a `SHN_ABS` symbol `<entry>` with
value `<addr>`.
If `--in-implib=<in.lib>` is specified, the linker reads the existing CMSE import
library `<in.lib>` and preserves the entry function addresses in the resulting
executable and new import library.
Reviewed By: MaskRay, peter.smith
Differential Revision: https://reviews.llvm.org/D139092
Whether we include operator new and delete into libc++ has always
been a build time setting, and piggy-backing on a macro like
_LIBCPP_DISABLE_NEW_DELETE_DEFINITIONS is inconsistent with how
we handle similar cases for e.g. LIBCXX_ENABLE_RANDOM_DEVICE. Instead,
simply avoid including new.cpp in the sources of the library when we
do not wish to include these operators in the build.
This also makes us much closer to being able to share the definitions
between libc++ and libc++abi, since we could technically build those
definitions into a standalone static library and decide whether we link
it into libc++abi.dylib or libc++.dylib.
Differential Revision: https://reviews.llvm.org/D153272
The failing test comes from https://reviews.llvm.org/D152570.
Root cause of the failure is that a string constant on SystemZ
has an alignment of 2, not 1. The CSKY target has a similar problem.
The solution is to replace the fixed number with a regex.
Reviewed By: uweigand, tuliom, Zibi
Differential Revision: https://reviews.llvm.org/D153352
Once integrated in our codebase the patch triggered a bunch of failing
tests. We do not yet understand where the bug is but we revert it to
move forward with integration.
This reverts commit 5e32765c15ab8df3d2635a2bb5078c5b1d5714d5.
This one is a bit twisted. Some platforms don't have support for
exiting in a clean manner, so they don't provide std::exit(). As
a result, defining `terminate_successful()` on those platforms won't
work, and the PSTL tests that rely on `terminate_successful()` also
won't work.
However, we don't have a notion of "no clean termination" in libc++,
so we can't properly guard this. Since embedded platforms that don't
support clean termination usually also don't enable exceptions, we
don't need to be able to run those `terminate_successful` PSTL tests,
and guarding the definition of `terminate_successful` with
TEST_HAS_NO_EXCEPTIONS works pretty well.
This is kind of a hack for the lack of having a concept of "no clean
termination" in the library and in the test suite.
Differential Revision: https://reviews.llvm.org/D153302
AMDGPUAtomicOptimizer updates the dominator tree whenever
it modified the control flow. Therefore preserving the
analysis similar to legacy PM.
Reviewed By: arsenm, yassingh, #amdgpu
Differential Revision: https://reviews.llvm.org/D153349