If we know that x is nonzero and not a power of 2, then
llvm::findLastSet(x) + 1 is the index of the bit just above the
highest set bit in x. That is, 1 << (llvm::findLastSet(x) + 1) is the
same as llvm::bit_ceil(x).
Since llvm::bit_ceil is a no-op on a power of 2, we can call
llvm::bit_ceil unconditionally. The end result actually matches the
comment.
If x is known to be nonzero, findLastSet(x) returns the index of the
highest set bit counting from the LSB, so 1 << findLastSet(x) is the
same as llvm::bit_floor(x).
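To make both identities concrete, here is a minimal sketch (assuming
LLVM's MathExtras.h/bit.h helpers; illustrative, not code from the
patches):
```
#include "llvm/ADT/bit.h"
#include "llvm/Support/MathExtras.h"
#include <cassert>

void checkIdentities(unsigned x) {
  assert(x != 0);
  // findLastSet counts from the LSB, so shifting 1 by it reproduces
  // the highest set bit, i.e. bit_floor.
  assert((1u << llvm::findLastSet(x)) == llvm::bit_floor(x));
  // One position higher rounds up to the next power of 2, provided x
  // is not already a power of 2.
  if (!llvm::isPowerOf2_32(x))
    assert((1u << (llvm::findLastSet(x) + 1)) == llvm::bit_ceil(x));
}
```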
Sometimes memory addresses are treated as immediate values. Thus
immediate operands have to be relocatable.
Differential Revision: https://reviews.llvm.org/D137902
32-bit immediates require special care because they cross the normal
word (16-bit) boundary.
This patch also fixes some incorrect disassembler test cases.
Differential Revision: https://reviews.llvm.org/D142080
Just like the encoder directive for variable-length instructions, this
patch adds a new decoder directive to allow a custom decoder function
on an operand.
Right now, due to the design of DecoderEmitter, each operand can only
have a single custom decoder in a given instruction.
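For reference, a custom operand decoder in LLVM disassemblers typically
looks roughly like this (a sketch; the function name and the operand
handling are hypothetical):
```
#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCInst.h"

static llvm::MCDisassembler::DecodeStatus
decodeMyImm(llvm::MCInst &Inst, uint64_t Imm, uint64_t Address,
            const llvm::MCDisassembler *Decoder) {
  // Apply whatever target-specific adjustment the operand needs, then
  // materialize it on the MCInst.
  Inst.addOperand(llvm::MCOperand::createImm(static_cast<int64_t>(Imm)));
  return llvm::MCDisassembler::Success;
}
```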
Differential Revision: https://reviews.llvm.org/D142079
The zero-call-used-regs pass generates an xor instruction to help
mitigate return-oriented programming exploits by zeroing out used
registers. But in the test case below, when built with -g, there is a
dbg.value instruction associating the register with the debug-info
description of the formal parameter d. That makes the register appear
used, so the pass zeroes edi in the -g build, making the binary differ
from the one built without -g.
The pass should be looking only at the non-debug uses.
$ cat test.c
char a[];
int b;
__attribute__((zero_call_used_regs("used"))) char c(int d) {
  *a = ({
    int e = d;
    b;
  });
}
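A minimal sketch of the idea behind the fix (a hypothetical helper, not
the actual pass code): a register counts as used only if it has
non-debug uses.
```
#include "llvm/CodeGen/MachineRegisterInfo.h"

// use_nodbg_empty() skips DBG_VALUE and friends, so a register that is
// only referenced by debug info no longer appears used.
bool isRegUsedIgnoringDebug(const llvm::MachineRegisterInfo &MRI,
                            llvm::Register Reg) {
  return !MRI.use_nodbg_empty(Reg);
}
```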
This fixes https://github.com/llvm/llvm-project/issues/57962.
Differential Revision: https://reviews.llvm.org/D138757
This reverts commit 4240c91462.
The current solution won't work since getLocalOrGlobal does not
support returning a vector. More work needs to be put into
ensuring that both the local and the global ways of setting the
options remain available during the transition period.
As discussed in https://github.com/llvm/llvm-project/issues/59901
This change is not NFC. One SCEV test and one EarlyCSE test show an
improved analysis/optimization; the rest of the tests are unaffected.
I've mostly only added cleanup to SCEV since that is where this issue
started. As a follow up, I believe there is more cleanup opportunity in
SCEV and other affected passes.
There could be cases where registerAssumption calls for guards are
missed, but that is not so bad because there will be no
miscompilation. AssumptionCacheTracker should take care of deleted
guards.
Differential Revision: https://reviews.llvm.org/D142330
In order to determine the type of `omp_allocator_handle_t`, Clang
checks the type of the predefined allocators. The first one it checks is
`omp_null_allocator`. If the language is C and the system is 64-bit, what Clang
gets is an `int`, instead of an enum of size 8, given how we define
`omp_allocator_handle_t` in `omp.h`. If the allocator is captured by a region,
say a parallel region, the allocator will be privatized. Because Clang deems
`omp_allocator_handle_t` an `int`, it first casts the value returned by
the runtime library (for `libomp` it is a `void *`) to `int`, and then, in the
outlined function, casts it back to `omp_allocator_handle_t`. These two casts
completely shave off the upper 32 bits of the pointer value returned from
`libomp`, and when the private "new" pointer is fed to another runtime
function, `__kmpc_allocate()`, it causes a segmentation fault. That is the
root cause of PR54082.
I have no idea why `-fno-pic` could hide this bug.
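To illustrate the truncation, a standalone sketch (not Clang's code):
```
#include <cstdint>
#include <cstdio>

int main() {
  // Pretend this is the allocator handle libomp returned as void *.
  void *FromRuntime = reinterpret_cast<void *>(0x00007f0012345678ULL);
  // What the two casts effectively did: privatize through an 'int'.
  int Truncated =
      static_cast<int>(reinterpret_cast<std::intptr_t>(FromRuntime));
  void *Back =
      reinterpret_cast<void *>(static_cast<std::intptr_t>(Truncated));
  std::printf("%p -> %p\n", FromRuntime, Back); // upper 32 bits lost
}
```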
In this patch, we detect `omp_allocator_handle_t` using roughly the same method
as `omp_event_handle_t`, by looking it up into the identifier table.
Fixes #54082.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D142297
LLDB only supports Python3 now, so the `six` shim for Python2 is no longer necessary.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D142140
Otherwise we may be inserting a decl into a DeclContext that's not fully defined yet.
This simplifies/removes some clang AST node creation code. Instead, use
clang::printTemplateArgumentList().
Reviewed By: Michael137
Differential Revision: https://reviews.llvm.org/D142413
When `libomp` is initialized, it creates a temp file in `/dev/shm` to store
the registration flag. Some systems, like Android, don't have `/dev/shm`, in
which case this feature is disabled via the macro `KMP_USE_SHM`, though most
Linux distributions do have it. However, some customized distributions, such
as the one reported in https://github.com/llvm/llvm-project/issues/53955,
don't support it either, and that causes a core dump. In this patch, if that
is the case, we try to create a temporary file in `/tmp` instead, and if that
also fails, we error out.
Note that this patch does not consider whether the temporary directory has
been overridden via `TMPDIR`. If `/tmp` is not accessible, we error out.
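A rough sketch of the fallback logic (hypothetical names, not the
actual libomp code):
```
#include <cstdio>
#include <fcntl.h>

int openRegistrationFlag(const char *Name) {
  char Path[256];
  std::snprintf(Path, sizeof(Path), "/dev/shm/%s", Name);
  int FD = open(Path, O_CREAT | O_RDWR, 0600);
  if (FD < 0) { // no usable /dev/shm here; fall back to /tmp
    std::snprintf(Path, sizeof(Path), "/tmp/%s", Name);
    FD = open(Path, O_CREAT | O_RDWR, 0600);
  }
  if (FD < 0)
    std::perror("cannot create registration flag file"); // error out
  return FD;
}
```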
Fixes #53955.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142175
This changes the mechanism for verbose termination (again!) to make it
support compile-time customization in addition to link-time customization,
which is important for users who need fine-grained control over what code
gets generated around sites that call the verbose termination handler.
This concern had been raised to me both privately by prospective users
and in https://llvm.org/D140944, so I think it is clearly worth fixing.
We still support _LIBCPP_AVAILABILITY_CUSTOM_VERBOSE_ABORT_PROVIDED for
a limited time since the same functionality can be achieved by overriding
the _LIBCPP_VERBOSE_ABORT macro.
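As an example, the compile-time customization looks roughly like this
(the handler name is hypothetical; the macro has to be defined before
any libc++ header is included, e.g. via a -D flag or a forced include):
```
[[noreturn]] void my_verbose_abort(const char *format, ...);
#define _LIBCPP_VERBOSE_ABORT(...) my_verbose_abort(__VA_ARGS__)
#include <vector> // the override is picked up by libc++ headers
```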
Differential Revision: https://reviews.llvm.org/D141326
The stage1 and stage2 builds aren't packaged, so we only need to build
enough of the toolchain to build the next phase.
Reviewed By: thieta, amyk
Differential Revision: https://reviews.llvm.org/D141552
This patch removes the XFAIL decorator from
builtins/Unit/trampoline_setup_test.c, as it is now passing on
Windows/AArch64. The test is skipped in code when __clang__ is not
defined.
https://lab.llvm.org/buildbot/#/builders/120/builds/3873
If we're extracting an element and inserting it into an undef vector
with the same number of elements, we can use the original vector.
This pattern occurs around reductions that have been cascaded
together.
This can be generalized to wider/narrower vectors by using
insert_subvector/extract_subvector, but we don't currently have lit
tests for that case.
We could also support a non-undef destination vector by using a slide
or vmv.v.v.
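A rough sketch of the pattern being matched (illustrative, not the
actual RISC-V combine):
```
#include "llvm/CodeGen/SelectionDAGNodes.h"

// (insert_vector_elt undef:VT, (extract_vector_elt V:VT, Idx), Idx)
// folds to V itself: every other lane of the result is undef, and the
// one written lane already holds V's value at the same index.
llvm::SDValue foldInsertOfExtract(llvm::SDValue N) {
  using namespace llvm;
  if (N.getOpcode() != ISD::INSERT_VECTOR_ELT ||
      !N.getOperand(0).isUndef())
    return SDValue();
  SDValue Elt = N.getOperand(1), Idx = N.getOperand(2);
  if (Elt.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
      Elt.getOperand(1) != Idx)
    return SDValue();
  SDValue V = Elt.getOperand(0);
  // Same value type (and thus element count) means we can reuse V.
  return V.getValueType() == N.getValueType() ? V : SDValue();
}
```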
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D142264
This is what ld64 does, and also what we already do for most of the
other load commands. I'm not aware of a good way to test this, but I
don't think it really matters.
Differential Revision: https://reviews.llvm.org/D141462
2db6b34ea introduces a circular dependency on llvm::ArrayRef. Inspecting
the commit history, it appears that we have had issues using the
deduction guide for std::array, so let's try std::array with explicit
template arguments instead.
Differential revision: https://reviews.llvm.org/D141352
The check for "no SOURCE_DATE_EPOCH" wasn't especially interesting, and
I am not aware of a _portable_ way to unset an environment variable in
a lit test. So remove it, since it can fail if the build environment has
SOURCE_DATE_EPOCH set globally.
Differential Revision: https://reviews.llvm.org/D142511
In some binaries produced with ThinLTO, there are CUs that share an
entry in .debug_addr. Before, we would generate a new entry for each,
which led to a binary size increase. This changes the behavior so that
we reuse entries in .debug_addr.
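A simplified sketch of the reuse (hypothetical names, not the BOLT
code):
```
#include <cstdint>
#include <map>
#include <vector>

// Hand out the existing .debug_addr index when the same address was
// already recorded; only append a new entry the first time.
uint64_t getOrCreateAddrIndex(std::map<uint64_t, uint64_t> &Index,
                              std::vector<uint64_t> &Entries,
                              uint64_t Addr) {
  auto [It, Inserted] = Index.try_emplace(Addr, Entries.size());
  if (Inserted)
    Entries.push_back(Addr);
  return It->second;
}
```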
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D142425
By default, ASan generates an asan.module_ctor function that initializes ASan
and registers the globals in the module. This function is added to the
@llvm.global_ctors array. Previously, there was no way to control the
generation of this function.
This patch adds a way to control it. The flag -asan-constructor-kind has two
options:
global: the default option and ASan's default behavior; generates an
asan.module_ctor function.
none: skips the generation of the asan.module_ctor function.
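For example, assuming the flag is passed through to LLVM in the usual
way:
```
clang -fsanitize=address -mllvm -asan-constructor-kind=none -c foo.c
```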
rdar://104448572
Differential revision: https://reviews.llvm.org/D142505
This patch implements the memory lock/unlock API, introduced in
https://reviews.llvm.org/D139208, in the NextGen plugins. Locked buffers are
reference counted, and we allow certain overlap. Given an already locked
buffer A, other buffers that are fully contained inside A can be locked again,
even if they are smaller than A; in this case, the reference count of locked
buffer A is incremented. However, extending an existing locked buffer is not
allowed. The original buffer is actually unlocked once all its users have
released the locked buffer and its sub-buffers (i.e., the reference counter
becomes zero).
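A simplified sketch of the containment rule (hypothetical types, not
the plugin code):
```
#include <cstdint>
#include <vector>

struct LockedBuffer {
  uintptr_t Begin, End; // [Begin, End)
  unsigned RefCount;
};

// Fully contained ranges bump the existing buffer's reference count;
// partially overlapping ranges, which would extend a lock, fail.
bool lockRange(std::vector<LockedBuffer> &Locked, uintptr_t B,
               uintptr_t E) {
  for (LockedBuffer &LB : Locked) {
    if (B >= LB.Begin && E <= LB.End) { // fully contained: reuse
      ++LB.RefCount;
      return true;
    }
    if (B < LB.End && E > LB.Begin) // partial overlap: not allowed
      return false;
  }
  Locked.push_back({B, E, 1}); // new, independent lock
  return true;
}
```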
Differential Revision: https://reviews.llvm.org/D141227
The AMDGPU target can only emit LLVM-IR, so we can always rely on LTO to
link the static version of the runtime optimally. Using only the static
library has a few advantages: namely, it avoids several known bugs
and allows us to optimize out more functions. This is legal since the
changes in D142486 and D142484.
Depends on D142486 and D142484
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142491
Currently we have two versions of the static library. One is built as
individual bitcode files and linked via `-mlink-builtin-bitcode`. The
other is built as a single static archive `omptarget.devicertl.a` and is
linked via `-lomptarget.devicertl` and handled by the linker wrapper
during LTO. We use the former in the case that we are not performing
LTO, because linking the library late wouldn't allow us to optimize the
runtime library effectively. The support in D142484 allows us to
unconditionally link this library, so it will only be pulled in if
needed. That is, if we linked already via `-mlink-builtin-bitcode` then
we will not pull in the static library even if it's linked on the
command line.
Depends on D142484
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D142486
Currently, we pull in every single static archive member as long as we
have an offloading architecture that requires it. This goes against the
standard semantics of static libraries, which only pull in members that
define currently undefined symbols. In order to support this, we roll
some custom symbol resolution logic to check whether a static library
member is needed. Because of offloading semantics, this requires an
extra check for externally visible symbols, e.g., if a static member
defines a kernel we should import it.
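A rough sketch of that resolution rule (the Symbol type here is
hypothetical, not the linker-wrapper code):
```
#include <set>
#include <string>
#include <vector>

struct Symbol {
  std::string Name;
  bool Defined;
  bool Kernel; // externally visible offloading entry point
};

// Extract an archive member if it resolves a currently undefined
// symbol, or if it defines a kernel that must always be kept.
bool shouldExtract(const std::set<std::string> &Undefined,
                   const std::vector<Symbol> &MemberSyms) {
  for (const Symbol &S : MemberSyms) {
    if (!S.Defined)
      continue;
    if (Undefined.count(S.Name) || S.Kernel)
      return true;
  }
  return false;
}
```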
The main benefit to this is that we can now link against the
`libomptarget.devicertl.a` library unconditionally. This removes the
requirement for users to specify LTO on the link command. This will also
allow us to stop using the `amdgcn` bitcode versions of the libraries.
```
clang foo.c -fopenmp --offload-arch=gfx1030 -foffload-lto -c
clang foo.o -fopenmp --offload-arch=gfx1030 -foffload-lto
```
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D142484
AVX/AVX512 instructions may cause a frequency drop on e.g. Skylake.
The magnitude of the frequency/performance drop depends on the
instruction (multiplication vs. load/store) and the vector width.
Currently, users that want to avoid this drop can specify
-mprefer-vector-width=128. However, this also prevents generation of
256-bit-wide instructions that have no associated frequency drop
(mainly loads/stores).
Add a tuning flag that allows generation of 256-bit AVX loads/stores,
even when -mprefer-vector-width=128 is set, to speed up memcpy & co.
Verified that running a memcpy loop on all cores has no frequency
impact and zero CORE_POWER:LVL[12]_TURBO_LICENSE perf counters.
Makes copying memory faster, e.g.:
BM_memcpy_aligned/256 80.7GB/s ± 3% 96.3GB/s ± 9% +19.33% (p=0.000 n=9+9)
Differential Revision: https://reviews.llvm.org/D134982
GCC doesn't support `-fopenmp-version`, causing test failures if the compiler
used for testing is GCC.
GCC's OpenMP 5.2 support is still very limited. Disable those tests requiring
5.2 features for GCC as well.
We might want to take a look at all `libomp` tests and mark those that don't
support GCC yet.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D142173
As pointed out by @arsenm in https://reviews.llvm.org/D141451#4045099,
we don't handle ConstantExpressions for dontcall-{warn|error} IR Fn
Attrs.
Use CallBase::getCalledOperand() and Value::stripPointerCasts() should
the call to CallBase::getCalledFunction() return nullptr.
I don't know how to express the IR test case in C, otherwise I'd add a
clang test, too.
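A minimal sketch of the lookup (illustrative, not the exact diagnostic
code):
```
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

// When the callee is a ConstantExpr (e.g. a pointer cast of a
// function), getCalledFunction() returns nullptr; look through the
// casts on the called operand instead.
const llvm::Function *getCalleeThroughCasts(const llvm::CallBase &CB) {
  if (const llvm::Function *F = CB.getCalledFunction())
    return F;
  return llvm::dyn_cast<llvm::Function>(
      CB.getCalledOperand()->stripPointerCasts());
}
```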
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D142058
These are essentially add/sub 1 with a clamping value.
AMDGPU has instructions for these, and CUDA/HIP expose them as
atomicInc/atomicDec. Currently we use target intrinsics for these,
but those do not carry the ordering and syncscope. Add these to
atomicrmw so we can carry that information and benefit from the
regular legalization processes.
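Written out as scalar code, the semantics look like this (a sketch of
the operation the atomicrmw performs atomically, matching
atomicInc/atomicDec):
```
#include <cstdint>

uint32_t uinc_wrap(uint32_t Old, uint32_t Bound) {
  return (Old >= Bound) ? 0 : Old + 1;
}

uint32_t udec_wrap(uint32_t Old, uint32_t Bound) {
  return (Old == 0 || Old > Bound) ? Bound : Old - 1;
}
```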
https://alive2.llvm.org/ce/z/2iC4oB
This is similar to changes made for zext + lshr:
21d3871b7c6c39a3aae1
The existing fold did not account for extra uses, so we
see some instruction count reductions in the test diffs.
This is intended to improve analysis (icmp likely has more
transforms than any other opcode), make other transforms
more symmetric with zext/lshr, and allow inversion in codegen
if profitable.
As with the earlier changes, there is potential to uncover
infinite combine loops, but I have not found any yet.
This will allow an entry in the table to access data that is stored
immediately after the end of the table, by adding its opcode value
to its address.
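A sketch of the addressing trick (names hypothetical):
```
#include <cstdint>

struct Entry {
  int32_t Value; // opcode value, doubling as a byte offset
};

// An entry reaches payload stored right after the end of the table by
// adding its opcode value to its own address.
const char *getPayload(const Entry &E) {
  return reinterpret_cast<const char *>(&E) + E.Value;
}
```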
Differential Revision: https://reviews.llvm.org/D142217
The Ampere1A core improves on the Ampere1, the key differences being:
* memory tagging is supported
* SM3/SM4 are supported
* a new fusion pair for A+B+1 and A-B-1 (added in a later commit)
Depends on D142395
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D142396
Combine the implicit uses and defs lists into a single list of uses
followed by defs. Instead of 0-terminating the list, store the number
of uses and defs. This avoids having to scan the whole list to find the
length and removes one pointer from MCInstrDesc (although it does not
get any smaller due to alignment issues).
Remove the old accessor methods getImplicitUses, getNumImplicitUses,
getImplicitDefs and getNumImplicitDefs as all clients are using the new
implicit_uses and implicit_defs.
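A rough sketch of the new layout (field and type names hypothetical;
the real struct is MCInstrDesc):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/MC/MCRegister.h"
#include <cstdint>

struct InstrDescSketch {
  const llvm::MCPhysReg *ImplicitOps; // uses first, then defs
  uint8_t NumImplicitUses;
  uint8_t NumImplicitDefs;

  // The explicit counts replace both the 0-terminator scan and the
  // second list pointer.
  llvm::ArrayRef<llvm::MCPhysReg> implicit_uses() const {
    return {ImplicitOps, NumImplicitUses};
  }
  llvm::ArrayRef<llvm::MCPhysReg> implicit_defs() const {
    return {ImplicitOps + NumImplicitUses, NumImplicitDefs};
  }
};
```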
Differential Revision: https://reviews.llvm.org/D142216