BroadcastEvent currently takes its EventData* param and shoves it into
an Event object, which takes ownership of the pointer and places it into
a shared_ptr to manage the lifetime.
Instead of relying on `new` and passing raw pointers around, I think it
would make more sense to create the shared_ptr up front.
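
As a rough illustration of the ownership change described above, here is a minimal sketch. `EventData`, `Event`, and `BroadcastEventSketch` are simplified stand-ins written for this example, not the real LLDB classes or signatures.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-in for the real event payload type (assumption).
struct EventData {
  virtual ~EventData() = default;
};

// Simplified stand-in for Event: it stores shared ownership of the payload
// instead of adopting a raw pointer.
class Event {
public:
  explicit Event(std::shared_ptr<EventData> data) : m_data(std::move(data)) {}
  const std::shared_ptr<EventData> &GetData() const { return m_data; }

private:
  std::shared_ptr<EventData> m_data;
};

// Hypothetical broadcast entry point: the caller creates the shared_ptr up
// front, so no raw `new` result is ever passed around.
Event BroadcastEventSketch(std::shared_ptr<EventData> data) {
  return Event(std::move(data));
}
```

A caller would write `BroadcastEventSketch(std::make_shared<EventData>())`, which makes the ownership transfer visible at the call site.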
'num_gangs', 'num_workers', 'device_num', and 'default_async' are all
exactly the same (for the purposes of parsing) as 'vector_length', so
implement these the same way.
This implements a way for the compiler to find the modules.json
associated with the C++23 Standard library modules.
This is based on a discussion in SG15. At the moment no Standard library
installs this manifest. #75741 adds this feature in libc++.
This patch removes the noexcept specifier introduced in #69407 since the
Standard allows a new handler to throw an exception of type bad_alloc
(or derived from it). With the noexcept specifier on the helper
functions, we would immediately terminate the program.
The patch also adds tests for the case that had regressed.
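
To illustrate why `noexcept` on the helpers is a problem, here is a minimal sketch. `allocate_or_call_handler` is a hypothetical helper that only mimics the retry loop of `operator new`; it is not the libc++ implementation, and the allocation failure is simulated.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// An exception type derived from bad_alloc, which a new handler may throw.
struct my_bad_alloc : std::bad_alloc {};

void throwing_handler() { throw my_bad_alloc{}; }

// Hypothetical helper (assumption): pretends every allocation fails and then
// consults the new handler. If this function were marked noexcept, the
// handler's exception would call std::terminate instead of propagating.
void *allocate_or_call_handler(std::size_t /*size*/) {
  for (;;) {
    void *p = nullptr; // simulated allocation failure
    if (p)
      return p;
    std::new_handler handler = std::get_new_handler();
    if (!handler)
      throw std::bad_alloc();
    handler(); // may throw bad_alloc or a type derived from it
  }
}

// Returns true if the handler's exception reaches the caller.
bool handler_exception_propagates() {
  std::new_handler old = std::set_new_handler(throwing_handler);
  bool caught = false;
  try {
    allocate_or_call_handler(16);
  } catch (const std::bad_alloc &) {
    caught = true;
  }
  std::set_new_handler(old);
  return caught;
}
```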
Co-authored-by: Alison Zhang <alisonzhang@ibm.com>
- With PGO, indirect call edges are constructed using value profiles, and the profile address is mapped to a function's PGO name. The PGO name is computed using a function's linkage before LTO internalization or global promotion.
- With ThinLTO, local functions [could be promoted](2663d2cb9c/llvm/lib/Transforms/Utils/FunctionImportUtils.cpp (L288)) to have external linkage; and with [full](2663d2cb9c/llvm/lib/LTO/LTO.cpp (L1328)) or [thin](2663d2cb9c/llvm/lib/LTO/LTO.cpp (L448)) LTO, global functions could be internalized. Edge construction should use a function's PGO name before its linkage is updated.
This patch fixes commit 89aa3355, which added tests for
the removal of redundant DPVAssigns; unlike other cases where
adding tests for DPVAssigns before they are enabled is harmless,
these tests require them to be enabled, so must be deleted until
we enable them.
Fixes failures on llvm-new-debug-iterators buildbot:
https://lab.llvm.org/buildbot/#/builders/275/builds/3581
acc.loop was redesigned in https://reviews.llvm.org/D159229. This patch
updates the lowering to match the new op.
Support for the DO CONCURRENT construct will be added in a follow-up patch.
Note that the pre-commit CI will fail until D159229 is merged.
Depends on #67355
The initial design of `acc.loop` was an operation that
encapsulates a loop-like operation. This was an early design and we now
want to change it so the `acc.loop` operation becomes a real loop-like
operation by implementing the LoopLikeInterface.
Differential Revision: https://reviews.llvm.org/D159229
This patch was moved from Phabricator to GitHub.
When there are debug intrinsics in-between groups of select
instructions, select-optimise sinks them into the "end" block. This
needs to be replicated for DPValues, the non-instruction variable
assignment object. Implement that and add a RUN line to a test that was
sensitive to this to ensure it gets tested.
(The exact range of instructions being transformed here is a little
fiddly, hence I've gone with a helper lambda).
The CUDA SDK contains an unfortunate definition for the `__noinline__`
macro. This patch works around it by using `__attribute__((noinline))`
instead of `__attribute__((__noinline__))` on CUDA. We are still waiting
for a long-term resolution to this issue in NVIDIA/cccl#1235.
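
A minimal sketch of this kind of workaround follows. The macro name `SKETCH_NOINLINE` and the use of `__CUDACC__` as the CUDA check are assumptions made for illustration; the actual patch may use a different macro and condition.

```cpp
#include <cassert>

// Assumption: __CUDACC__ is used here to detect CUDA compilation.
#if defined(__CUDACC__)
// The CUDA SDK defines __noinline__ as a macro, so spell the attribute
// without the double underscores there.
#  define SKETCH_NOINLINE __attribute__((noinline))
#else
#  define SKETCH_NOINLINE __attribute__((__noinline__))
#endif

// Any function can carry the attribute; it only affects inlining.
SKETCH_NOINLINE int add_one(int x) { return x + 1; }
```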
Commit 1981b1b6b9 unexpectedly strengthened
--no-allow-shlib-undefined to catch a kind of ODR violation.
More precisely, when all three conditions are met, the new
`--no-allow-shlib-undefined` code reports an error.
* There is a DSO undef that has been satisfied by a definition from
another DSO.
* The `SharedSymbol` is overridden by a non-exported (usually of hidden
visibility) definition in a relocatable object file (`Defined`).
* The section containing the `Defined` is garbage-collected (it is not
part of `.dynsym` and is not marked as live).
Technically, the hidden Defined in the executable can be intentional: it
can be meant to remain non-exported and not interact with any dynamic
symbols of the same name that might exist in other DSOs. To allow for
such use cases, allocate a new bit in Symbol and relax the
--no-allow-shlib-undefined check to its behavior before
commit 1981b1b6b9.
DPValues are already supported by most of the utilities that remove
redundant debug info after certain passes; the exception to this is
`removeUndefDbgAssignsFromEntryBlock`, which applies only to
llvm.dbg.assigns, which were previously unimplemented for DPValues. Now
that DPVAssigns exist, we have to support removing redundant instances
in the same way, which this patch implements.
For some reason, we are using `-fpie`
(libc/cmake/modules/LLVMLibCObjectRules.cmake:31) without supporting it.
According to @lntue, some of the hermetic tests are broken without
proper PIE support. This patch implements basic relocation support for
PIE.
This reverts commit 7d9b5aa65b since
std/utilities/format/format.arguments/format.arg/visit.return_type.pass.cpp
is failing on Windows when building with Clang-cl.
libc on macOS does not provide at_quick_exit or quick_exit. This patch
allows modules to build on macOS and defers any errors to the usage site
of these symbols.
Fixes: https://github.com/llvm/llvm-project/issues/77559
Based on https://reviews.llvm.org/D45375 . Introduce a new InputFile
kind `InternalKind`, use it for
* `ctx.internalFile`: for linker-defined symbols and some synthesized
`Undefined`
* `createInternalFile`: for symbol assignments and --defsym
I picked "internal" instead of "synthetic" to avoid confusion with
SyntheticSection.
Currently a symbol's file is one of: nullptr, ObjKind, SharedKind,
BitcodeKind, BinaryKind. Now it's non-null (I plan to add an
`assert(file)` to Symbol::Symbol and change `toString(const InputFile *)`
separately).
Debugging and error reporting are improved. The immediate user-facing
difference is a more descriptive "File" column in the --cref output. This
patch may unlock further simplification.
Currently each symbol assignment gets its own
`createInternalFile(cmd->location)`. Two symbol assignments in a linker
script do not share the same file. Making the file the same would be
nice, but would require non-trivial code.
Summary:
This patch removes the bulk of the handling of the
`__tgt_offload_entries` out of the plugins themselves. The reason is
that the plugins should not be handling this
implementation detail of the OpenMP runtime. Instead, we expose two new
plugin API functions to get the device pointer for a global
as well as for a kernel.
This required introducing a new type to represent a binary image that
has been loaded on a device. We can then use this to load the addresses
as needed. The creation of the mapping table is then handled just in
`libomptarget` where we simply look up each address individually. This
should allow us to expose these operations more generically when we
provide a separate API.
In the test from https://reviews.llvm.org/D7098, `char array[len];` is
32-byte aligned on most targets whether it is instrumented or not
(optimized by StackSafetyAnalysis), due to the `*FrameLowering` in use
being `StackRealignable`.
However, when using `SystemZELFFrameLowering`, an un-instrumented
`char array[len];` is only 8-byte aligned.
Ensure `char array[len];` gets instrumented, as we did for
`alloca_vla_interact.cpp`, to make the test pass on s390x.
This is a follow-up to 8c1b7fba1f -- GlobalISel currently doesn't handle
RemoveDIs mode debug-info, but will (see #75228). Disable this RUN line
until then.
(This is a patch-landing ordering problem)
updateNewZAFunctions is extended to generate the following on entry to a
function with either the "aarch64_pstate_za_new" or "arm_new_zt0"
attribute:
- Private-ZA interface: commit any active lazy-saves & enable PSTATE.ZA.
- "aarch64_pstate_za_new": zero ZA.
- "arm_new_zt0": zero ZT0.
Additionally, PSTATE.ZA should be disabled before returning if the function
has a private-ZA interface.
This patch enables applications that did not request OpenMP
unified_shared_memory to run with the same zero-copy behavior, where
mapped memory does not result in extra memory allocations and memory
copies, but CPU-allocated memory is accessed from the device. The name
for this behavior is "automatic zero-copy" and it relies on detecting:
that the runtime is running on a MI300A, that the user did not select
unified_shared_memory in their program, and that XNACK (unified memory
support) is enabled in the current GPU configuration. If all these
conditions are met, then automatic zero-copy is triggered.
This patch also introduces an environment variable, OMPX_APU_MAPS, that,
if set, triggers automatic zero-copy on non-APU GPUs as well (e.g., on
discrete GPUs).
This patch is still missing support for global variables, which will be
provided in a subsequent patch.
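
The decision described above can be sketched roughly as follows. The struct and function names are hypothetical and are not the libomptarget types. The sketch also assumes that OMPX_APU_MAPS replaces only the APU detection and that XNACK is still required, which is one reading of the description rather than a confirmed detail.

```cpp
#include <cassert>

// Hypothetical snapshot of what the runtime detects (assumption).
struct ZeroCopyInputsSketch {
  bool is_mi300a;         // runtime is running on an MI300A
  bool usm_requested;     // program selected unified_shared_memory
  bool xnack_enabled;     // XNACK is enabled in the GPU configuration
  bool ompx_apu_maps_set; // OMPX_APU_MAPS is set in the environment
};

bool use_automatic_zero_copy(const ZeroCopyInputsSketch &in) {
  // Programs that requested unified_shared_memory are not candidates for
  // the "automatic" behavior.
  if (in.usm_requested)
    return false;
  // Assumption: the environment variable substitutes for APU detection only.
  const bool treat_as_apu = in.is_mi300a || in.ompx_apu_maps_set;
  return treat_as_apu && in.xnack_enabled;
}
```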
Co-authored-by: Thorsten Blass <thorsten.blass@amd.com>
Check that flags chained comparison expressions,
such as a < b < c or a == b == c, which may have
unintended behavior due to implicit operator
associativity.
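
For illustration, the sketch below shows how such chains are actually evaluated. The function names are made up for this example, and the chains are written with explicit parentheses to show the grouping the compiler applies.

```cpp
#include <cassert>

// `a < b < c` groups as `(a < b) < c`: the bool result of the first
// comparison is promoted to 0 or 1 and then compared with c.
bool less_chain_as_parsed(int a, int b, int c) { return (a < b) < c; }

// What the author most likely intended.
bool less_chain_intended(int a, int b, int c) { return a < b && b < c; }

// `a == b == c` groups as `(a == b) == c` in the same way.
bool equal_chain_as_parsed(int a, int b, int c) { return (a == b) == c; }

bool equal_chain_intended(int a, int b, int c) { return a == b && b == c; }
```

With `a = 3, b = 2, c = 1`, the parsed form of the less-than chain is true even though the values are in descending order, and with all three values equal to 2 the parsed form of the equality chain is false.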
Moved from Phabricator (D144429).
Flang was recently updated on Compiler Explorer and by default it's in
assemble-only mode; you have to enable linking and executing.
This means that the default output for flang-to-external-fc is nothing,
as it doesn't know what `-S` means. You'd have to know to enable the
link to binary option to see any output.
Handle `-S` so that users of Compiler Explorer don't have to wonder why
the "compiler" is broken.
When generating declaration fragments for types that use typedefs to
pointer types, ensure that we keep the user-defined typedef form instead
of desugaring the typedef.
rdar://102137655
Allows cases where movss/movsd etc. are loading constant (ConstantDataSequential) sub-vectors, ensuring we pad with the correct number of zero upper elements by making repeated printConstant calls to print zeroes in a matching int/fp format.
This allows us to check the entire constant address calculation, and ensure we're not performing any runtime address math into the constant pool (noticed in an upcoming patch).
This patch abstracts visitEntryValueDbgValue to deal with the substance
of variable locations (Value, Var, Expr, DebugLoc) rather than how
they're stored. That allows us to call it from handleDebugValue, which
is similarly abstracted. This allows the entry-value behaviour (see the
test) to be supported with non-instruction debug-info too.
Now produces valid fixes for member variables initialized with macros.
Correctly uses the expansion location instead of the location inside the
macro to get the init code.
Closes #70189
This patch implements __cxa_init_primary_exception, an extension to the
Itanium C++ ABI. This extension is already present in both libsupc++ and
libcxxrt. This patch also starts making use of this function in
std::make_exception_ptr: instead of going through a full throw/catch
cycle, we are now able to initialize an exception directly, thus making
std::make_exception_ptr around 30x faster.
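
For context, here is a small sketch of the user-facing API that benefits. It uses only standard library calls; `message_of` is a helper written for this example. The observable behavior is the same with or without the fast path, only the cost of creating the `exception_ptr` changes.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Helper for this example: rethrows the stored exception to read its message.
std::string message_of(const std::exception_ptr &p) {
  try {
    std::rethrow_exception(p);
  } catch (const std::exception &e) {
    return e.what();
  } catch (...) {
    return "unknown";
  }
}

// Creates an exception_ptr without the caller writing a throw/catch pair.
std::exception_ptr make_error(const char *what) {
  return std::make_exception_ptr(std::runtime_error(what));
}
```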