We've been displaying types and addresses for containers, but that's not
very useful information. A better approach is to compose the summary of
containers with the summary of a few of its children.
Not only that, we can dereference simple pointers and references to get
the summary of the pointer variable, which is also better than just
showing an anddress.
And in the rare case where the user wants to inspect the raw address,
they can always use the debug console for that.
For the record, this is very similar to what the CodeLLDB extension
does, and it seems to give a better experience.
An example of the new output:
<img width="494" alt="Screenshot 2023-09-06 at 2 24 27 PM"
src="https://github.com/llvm/llvm-project/assets/1613874/588659b8-421a-4865-8d67-ce4b6182c4f9">
And this is the
<img width="476" alt="Screenshot 2023-09-06 at 2 46 30 PM"
src="https://github.com/llvm/llvm-project/assets/1613874/5768a52e-a773-449d-9aab-1b2fb2a98035">
old output:
On x86 targets, -fsplit-machine-functions enables splitting of machine
functions using profile information. This patch rolls out the flag for
AArch64 targets.
Depends on D158647
Differential Revision: https://reviews.llvm.org/D157157
The OpenACC standard specifies an `atomic` construct in section 2.12 (of
3.3 spec), used to ensure that a specific location is accessed or
updated atomically. Four different clauses are allowed: `read`, `write`,
`update`, or `capture`. If no clause appears, it is as if `update` is
used.
The OpenMP specification defines the same clauses for `omp atomic`. The
types of expression and the clauses in the OpenACC spec match the OpenMP
spec exactly. The main difference is that the OpenMP specification is a
superset - it includes clauses for `hint` and `memory order`. It also
allows conditional expression statements. But otherwise, the expression
definition matches.
Thus, for OpenACC, we refactor and reuse the OpenMP implementation as
follows:
* The atomic operations are duplicated in OpenACC dialect. This is
preferable so that each language's semantics are precisely represented
even if specs have divergence.
* However, since semantics overlap, a common interface between the
atomic operations is being added. The semantics for the interfaces are
not generic enough to be used outside of OpenACC and OpenMP, and thus
new folders were added to hold common pieces of the two dialects.
* The atomic interfaces define common accessors (such as getting `x` or
`v`) which match the OpenMP and OpenACC specs. It also adds common
verifiers intended to be called by each dialect's operation verifier.
* The OpenMP write operation was updated to use `x` and `expr` to be
consistent with its other operations (that use naming based on spec).
The frontend lowering necessary to generate the dialect can also be
reused. This will be done in a follow up change.
NFC changes to explicitly specify the type we are matching when creating
Int32 reg. This will allow use to have multiple types mapping those
register without causing ambigous matching.
On AArch64, it is safe to let the linker handle relaxation of
unconditional branches; in most cases, the destination is within range,
and the linker doesn't need to do anything. If the linker does insert
fixup code, it clobbers the x16 inter-procedural register, so x16 must
be available across the branch before linking. If x16 isn't available,
but some other register is, we can relax the branch either by spilling
x16 OR using the free register for a manually-inserted indirect branch.
This patch builds on D145211. While that patch is for correctness, this
one is for performance of the common case. As noted in
https://reviews.llvm.org/D145211#4537173, we can trust the linker to
relax cross-section unconditional branches across which x16 is
available.
Programs that use machine function splitting care most about the
performance of hot code at the expense of the performance of cold code,
so we prioritize minimizing hot code size.
Here's a breakdown of the cases:
Hot -> Cold [x16 is free across the branch]
Do nothing; let the linker relax the branch.
Cold -> Hot [x16 is free across the branch]
Do nothing; let the linker relax the branch.
Hot -> Cold [x16 used across the branch, but there is a free register]
Spill x16; let the linker relax the branch.
Spilling requires fewer instructions than manually inserting an
indirect branch.
Cold -> Hot [x16 used across the branch, but there is a free register]
Manually insert an indirect branch.
Spilling would require adding a restore block in the hot section.
Hot -> Cold [No free regs]
Spill x16; let the linker relax the branch.
Cold -> Hot [No free regs]
Spill x16 and put the restore block at the end of the hot function; let the linker relax the branch.
Ex:
[Hot section]
func.hot:
... hot code...
func.restore:
... restore x16 ...
B func.hot
[Cold section]
func.cold:
... spill x16 ...
B func.restore
Putting the restore block at the end of the function instead of
just before the destination increases the cost of executing the
store, but it avoids putting cold code in the middle of hot code.
Since the restore is very rarely taken, this is a worthwhile
tradeoff.
Differential Revision: https://reviews.llvm.org/D156767
Instead of hard-coding the name `lldbDataFormatters`, use `__name__` to
get the module's name.
This allows the formatters to be loaded from any path, with any
filename.
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the
`size` is supposed
to be a multiple of `alignment`, and it is implementation defined
behavior if not.
We have a non-conformant use in `kmp_barrier.h` when allocating
distribute barrier.
The size of the barrier is 576 and the alignment is `4*CACHE_LINE`,
which is 256
on most systems. Apparently it works perfectly fine for Linux and
Intel-based Mac,
but not for Apple Silicon based Mac.
Fix#63194.
Like concepts checking, a trailing return type of a lambda
in a dependent context may refer to captures in which case
they may need to be rebuilt, so the map of local decl
should include captures.
This patch reveal a pre-existing issue.
`this` is always recomputed by TreeTransform.
`*this` (like all captures) only become `const`
after the parameter list.
However, if try to recompute the value of `this` (in a parameter)
during template instantiation while determining the type of the call operator,
we will determine it to be const (unless the lambda is mutable).
There is no good way to know at that point that we are in a parameter
or not, the easiest/best solution is to transform the type of this.
Note that doing so break a handful of HLSL tests.
So this is a prototype at this point.
Fixes#65067Fixes#63675
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D159126
Summary:
A previous introduced a new object type for the GPU functions
implemented by an external vendor library. This was done so they we did
not attempt to run tests on functions which we did not implement,
however this accidentally stopped them from being included in the actual
output. Fix this by checking the new type as well.
The long term goal is to remove this vendor handling altogether, but is
being used as a short-term solution to provide a math library on the
GPU which currently lacks one.
Added a class to hold such common state. The goal is to both reduce the argument list of other utilities used by `computeImportForModule` (which will be brought as members in a subsequent patch), and to make it easy to extend such state later.
This patch is part of a larger initiative aimed at fixing floating-point
`max` and `min` operations in MLIR:
https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This patch addresses task 1.1 from the plan. It involves modifying the
lowering process for `arith.minf` and `arith.maxf` operations.
Specifically, the change replaces the usage of `llvm.minnum` and
`llvm.maxnum` with `llvm.minimum` and `llvm.maximum`, respectively. This
adjustment is necessary because the `m**num` intrinsics are not suitable
for the mentioned MLIR operations due to semantic discrepancies in
handling NaNs, positive and negative floating-point zeros.
This is the first step to implement time zone support in libc++. This
adds the complete tzdb_list class and a minimal tzdb class. The tzdb
class only contains the version, which is used by reload_tzdb.
Next to these classes it contains documentation and build system support
needed for time zone support. The code depends on the IANA Time Zone
Database, which should be available on the platform used or provided by
the libc++ vendors.
The code is labeled as experimental since there will be ABI breaks
during development; the tzdb class needs to have the standard headers.
Implements parts of:
- P0355 Extending <chrono> to Calendars and Time Zones
Addresses:
- LWG3319 Properly reference specification of IANA time zone database
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D154282
Summary:
The `omp_get_num_procs()` function should return the amount of
parallelism availible. On the GPU, this was not defined. We have elected
to define this function as the maximum amount of wavefronts / warps that
can be simultaneously resident on the device. For AMDGPU this is the
number of CUs multiplied byth CU's per wave. For NVPTX this is the
maximum threads per SM divided by the warp size and multiplied by the
number of SMs.
Previously, these tests expected that calling mktime with a struct tm
that caused overlow to succeed with return -1
(TimeConstants::OUT_OF_RANGE_RETURN_VALUE), however, the Succeeds call
expects the errno to be zero (no failure).
This patch fixes the expected calls to fail with EOVERFLOW. These tests
are only enabled to 32-bit systems, and are probably not being tested on
the arm32 buildbot, that's why this was not a problem before.
This test was setting tv_nsec to a negative value, which as per the
standard this is an EINVAL:
The value in the tv_nsec field was not in the range [0, 999999999] or
tv_sec was negative.
https://man7.org/linux/man-pages/man2/nanosleep.2.html
Fixes `lldb/test/Shell/SymbolFile/NativePDB/inline_sites.test` to use the correct line number now that f2f36c9b29 is causing the inline call site info to be taken into account.
This allows us to produce better error messages for types that were only
forward-declared, but where a full definition was expected.
The first user will be https://reviews.llvm.org/D159013; this change is
sent to review separately to reduce the scope of the other patch.
This commit replaces some calls to the deprecated `FileEntry::getName()` with `FileEntryRef::getName()` by swapping current usages of `SourceManager::getFileEntryForID()` with `SourceManager::getFileEntryRefForID()`. This lowers the number of usages of the deprecated `FileEntry::getName()` from 95 to 50.
Use the same arguments as other builds. This gives better output to
validate what the CI did.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D158860
`clang` currently links `libgcc_s` unconditionally on Solaris, which is
unnecessary.
This patch wraps it in `-z ignore`/`-z record` instead.
Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.
Issue Details:
When building up line information for CodeView debug info, LLVM attempts to gather the "range" of instructions within a function as these are printed together in a single record. If there is an inlined function, then those lines are attributed to the original function to enable generating `S_INLINESITE` records. However, this thus requires there to be instructions from the inlining function after the inlined function otherwise the instruction range would not include the inlined function.
Fix Details:
Include any inlined functions when finding the extent of a function in `getFunctionLineEntries`
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D159226
The function assumes that `__kmp_gtid_get_specific` always returns a valid gtid.
That is not always true, because when creating the key for thread-specific data,
a destructor is assigned. The dtor will be called at thread exit. However, before
the dtor is called, the thread-specific data will be reset to NULL first
(https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html):
> At thread exit, if a key value has a non-NULL destructor pointer, and the thread
> has a non-NULL value associated with that key, the value of the key is set to NULL.
This will lead to that `__kmp_gtid_get_specific` returns `KMP_GTID_DNE`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159369
The outlined function is typically invoked by using
`__kmp_invoke_microtask`,
which is written in asm. D138495 introduces a new interface function for
parallel
region for OpenMPIRBuilder, where the outlined function is called via
the function
pointer. For some reason, it works perfectly well on x86 and x86-64
system, but
doesn't work on Apple Silicon. The 3rd argument in the callee is always
`nullptr`, even
if it is not in caller. It appears `x2` always contains `0x0`. This
patch adopts
the typical method to invoke the function pointer. It works on my M2
Ultra Mac.
Fix#63194.