Add a new LC_NOTE for Mach-O corefiles, "process metadata", which is a
JSON string. Currently there may be a `threads` key in the JSON,
and if `threads` is present, it is an array with the same number of
elements as there are LC_THREADs in the corefile. This patch adds
support for a `thread_id` key-value pair in each `threads` entry, to
supply a thread ID for that LC_THREAD.
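For illustration, a corefile writer might assemble such a payload roughly as
follows. This is a minimal sketch using llvm::json; the helper name is
hypothetical, and only the `threads`/`thread_id` keys come from the format
described above.

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/Support/JSON.h"
  #include "llvm/Support/raw_ostream.h"
  using namespace llvm;

  // Build the "process metadata" JSON string, one threads[] entry per LC_THREAD.
  static std::string buildProcessMetadata(ArrayRef<uint64_t> ThreadIDs) {
    json::Array Threads;
    for (uint64_t TID : ThreadIDs)
      Threads.push_back(json::Object{{"thread_id", static_cast<int64_t>(TID)}});
    json::Object Root{{"threads", std::move(Threads)}};
    std::string Payload;
    raw_string_ostream OS(Payload);
    OS << json::Value(std::move(Root));
    return Payload;
  }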
Differential Revision: https://reviews.llvm.org/D158785
rdar://113037252
I'm reverting this on principle, since it didn't get the Phabricator
approval I thought it had (only an informal LGTM). Will re-apply once
it has been properly approved.
This reverts commit e1bfeb6bcc.
This is information that the compiler already has, and should be exposed
so that the library doesn't need to reimplement the exact same
functionality.
Differential Revision: https://reviews.llvm.org/D135341
Currently, a dimlvlmap with an identity affine map is treated as an empty
affine map, but the new syntax treats it as an actual identity affine map
such as {d0} -> {d0}. This mismatch can raise an error when comparing
sparse encodings.
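To make the mismatch concrete (this sketch illustrates the problem, not the
fix): an explicitly built identity map and a default, empty AffineMap do not
compare equal, even though both are meant to denote the identity mapping.

  #include "mlir/IR/AffineMap.h"
  #include "mlir/IR/MLIRContext.h"
  #include <cassert>
  using namespace mlir;

  void showMismatch(MLIRContext *ctx) {
    AffineMap empty;                                             // no map specified
    AffineMap ident = AffineMap::getMultiDimIdentityMap(1, ctx); // (d0) -> (d0)
    assert(empty != ident); // encodings differing only in this do not compare equal
  }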
Reduce YAML profile processing times:
- preprocessProfile: speed up buildNameMaps by replacing the ProfileNameToProfile
mapping with a ProfileFunctionNames set and a ProfileBFs vector.
Pre-compute the YamlBF->BF correspondence and memoize it in ProfileBFs
(see the sketch below).
- readProfile: replace iteration over all functions in the binary with iteration
over profile functions (strict match and LTO name match).
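A simplified sketch of the memoization idea (the types and the name lookup
below are stand-ins, not BOLT's actual ones):

  #include <string>
  #include <unordered_map>
  #include <vector>

  struct BinaryFunction;                            // stand-in for BOLT's BinaryFunction
  struct YamlFunctionProfile { std::string Name; }; // stand-in for a YAML profile entry

  // Resolve each YAML profile function to its BinaryFunction once, up front,
  // so later profile-processing steps can index ProfileBFs directly.
  std::vector<BinaryFunction *> memoizeProfileBFs(
      const std::vector<YamlFunctionProfile> &Profiles,
      const std::unordered_map<std::string, BinaryFunction *> &NameToBF) {
    std::vector<BinaryFunction *> ProfileBFs;
    ProfileBFs.reserve(Profiles.size());
    for (const YamlFunctionProfile &YamlBF : Profiles) {
      auto It = NameToBF.find(YamlBF.Name);
      ProfileBFs.push_back(It == NameToBF.end() ? nullptr : It->second);
    }
    return ProfileBFs;
  }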
On a large binary (1.9M functions) with a large YAML profile (121MB, 30k functions),
this reduces the profile step runtimes:
pre-process profile data: 12.4953s -> 10.7123s
process profile data: 9.8195s -> 5.6639s
Compared to fdata profile reading:
pre-process profile data: 8.0268s
process profile data: 1.0265s
process profile data pre-CFG: 0.1644s
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D159460
The issue is uncovered by #47698: for assembly files, -triple= specifies the
full target triple while -arch= merely sets the architecture part of the default
target triple, leaving a target triple which may not make sense, e.g.
riscv64-apple-darwin.
Therefore, -arch= is error-prone and not recommended for tests. The issue has
been benign because we recognize $unknown-apple-darwin as ELF instead of
rejecting it outright.
Due to the nature of the issue, it does not show up in tests using
architectures that any of Mach-O/COFF/XCOFF supports.
The issue is uncovered by #47698: for IR files without a target triple,
-mtriple= specifies the full target triple while -march= merely sets the
architecture part of the default target triple, leaving a target triple which
may not make sense, e.g. riscv64-apple-darwin.
Therefore, -march= is error-prone and not recommended for tests without a target
triple. The issue has been benign because we recognize $unknown-apple-darwin as ELF
instead of rejecting it outright.
According to 7.5.6.3 point 3, finalization occurs when
> A nonpointer, nonallocatable object that is not a dummy argument or
function result is finalized immediately before it would become
undefined due to execution of a RETURN or END statement (19.6.6, item
(3)).
We were not calling finalization on empty derived types. There is no
such restriction in the standard, so this patch updates the code so that
finalization is called for empty types as well.
With this, check-llvm passes on an arm mac if x86 isn't in
LLVM_TARGETS_TO_BUILD.
This pattern to skip the tests if x86 isn't enabled is used
in every other test in this file.
Make sure every conditional branch constructed by `LoopUnrollRuntime`
code sets branch weights.
- Add new 1:127 weights for the conditional jumps checking whether the
whole (unrolled) loop should be skipped in the generated prolog or
epilog code.
- Remove the `updateLatchBranchWeightsForRemainderLoop` function and just
add weights immediately when constructing the relevant branches. This
leads to simpler, more obvious code, as every call to `CreateCondBr` now
takes a `BranchWeights` argument (see the sketch below).
- Rework the formula for epilogue latch weights to assume an equal
distribution of remainders, and remove an `assert` (I was able to reach
this code when forcing small unroll factors on the command line).
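A minimal sketch of the pattern (not the actual LoopUnrollRuntime code):
attach the 1:127 weights at the point the skip-loop branch is created.

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/MDBuilder.h"
  using namespace llvm;

  // Create the "skip the (unrolled) loop entirely" branch with explicit weights.
  static BranchInst *createSkipLoopBranch(IRBuilder<> &B, Value *SkipCond,
                                          BasicBlock *SkipDest,
                                          BasicBlock *EnterLoop) {
    MDBuilder MDB(B.getContext());
    // Assume the loop is entered roughly 127 times for every 1 time it is skipped.
    MDNode *Weights = MDB.createBranchWeights(1, 127);
    return B.CreateCondBr(SkipCond, SkipDest, EnterLoop, Weights);
  }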
Differential Revision: https://reviews.llvm.org/D158642
Exposes the existing `get(ShapedType, StringRef, AsmResourceBlob)`
builder publicly (was protected) and adds a CAPI
`mlirUnmanagedDenseBlobResourceElementsAttrGet`.
While such a generic construction interface is a big help when it comes
to interop, it is also necessary for creating resources that don't have
a standard C type (e.g. f16, the f8s, etc.).
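For example, a client could now build a resource-backed elements attribute
roughly like this. Only the `get(ShapedType, StringRef, AsmResourceBlob)`
builder is what this patch exposes; the blob-construction helper shown below
is an assumption of the sketch.

  #include "mlir/IR/AsmState.h"
  #include "mlir/IR/BuiltinAttributes.h"
  #include "mlir/IR/BuiltinTypes.h"
  using namespace mlir;

  // Wrap caller-owned bytes in an unmanaged blob and attach them to a ShapedType.
  static DenseResourceElementsAttr makeResourceAttr(ShapedType type,
                                                    ArrayRef<char> rawData) {
    // The data is not copied, so it must outlive the attribute.
    AsmResourceBlob blob = UnmanagedAsmResourceBlob::allocateInferAlign(rawData);
    return DenseResourceElementsAttr::get(type, "blob_name", std::move(blob));
  }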
Previously reviewed/approved as part of https://reviews.llvm.org/D157064
Fixes: https://github.com/llvm/llvm-project/issues/65806
Currently clang puts extern shared variables ODR-used by host device functions
into the global variable __clang_gpu_used_external. This behavior was introduced
by https://reviews.llvm.org/D123441. However, clang should not do that for
extern shared variables, since their addresses are per warp and therefore
cannot be accessed by host code.
"descriptive summaries" should only be used for small to medium binaries
because of the performance penalty they cause when completing types. I'm
defaulting it to false.
Besides that, the "raw child" for synthetics should be optional as well.
I'm defaulting it to false.
Both options can be set via a launch or attach config, following the
pattern of most settings. JavaScript extension wrappers can set these
settings on their own as well.
Since the OpenACC atomics specification is a subset of OpenMP atomics,
the same lowering implementation can be used. This change extracts out
the necessary pieces from the OpenMP lowering and puts them in a shared
spot. The shared spot is a header file so that each implementation can
template specialize directly.
After putting the OpenMP implementation in a common spot, the following
changes were needed to make it work for OpenACC:
* Ensure parsing works correctly by avoiding hardcoded offsets.
* Templatize based on atomic type.
* Whether we are lowering OpenMP or OpenACC is determined by checking for
OmpAtomicClauseList (OpenACC does not have this, so we simply templatize
with void); see the sketch below. Checking this instead of the atomic type
is preferable because in some cases, like atomic capture, the
read/write/update implementations are called, and we want compile-time
evaluation of these conditional parts.
* The memory order and hint are used only for OpenMP.
* Generate acc dialect operations instead of omp dialect operations.
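A rough sketch of the resulting shape of the shared code (illustrative only;
the real implementation lives in a shared header and uses flang's actual
parse-tree and lowering types):

  #include <type_traits>

  struct OmpAtomicClauseList; // OpenMP clause list; OpenACC instantiates with void

  // Shared atomic-read lowering skeleton, specialized at compile time on
  // whether an OpenMP clause list type is available.
  template <typename AtomicListT>
  void genAtomicRead(const AtomicListT *clauses /* nullptr for OpenACC */) {
    (void)clauses;
    if constexpr (!std::is_void_v<AtomicListT>) {
      // OpenMP path: honor memory order / hint clauses and emit omp dialect ops.
    } else {
      // OpenACC path: no clause list; emit acc dialect ops.
    }
  }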
The cache directive is attached directly to the acc.loop operation when
the directive appears in the loop. When it appears before a loop, the
OpenACCCacheConstruct is saved and attached when the acc.loop is
created.
Directives that cannot be attached to a loop are silently discarded.
Depends on #65521
The `cache` directive may appear at the top of (inside of) a loop. It
specifies array elements or subarrays that should be fetched into the
highest level of the cache for the body of the loop.
The `cache` directive is modeled as data entry operands attached to
the acc.loop operation.
Deprecate the `gpu-to-cubin` & `gpu-to-hsaco` passes in favor of the
`TargetAttr` workflow. This patch removes remaining upstream uses of the
aforementioned passes, including the option to use them in `mlir-opt`. A
future patch will remove these passes entirely.
The passes can be re-enabled in `mlir-opt` by adding the CMake flag: `-DMLIR_ENABLE_DEPRECATED_GPU_SERIALIZATION=1`.
To match other internal symbolizer functions.
This makes it harder to distinguish a small buffer from a different failure,
but we have the same problem for the rest of the lib.
Still, we use a 16k buffer, so it should be enough most of the time.
We can fix all functions together in the future, if needed.
Add a DAG combine to form a masked.load from a masked_strided_load
intrinsic with stride equal to element size. This covers a couple of
extra test cases, and allows us to simplify and common up some existing
code in the concat_vector(load, ...) to strided load transform.
This is the first in a mini-patch series to try and generalize our
strided load and gather matching to handle more cases, and common up
different approaches to the same problems in different places.
The internal header library target with name suffix `.__header_library`
has been removed as it serves no purpose now. It was added to make older
versions of CMake happy.
The dylib contains multiple global variables of type locale::id. Those
can be marked as constinit to make it clear that static initialization
is performed.
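As a small illustration of what constinit guarantees (not the libc++ code
itself):

  // constinit requires the variable to be constant-initialized, so there is
  // no runtime static initialization and no initialization-order surprises.
  constinit int counter = 0;          // OK: initialized at compile time
  // constinit int bad = std::rand(); // ill-formed: not a constant initializer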
Running some tests with an ASan build of LLD would report errors similar to the following:
AddressSanitizer:DEADLYSIGNAL
#0 0x55d8e6da5df7 in operator() /mnt/ssd/repo/lld/llvm-project/lld/MachO/Arch/ARM64.cpp:612
#1 0x55d8e6daa514 in operator() /mnt/ssd/repo/lld/llvm-project/lld/MachO/Arch/ARM64.cpp:650
Differential Revision: https://reviews.llvm.org/D157027
This is to fix an error that occurs when the char type is a class type.
Thanks to Yichen Yan for the patch.
Differential Revision: https://reviews.llvm.org/D100005
I'm making a change in this area (https://reviews.llvm.org/D138461), so update the test:
* Add proper synchronization instead of a sleep.
* Avoid some unnecessary size_t casts.
* Spawn the number of hardware threads instead of 10.
* Check that `__cxa_get_globals` and `__cxa_get_globals_fast` return
the same values (see the sketch below).
* Split the test into with-threads and without-threads variants to simplify
the code.
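The new check boils down to something like the following sketch (the real
test also spawns and synchronizes the threads; the declarations here
approximate libcxxabi's internal header):

  #include <cassert>

  namespace __cxxabiv1 {
  struct __cxa_eh_globals;
  extern "C" __cxa_eh_globals *__cxa_get_globals();
  extern "C" __cxa_eh_globals *__cxa_get_globals_fast();
  } // namespace __cxxabiv1

  int main() {
    // Once the allocating variant has run for this thread, the fast variant
    // must return the same per-thread pointer.
    __cxxabiv1::__cxa_eh_globals *g = __cxxabiv1::__cxa_get_globals();
    assert(g == __cxxabiv1::__cxa_get_globals_fast());
  }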
Differential Revision: https://reviews.llvm.org/D138460
Co-authored-by: Louis Dionne <ldionne.2@gmail.com>
GNU readelf introduced --extra-sym-info/-X to display the section name
for --syms (https://sourceware.org/PR30684). Port the feature, which is
currently llvm-readelf only.
For STO_AARCH64_VARIANT_PCS/STO_RISCV_VARIANT_PCS, the Ndx and Name
columns may not be aligned.
Summary:
There is currently an effort to change the default AMDGPU code object
version (https://github.com/llvm/llvm-project/pull/65410). However, this
unfortunately causes problems in the LLVM LibC test suite that lead to
a hang while executing. This is most likely a bug to do with indirect
call optimization, as it can be avoided without optimizations or with
manually preventing inlining in the AMDGPU startup code.
This patch explicitly sets the AMDGPU code object version to 4 for
the LibC test suite. This should unblock the efforts to move the default
to 5 without breaking the test suite. This isn't a great solution, but
there is currently some time pressure to get COV5 landed and this seems
to be the easiest way forward.