When a section has the SHF_COMPRESSED flag, -p/-x dump the compressed
content by default. In GNU readelf, if --decompress/-z is specified,
-p/-x will dump the decompressed content. This patch implements the
option.
Close#82507
(cherry picked from commit 26d71d9ed56c4c23e6284dac7a9bdf603a5801f3)
The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:
$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
-DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes
Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.
---------
Co-authored-by: Petr Hosek <phosek@google.com>
(cherry picked from commit dd0356d741aefa25ece973d6cc4b55dcb73b84b4)
The Zicond extension was ratified in the last few months, with no
changes that affect the LLVM implementation. Although there's surely
more tuning that could be done about when to select Zicond or not, there
are no known correctness issues. Therefore, we should mark support as
non-experimental.
(cherry-picked from commit d833b9d677c9dd0a35a211e2fdfada21ea9a464b)
This is a high level description and FAQ for what we're doing in
RemoveDIs, and how old code should be behave with new debug-info
(exactly the same 99% of the time).
Currently, the UnifiedLTO pipeline seems to have trouble with several
LTO features, like SplitLTO units, which means we cannot use important
optimizations like Whole Program Devirtualization or security hardening
instrumentation like CFI.
This patch reverts FatLTO to using distinct pipelines for Full LTO and
ThinLTO. It still avoids module cloning, since that was error prone.
`--function=<regex>` Include functions matching regex in the output
`--no-function=<regex>` Exclude functions matching regex from the output
If both are specified, `--no-function` has a higher precedence if a
function name matches both filters
C++20 accepted two papers, [P0668](https://wg21.link/P0668) and
[P0982](https://wg21.link/P0982), which changed the atomics memory model
slightly in order to reflect the realities of the existing
implementations.
The rationale for these changes applies as well to the LLVM IR atomics
model. No code changes are expected to be required from this change: it
is primarily a matter of more-correctly-documenting the existing state
of the world.
There's three changes: two of them weaken guarantees, and one
strengthens them:
1. The memory ordering guaranteed by some backends/CPUs when seq_cst
operations are mixed with acquire/release operations on the same
location was weaker than the spec guaranteed. Therefore, the
specification is changed to remove the requirement that seq_cst ordering
is consistent with happens-before, and replaces it with a slightly
weaker requirement of consistency with a new relation named
strongly-happens-before.
2. The rules for a "release sequence" were weakened. Previously, an
acquire synchronizes with an release even if it observes a later
monotonic store from the same thread as the release store. That has now
been removed: now, only read-modify-write operations can extend a
release sequence.
3. The model for a a seq_cst fence is strengthened, such that placing a
seq_cst between monotonic accesses now _is_ sufficient to guarantee
sequential consistency in the model (as it always has been on existing
implementations.)
Note that I've directly referenced the C++ standard's atomics.order
section for the precise semantics of seq_cst, instead of fully
describing them. They are quite complex, and a lot of work has gone into
refining the words in the standard. I'm afraid if I attempt to reiterate
them, I would only introduce errors.
Although there are predicated versions of minnum/maxnum, the ones for
minimum/maximum are currently missing. This patch introduces these
intrinsics and implements their lowering to RISC-V.
Named '.amdhsa_code_object_version'. This directive sets the
e_ident[ABIVERSION] in the ELF header, and should be used as the assumed
COV for the rest of the asm file.
This commit also weakens the --amdhsa-code-object-version CL flag.
Previously, the CL flag took precedence over the IR flag. Now the IR
flag/asm directive take precedence over the CL flag. This is implemented
by merging a few COV-checking functions in AMDGPUBaseInfo.h.
Currently, the IR parser requires that %n style numbered values are
consecutive. This means that the IR becomes invalid as soon as you
remove an instruction, argument or block. This makes it very annoying to
modify IR without running it through instnamer first.
I don't think there is any good reason to impose this requirement. This
PR relaxes it to allow value IDs to be non-consecutive, but it still
keeps the requirement that they're increasing (i.e. you can't skip a
value number and then assign it later).
This only implements support for skipping numbers for local values. We
should extend this to global values in the future as well.
This adds minimal support for 7 new unprivileged extensions that were
defined as a part of
the RISC-V Profiles specification here:
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions
* Ziccif: Main memory supports instruction fetch with atomicity
requirement
* Ziccrse: Main memory supports forward progress on LR/SC sequences
* Ziccamoa: Main memory supports all atomics in A
* Zicclsm: Main memory supports misaligned loads/stores
* Za64rs: Reservation set size of 64 bytes
* Za128rs: Reservation set size of 128 bytes
* Zic64b: Cache block size isf 64 bytes
As stated in the specification, these extensions don't add any new
features but
describe existing features. So this patch only adds parsing and
subtarget
features.
Printing the raw symbol is useful in inline asm (e.g. getting the C++
mangled name, referencing a symbol in a custom way while ensuring it is
not optimized out even if internal). Similar constraints are available
in other targets (e.g. "S" for aarch64/riscv, "Cs" for m68k).
```
namespace ns { extern int var, a[4]; }
void foo() {
asm(".pushsection .xxx,\"aw\"; .dc.a %p0; .popsection" :: "Ws"(&ns::var));
asm(".reloc ., BFD_RELOC_NONE, %p0" :: "Ws"(&ns::a[3]));
}
```
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105576
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.
The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16(x0-x16).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.
`RVE` can be combined with all current standard extensions.
The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.
If `ilp32e` or `lp64` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.
To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment\
to 4-bytes for types with length of 2*XLEN.
FastCC is also supported on RVE, while GHC isn't since there is only one
avaiable register.
Differential Revision: https://reviews.llvm.org/D70401
Closes#75620
As I mentioned on the issue, this PR aims to hash-pin the CI
dependencies used on sensitive context -- i.e., they either are called
with write permissions, or are being used to build critical artifacts
like a release. In summary, this PR brings 3 changes:
1. Hash pin GitHub Actions called on sensitive context
2. Hash pin python dependencies used on sensitive context
3. Configure dependabot to automatically update those hashes
I'm further explaining the steps bellow.
The dependencies in format of GitHub Actions, I simply hash-pinned them.
I also made sure to keep the human-readable version as comments at the
same line.
At the
[release-tasks.yml](https://github.com/llvm/llvm-project/blob/main/.github/workflows/release-tasks.yml)
file, I've changed the installation method of some python dependencies
to install them considering their hashpinning. That required the
generation of a requirements file that had all the correct hashes, and
for that I used [pip-tools](https://pypi.org/project/pip-tools/2.0.0/).
While configuring dependabot, I set it to send a monthly PR updating all
the GitHub Actions, and a weekly PR to update any python dependency
required by
[/llvm/docs/requirements.txt](https://github.com/llvm/llvm-project/blob/main/llvm/docs/requirements.txt).
Let me know if you have any questions or concerns, I'd be happy to
clarify and help.
Thanks!
---------
Signed-off-by: Diogo Teles Sant'Anna <diogoteles@google.com>
LLVM assumes round-to-nearest mode and sometimes performs constant-folding based on that assumption. This updates the language ref documentation for the rint and nearbyint intrinsics to mention that fact.
The GHC CC got added to RISCV in
a8dc2110cd but it never got documented in
the LangRef. This adds documentation in the LangRef noting that RISCV is
supports the GHC calling convention and notes the specific limitations
of the GHC CC on RISCV.
If a category has no options associated with it, the `--help-hidden`
command still shows that category with the annotation "This option
category has no options", and this is how it was implemented from the
beginning when the categories were introduced, see commit 0537a98878. A
feature to hide unrelated options was added later, in
https://reviews.llvm.org/D7100. Now, if a tool needs to hide unrelated
options that are associated with categories, leaving some of them empty,
those categories will still be visible on the `--help-hidden` output,
even if they have no use for the tool; see the changes in
`llvm/test/tools/llvm-debuginfo-analyzer/cmdline.test` for an example.
The patch ensures that only categories with options are shown on both
main and hidden help output.
Added -p / --no-params flag to skip demangling function parameters
similar to how it is supported by GNU c++filt tool.
There are cases when users want to demangle a large number of symbols in
bulk, for example, at startup, and do not care about function parameters
and overloads at that time. Skipping the demangling of parameter types
led to a measurable improvement in performance. Our users reported about
15% speed up with GNU c++filt and we expect similar results with
llvm-cxxfilt with this patch.
"hidden_dynamic_lds_size" argument will be added in the reserved section
at offset 120 of the implicit argument layout.
Add "isDynamicLDSUsed" flag to AMDGPUMachineFunction to identify if a
function uses dynamic LDS.
hidden argument will be added in below cases:
- LDS global is used in the kernel.
- Kernel calls a function which uses LDS global.
- LDS pointer is passed as argument to kernel itself.
Uses machine analyses to emit PGOAnalysisMap into the bb-addr-map ELF
section. Implements filecheck tests to verify emitting new fields.
This patch emits optional PGO related analyses into the bb-addr-map ELF
section during AsmPrinter. This currently supports Function Entry Count,
Machine Block Frequencies. and Machine Branch Probabilities. Each is
independently enabled via the `feature` byte of `bb-addr-map` for the given
function.
A part of [RFC - PGO Accuracy Metrics: Emitting and Evaluating Branch and Block Analysis](https://discourse.llvm.org/t/rfc-pgo-accuracy-metrics-emitting-and-evaluating-branch-and-block-analysis/73902).