For empty sections, RuntimeDyld always allocates 1 byte but leaves it
uninitialized. This causes the contents of some output sections to be
non-deterministic.
Note that this issue is also solved by D147544.
Fixes#59008
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D149243
The byte offsets in the output of 'cmp' start from 1, not from 0 as the
current parser assumes. This caused mismatched bytes to sometimes be
attributed to the wrong section.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D149046
Since D148847, many tests are detected as being unsupported. This is
caused by BOLT_TARGETS_TO_BUILD being ;-separated whereas the previously
used TARGETS_TO_BUILD is space-separated.
This patch fixes this by creating config.targets lit.cfg.py by splitting
on ';'.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D149026
This commit splits the generic part of `LoopInfo` into separate files.
These new `GenericLoopInfo` files are located in `llvm/Support` to be inline
with `GenericDomTree`.
Furthermore, this change ensures that MLIR's Bazel build does not have
to link against `LLVMAnalysis` just to use these template headers.
Depends on D148219
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D148235
These checks are unnecessary -- we've already bailed if the format was wrong.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148848
Adds BOLT_TARGETS_TO_BUILD, which defaults to the intersection of
X86;AArch64 and LLVM_TARGETS_TO_BUILD, but allows configuration to
alter that -- for instance omitting one of those two targets even if
llvm supports both.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148847
The BOLT runtime is specifically hard coded for x86_64 linux or x86_64
darwin. (Using x86_64 syscalls, hardcoding syscall numbers.)
Make it very clear this is for those specific pair of systems.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148825
Defaults to ON for x86_64 && (Linux | Darwin).
If enabled, checks that /proc/self/map_files is readable. Some systems are configured so that getdents fails with EPERM.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148742
This typedef is only used inside the RewriteInstance source file, let's not
expose it in the header file -- even if private.
Differential Revision: https://reviews.llvm.org/D148667
Shdr's are not necesarily size 2^n, and there is no reason to align to
that boundary if they are.
Differential Revision: https://reviews.llvm.org/D148666
When input is DWP with DWARF5 bolt wasn't handling correctly CUs that didn't
have TU references. Which resulted in a crash.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D148589
When a cold function is too large, its section gets deregistered.
However, the section is still dereferenced later to get its RuntimeDyld
ID. This patch moves the deregistration to after the last dereference.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D148427
The following test fails when enabling UBSan due to a left shift of a
negative value:
> runtime error: left shift of negative value -2
BOLT :: AArch64/ext-island-ref.s
This patch fixes this by using a multiplication instead of a shift.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D148218
section-end-sym.s contains x86_64 assembly instruction execution on target.
I have changed REQURIES: field system-linux --> x86_64-linux
This came up while testing LLVM 16.0.1 release on AArch64 Linux.
When there is a direct jump right after an indirect one, in
the absence of code jumpting to this direct jump, this is obviously
dead code. However, BOLT was failing to recognize that by mistakenly
placing both jmp instructions in the same basic block, and creating
wrong successor edges. Fix that, so we can safely run UCE on
that. This bug also causes validateCFG to fail and BOLT to crash if it
is running ICP on that function.
Reviewed By: #bolt, Amir
Differential Revision: https://reviews.llvm.org/D148055
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.
This change allows an MCCodeEmitter to directly emit an instruction into
a SmallVector without using a raw_ostream and therefore allow for
performance improvments in encoding. A default path that uses existing
raw_ostream implementations is provided.
Reviewed By: MaskRay, Amir
Differential Revision: https://reviews.llvm.org/D145791
`Function.RawBranchCount` is initialized for fdata profile but not for yaml one.
The diff adds the computation of the field for yaml profiles
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D144211
When computing symbol hashes in BinarySection::hash, we try to find relocations
in the section which reference the passed BinaryData. We do so by doing
lower_bound on data begin offset and upper_bound on data end offset. Since
offsets are relative to the current section, if it is a data from the previous
section, we get underflow when computing offset and lower_bound returns
Relocations.end(). If this data also ends where current section begins,
upper_bound on zero offset will return some valid iterator if we have any
relocations after the first byte. Then we'll try to iterate from lower_bound to
upper_bound, since they're not equal, which in that case means we'll dereference
Relocations.end(), increment it, and try to do so until we reach the second
valid iterator. Of course we reach segfault earlier. In this patch we stop BOLT
from searching relocations for symbols outside of the current section.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D146620
Sometimes, symbols are present that point to the end of a section (i.e.,
one-past the highest valid address). Currently, BOLT either rejects
those symbols when they don't point to another existing section, or errs
when they do and the other section is not executable. I suppose BOLT
would accept the symbol when it points to an executable section.
In any case, these symbols should not be considered while discovering
functions and should not result in an error. This patch implements that.
Note that this patch checks explicitly for symbols whose value equals
the end of their section. It might make more sense to verify that the
symbol's value is within [section start, section end). However, I'm not
sure if this could every happen *and* its value does not equal the end.
Another way to implement this is to verify that the BinarySection we
find at the symbol's address actually corresponds to the symbol's
section. I'm not sure what the best approach is so feedback is welcome.
Reviewed By: yota9, rafauler
Differential Revision: https://reviews.llvm.org/D146215
Move out prepareToParse lambda, generalize it to handle mem events perf process.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D146002
The two methods don't belong in BinaryFunction methods.
Move the dispatch tables into target-specific MCPlusBuilder methods.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D131813
Pre-calculate the register size table in MCPlusBuilder constructor,
similar to `AliasMap`/`SmallerAliasMap` in `initAliases`.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D145828
The golang support creates 2 new data segments, one of them contains
relocations in PIC binaries, so the section must have writable rights.
Currently BOLT creates only one new segment that contains new sections
with RX rights, now also create RW segment if there are any new writable
sections were allocated during BOLT binary processing.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D143390
Leverage move semantics for `std::vector`.
This also makes it consistent with `createInstrumentationSnippet`.
Reviewed By: Elvina
Differential Revision: https://reviews.llvm.org/D145465
It was using a redundant iteration over super regs to build
SmallerAliasMap. Removing this results in exactly the same alias maps
and a noticeable performance gain on targets with a large number of
registers.
Just anecdotally: on my machine, processing a small AArch64 binary went
from 2.7s down to 80ms.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D145779
This patch fixes few problems with supporting dynamic relocations in CI.
1. After dynamic relocations and functions were read search for dynamic
relocations located in functions. Currently we expected them only to be
relative and only to be in constant island. Mark islands of such
functions to have dynamic relocations and create CI access symbol on the
relocation offset, so the BD would be created for such place.
2. During function disassemble and handling address reference for
constant island check if the referred external CI has dynamic
relocation. And if it has one we would continue to refer original CI
rather then creating a local copy.
3. After function disassembly stage mark function that has dynamic reloc
in CI as non-simple. We don't want such functions to be optimized, since
such passes as split function would create 2 copies of CI which we
unable to support currently.
4. During updating output values for BF search for BD located in CI and
update their output locations.
5. On dynamic relocation patching stage search for binary data located
on relocation offset. If it was moved use new relocation offset value
rather then an old one.
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D143748
Some build bots have not been updated to the new minimal CMake version.
Reverting for now and ping the buildbot owners.
This reverts commit 44c6b905f8.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
Reviewed By: mehdi_amini, MaskRay, ChuanqiXu, to268, thieta, tschuett, phosek, #libunwind, #libc_vendors, #libc, #libc_abi, sivachandra, philnik, zibi
Differential Revision: https://reviews.llvm.org/D144509
Remove the usage of StringMap in places where the iteration order
affects the output since the iteration over StringMap is
non-deterministic.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D145194
In case of a function with unknown control flow but with a single jump
table and a single jump table site, we attempt to match the jump table
and a site and update block successors using jump table targets.
Restrict this behavior for split jump tables which have targets in a
fragment function.
Fixes https://github.com/llvm/llvm-project/issues/60795.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D144602
ICF optimization runs multiple passes and the order in which functions
are folded could be dependent on the order they are being processed.
This order is indeterministic as functions are intermediately stored in
std::unordered_map<>. Note that this order is mostly stable, but is not
guaranteed to be and can change e.g. after switching to a different C++
library implementation.
Because the processing (and folding) order is indeterministic, the
previous way of calculating merged function call count could produce
different results.
Change the way we calculate the ICF call count to make it independent of
the function folding/processing order.
Mostly NFC as the output binary should remain the same, the change
affects only the console output.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D144807
When createInstrumentedIndirectCall() was invoked for tail calls, we
attached annotation instruction twice to the new call instruction.
First in createDirectCall(), and then again while copying over the
metadata operands.
As a result, the annotations were not properly stripped for such calls
before the call to freeAnnotations() in LowerAnnotations pass. That lead
to use-after-free while restoring the offsets with setOffset() call.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D144806
Simplify `MCPlusBuilder::evaluateX86MemoryOperand`: make it return a struct
with memory operand analysis struct `X86MemOperand`.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D144310
Avoid replacing one adr instruction with two adrp+add by utilizing linker-provided nops
when they are present. By doing so we preserve relative offsets of next instructions
in a function which reduces chances to break undetected jump tables. This commit makes
release-mode lld-linked clang, lld and etc work after BOLT.
Reviewed By: rafauler, yota9
Differential Revision: https://reviews.llvm.org/D143887
`isChildOf` is a more concise name for the check. Also, there's no need to
test if the function is a fragment before doing `isChildOf` check.
Reviewed By: #bolt, rafauler, maksfb
Differential Revision: https://reviews.llvm.org/D142667
Fix llvm-bolt-wrapper to skip output file checks if llvm-bolt exits with error
code.
Test Plan:
- checkout to revision with invalid NFC mismatch in `is-strip.s` test
(e.g. 056af487831fb573e6895901d1e48f93922f9635~)
- run `nfc-check-setup.py`
- run `bin/llvm-lit -a tools/bolt/test/X86/is-strip.s`
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D143614
YAML specification does not allow keys duplication an a mapping. However, YAML
parser in LLVM does not have any check on that and uses only the last key entry.
In this change duplicated keys are merged to satisfy the spec.
Differential Revision: https://reviews.llvm.org/D141848
In lite mode, include split function fragments to the list of functions to
process even if a fragment has no samples. This is required to properly
detect and update split jump tables (jump tables that contain pointers to code
in the main and cold fragments).
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D140457
We need to search referenced section based on relocations symbol section
to properly match end section symbols. For example on some binaries we
can observe that init_array_end/fini_array_end might be "placed" in to
the gap and since no section could be found for address the relocation
would be skipped resulting in wrong ADRP imm after emitting new text
resulting in binary sigsegv.
Credits for the test to Vladislav Khmelevskii aka yota9.
Reject stripped binaries as a policy.
The core issue with stripped binaries is that we can't detect the presence
of split functions which require extra handling. Therefore BOLT can't ensure
functional correctness of produced binary if the input stripped binary contains
split functions. Supporting such cases is an interesting problem but it goes
against BOLT's intended goal of achieving peak program performance.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D142686
The dependence is needed since Utils includes VCSRevision.h, and other
LLVM components that include this header also have the llvm_vcsrevision_h
dependency.
Fixes#60460.
Reviewed By: #bolt, ayermolo
Differential Revision: https://reviews.llvm.org/D143101
Allow partial name matching wrt LTO suffixes in `function-order`
user-supplied function list, the same as permitted by profile matching.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D142269
In some binaries produced with ThinLTO there are CUs that share entry in
.debug_addr. Before we would generate a new entry for each. Which lead to binary
size increase. This changes the behavior so that we re-use entries in
.debug_addr.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D142425
Reuse getLTOCommonName in components other than Profile (to be used in Core)
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D142259
In profile matching, if `.__uniq` suffix added for internal linkage
symbols with `-funique-internal-linkage-names` prevents BOLT from
matching to a binary function, try to strip the suffix and perform
fuzzy name matching.
Follow-up to D124117.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D142073
Break out stats-printing code from ReorderFunctions::reorder for brevity.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D142250
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.
Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.
Differential Revision: https://reviews.llvm.org/D141814
Current case that triggers BOLT assertion. Marked XFAIL.
In this test case, we reproduce the behavior seen in gcc where the
base address of a jump table is decremented by some number and ends up
at the exact addess of a jump table from another function. After
linking, the instruction references another jump table and that
confuses BOLT.
Reviewed By: #bolt, Amir
Differential Revision: https://reviews.llvm.org/D138245
Changed contribution data structure to 64 bit. I added the 32bit and 64bit
accessors to make it explicit where we use 32bit and where we use 64bit. Also to
make sure sure we catch all the cases where this data structure is used.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D139379