Adds BOLT_TARGETS_TO_BUILD, which defaults to the intersection of
X86;AArch64 and LLVM_TARGETS_TO_BUILD, but allows configuration to
alter that -- for instance omitting one of those two targets even if
llvm supports both.
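A minimal C++ sketch of the default described above, modeling the CMake lists as `;`-separated strings. The helper names `splitList` and `defaultBoltTargets` are hypothetical, and the special value `all` for LLVM_TARGETS_TO_BUILD is not modeled.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split a CMake-style ";"-separated list into its non-empty elements.
std::vector<std::string> splitList(const std::string &List) {
  std::vector<std::string> Out;
  std::stringstream SS(List);
  std::string Item;
  while (std::getline(SS, Item, ';'))
    if (!Item.empty())
      Out.push_back(Item);
  return Out;
}

// Hypothetical model of the default: keep the BOLT-supported targets
// (X86, AArch64) that also appear in LLVM_TARGETS_TO_BUILD.
std::vector<std::string> defaultBoltTargets(const std::string &LLVMTargets) {
  const std::vector<std::string> Supported = {"X86", "AArch64"};
  const std::vector<std::string> LLVM = splitList(LLVMTargets);
  std::vector<std::string> Result;
  for (const std::string &T : Supported)
    for (const std::string &L : LLVM)
      if (T == L) {
        Result.push_back(T);
        break;
      }
  return Result;
}
```

A user who wants only one of the two targets would simply set the list explicitly instead of relying on this default.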
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148847
The BOLT runtime is specifically hard-coded for x86_64 linux or x86_64
darwin (it uses x86_64 syscalls and hardcodes syscall numbers).
Make it very clear that it is for that specific pair of systems.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148825
Defaults to ON for x86_64 && (Linux | Darwin).
If enabled, checks that /proc/self/map_files is readable. Some systems are configured so that getdents fails with EPERM.
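The message does not show how the check is implemented (it may well be done in CMake). A C++ sketch of an equivalent probe, assuming "readable" means the directory can be opened and its first entry read; `canListDirectory` is a hypothetical name. `std::filesystem::directory_iterator` reads the first entry on construction, so a failing getdents (for example with EPERM) surfaces in the error code.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

// Returns true if Dir can be opened and its first entry read without error.
bool canListDirectory(const std::string &Dir) {
  std::error_code EC;
  std::filesystem::directory_iterator It(Dir, EC);
  // EC is set for ENOENT, EACCES, EPERM, and similar failures.
  return !EC;
}
```

On a Linux host, `canListDirectory("/proc/self/map_files")` would then report whether the restriction described above is in effect.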
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D148742
This typedef is only used inside the RewriteInstance source file, let's not
expose it in the header file -- even if private.
Differential Revision: https://reviews.llvm.org/D148667
Shdrs are not necessarily of size 2^n, and even when they are, there is no
reason to align to that boundary.
Differential Revision: https://reviews.llvm.org/D148666
When the input is a DWP with DWARF5, BOLT wasn't correctly handling CUs that
didn't have TU references, which resulted in a crash.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D148589
When a cold function is too large, its section gets deregistered.
However, the section is still dereferenced later to get its RuntimeDyld
ID. This patch moves the deregistration to after the last dereference.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D148427
The following test fails when enabling UBSan due to a left shift of a
negative value:
> runtime error: left shift of negative value -2
BOLT :: AArch64/ext-island-ref.s
This patch fixes this by using a multiplication instead of a shift.
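A small illustration of the pattern, not the patch itself (`scaleImm` is a hypothetical helper): before C++20, shifting a negative signed value left is undefined behavior, while multiplying by the equivalent power of two is well-defined as long as the result does not overflow.

```cpp
#include <cassert>
#include <cstdint>

// Scale a signed immediate by 2^Shift.
// Undefined before C++20 when Imm is negative:
//   int64_t Offset = Imm << Shift;
// Well-defined alternative (assumes Shift < 63 and no overflow):
int64_t scaleImm(int64_t Imm, unsigned Shift) {
  return Imm * (int64_t(1) << Shift);
}
```

For the value in the UBSan report, `scaleImm(-2, 2)` yields -8.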
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D148218
section-end-sym.s executes x86_64 assembly instructions on the target.
I have changed the REQUIRES: field from system-linux to x86_64-linux.
This came up while testing the LLVM 16.0.1 release on AArch64 Linux.
When there is a direct jump right after an indirect one, and no code
jumps to that direct jump, it is obviously dead code. However, BOLT
failed to recognize this because it mistakenly placed both jmp
instructions in the same basic block and created wrong successor
edges. Fix that, so we can safely run UCE on such code. This bug also
causes validateCFG to fail and BOLT to crash if it runs ICP on that
function.
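A simplified sketch of the block-splitting rule, using a toy instruction model (`Kind` and `splitBlocks` are hypothetical, and branch targets that start new blocks are ignored): every jump terminates the current block, so the trailing direct jump ends up in a block of its own. Since an indirect jump never falls through, that block has no predecessor unless some other code branches to it, which is what allows UCE to remove it.

```cpp
#include <cassert>
#include <vector>

enum class Kind { Other, DirectJump, IndirectJump, Return };

// Jumps and returns end a basic block.
bool isTerminator(Kind K) { return K != Kind::Other; }

// Split a linear instruction sequence into basic blocks, closing the
// current block after each terminator.
std::vector<std::vector<Kind>> splitBlocks(const std::vector<Kind> &Insts) {
  std::vector<std::vector<Kind>> Blocks;
  std::vector<Kind> Cur;
  for (Kind K : Insts) {
    Cur.push_back(K);
    if (isTerminator(K)) {
      Blocks.push_back(Cur);
      Cur.clear();
    }
  }
  if (!Cur.empty())
    Blocks.push_back(Cur);
  return Blocks;
}
```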
Reviewed By: #bolt, Amir
Differential Revision: https://reviews.llvm.org/D148055
All users of MCCodeEmitter::encodeInstruction use a raw_svector_ostream
to encode the instruction into a SmallVector. The raw_ostream however
incurs some overhead for the actual encoding.
This change allows an MCCodeEmitter to emit an instruction directly into
a SmallVector without using a raw_ostream, and therefore allows for
performance improvements in encoding. A default path that uses the existing
raw_ostream implementations is provided.
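The shape of this change, sketched with hypothetical types rather than the real MCCodeEmitter API: a new buffer-based entry point whose default implementation falls back to the existing stream-based encoder, while a target may override it to append bytes directly and skip the stream.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction: its already-known encoding.
struct Inst {
  std::vector<unsigned char> Bytes;
};

class CodeEmitter {
public:
  virtual ~CodeEmitter() = default;

  // Existing stream-based interface.
  virtual void encodeToStream(const Inst &I, std::ostream &OS) const {
    OS.write(reinterpret_cast<const char *>(I.Bytes.data()),
             static_cast<std::streamsize>(I.Bytes.size()));
  }

  // New buffer-based interface. The default path reuses the stream
  // implementation, so targets that don't override it keep working.
  virtual void encodeToBuffer(const Inst &I, std::vector<char> &CB) const {
    std::ostringstream OS;
    encodeToStream(I, OS);
    const std::string S = OS.str();
    CB.insert(CB.end(), S.begin(), S.end());
  }
};

// A target that overrides the fast path and appends to the buffer directly.
class FastEmitter : public CodeEmitter {
public:
  void encodeToBuffer(const Inst &I, std::vector<char> &CB) const override {
    CB.insert(CB.end(), I.Bytes.begin(), I.Bytes.end());
  }
};
```

Both paths must produce identical bytes; only the overhead differs.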
Reviewed By: MaskRay, Amir
Differential Revision: https://reviews.llvm.org/D145791
`Function.RawBranchCount` is initialized for fdata profiles but not for yaml ones.
This diff adds the computation of the field for yaml profiles.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D144211
When computing symbol hashes in BinarySection::hash, we try to find relocations
in the section that reference the passed BinaryData. We do so by doing
lower_bound on the data's begin offset and upper_bound on its end offset. Since
offsets are relative to the current section, if the data belongs to the previous
section, computing its offset underflows and lower_bound returns
Relocations.end(). If this data also ends where the current section begins,
upper_bound on the zero offset will return a valid iterator if we have any
relocations after the first byte. We then try to iterate from lower_bound to
upper_bound, since they're not equal, which in this case means we dereference
Relocations.end(), increment it, and keep going until we reach the second
valid iterator. In practice, we segfault long before that. In this patch, we
stop BOLT from searching relocations for symbols outside of the current section.
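A sketch of the guard, assuming relocations are stored as section-relative offsets in a sorted container. `Section` and `countRelocsInData` are hypothetical, and the data range is treated as half-open, so `lower_bound` is used for both ends. Rejecting data that does not start inside the section means the offset subtraction can no longer underflow, and the begin iterator can never be past the end iterator.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>

struct Section {
  uint64_t Address;
  uint64_t Size;
  std::set<uint64_t> RelocOffsets; // relative to Address
};

// Count relocations inside [DataBegin, DataEnd) that belong to Sec.
// Precondition: DataBegin <= DataEnd.
unsigned countRelocsInData(const Section &Sec, uint64_t DataBegin,
                           uint64_t DataEnd) {
  // Data from another section (e.g. the previous one) is skipped, so
  // DataBegin - Sec.Address cannot wrap around.
  if (DataBegin < Sec.Address || DataBegin >= Sec.Address + Sec.Size)
    return 0;
  auto Begin = Sec.RelocOffsets.lower_bound(DataBegin - Sec.Address);
  auto End = Sec.RelocOffsets.lower_bound(DataEnd - Sec.Address);
  return static_cast<unsigned>(std::distance(Begin, End));
}
```

The second case in the tests below is exactly the one from the message: data from the previous section that ends where this section begins.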
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D146620
Sometimes, symbols are present that point to the end of a section (i.e.,
one-past the highest valid address). Currently, BOLT either rejects
those symbols when they don't point to another existing section, or errs
when they do and the other section is not executable. I suppose BOLT
would accept the symbol when it points to an executable section.
In any case, these symbols should not be considered while discovering
functions and should not result in an error. This patch implements that.
Note that this patch checks explicitly for symbols whose value equals
the end of their section. It might make more sense to verify that the
symbol's value is within [section start, section end). However, I'm not
sure whether a symbol could ever fall outside that range *and* not equal the end.
Another way to implement this is to verify that the BinarySection we
find at the symbol's address actually corresponds to the symbol's
section. I'm not sure what the best approach is so feedback is welcome.
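The two checks discussed above, written as hypothetical helpers over a section's address range: the patch tests for equality with the section end, while the alternative accepts only values in [start, end).

```cpp
#include <cassert>
#include <cstdint>

struct SectionRange {
  uint64_t Address;
  uint64_t Size;
};

// True for a symbol that points one past the last byte of its section.
bool isSectionEndSymbol(uint64_t SymbolValue, const SectionRange &Sec) {
  return SymbolValue == Sec.Address + Sec.Size;
}

// The range-based alternative: the value must lie in [start, end).
bool isInsideSection(uint64_t SymbolValue, const SectionRange &Sec) {
  return SymbolValue >= Sec.Address && SymbolValue < Sec.Address + Sec.Size;
}
```

For a section-end symbol the two checks agree: it is the end marker and it is not inside the section.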
Reviewed By: yota9, rafauler
Differential Revision: https://reviews.llvm.org/D146215
Move out prepareToParse lambda, generalize it to handle mem events perf process.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D146002
The two methods don't belong in BinaryFunction.
Move the dispatch tables into target-specific MCPlusBuilder methods.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D131813
Pre-calculate the register size table in MCPlusBuilder constructor,
similar to `AliasMap`/`SmallerAliasMap` in `initAliases`.
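A sketch of the precomputation pattern, with hypothetical names (`computeRegSize` stands in for whatever target query produces a register's size): the table is filled once in the constructor, and later lookups are plain array indexing.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the target's register size query.
unsigned computeRegSize(unsigned Reg) { return Reg < 16 ? 8 : 16; }

class RegSizeCache {
  std::vector<unsigned> SizeMap;

public:
  // Pre-calculate every register's size once, up front.
  explicit RegSizeCache(unsigned NumRegs) : SizeMap(NumRegs) {
    for (unsigned Reg = 0; Reg < NumRegs; ++Reg)
      SizeMap[Reg] = computeRegSize(Reg);
  }

  // Later queries are a constant-time table lookup.
  unsigned getRegSize(unsigned Reg) const { return SizeMap[Reg]; }
};
```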
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D145828
The golang support creates two new data segments, one of which contains
relocations in PIC binaries, so the section must be writable.
Currently BOLT creates only one new segment, which contains new sections
with RX permissions; now also create an RW segment if any new writable
sections were allocated during BOLT binary processing.
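A sketch of the segment assignment described above, using hypothetical types and section names: new sections are partitioned by writability, and the extra RW segment is needed only when the writable partition is non-empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct NewSection {
  std::string Name;
  bool Writable;
};

struct Segments {
  std::vector<std::string> RX; // read + execute
  std::vector<std::string> RW; // read + write
};

// Place each new section into the RX or the RW segment.
Segments assignSegments(const std::vector<NewSection> &Sections) {
  Segments S;
  for (const NewSection &Sec : Sections)
    (Sec.Writable ? S.RW : S.RX).push_back(Sec.Name);
  return S;
}

// Emit the additional RW segment only if something needs it.
bool needsRWSegment(const Segments &S) { return !S.RW.empty(); }
```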
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D143390
Leverage move semantics for `std::vector`.
This also makes it consistent with `createInstrumentationSnippet`.
Reviewed By: Elvina
Differential Revision: https://reviews.llvm.org/D145465
It was using a redundant iteration over super regs to build
SmallerAliasMap. Removing this results in exactly the same alias maps
and a noticeable performance gain on targets with a large number of
registers.
Just anecdotally: on my machine, processing a small AArch64 binary went
from 2.7s down to 80ms.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D145779
This patch fixes a few problems with supporting dynamic relocations in CI
(constant islands).
1. After dynamic relocations and functions have been read, search for dynamic
relocations located in functions. Currently we expect them only to be
relative and only to be in a constant island. Mark the islands of such
functions as having dynamic relocations and create a CI access symbol at the
relocation offset, so that a BD is created for that location.
2. During function disassembly, when handling an address reference to a
constant island, check whether the referenced external CI has a dynamic
relocation. If it does, keep referring to the original CI
rather than creating a local copy.
3. After the function disassembly stage, mark functions that have a dynamic
reloc in a CI as non-simple. We don't want such functions to be optimized,
since passes such as function splitting would create two copies of the CI,
which we are currently unable to support.
4. While updating output values for a BF, search for BDs located in the CI and
update their output locations.
5. At the dynamic relocation patching stage, search for binary data located
at the relocation offset. If it was moved, use the new relocation offset value
rather than the old one.
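Step 5 can be sketched as a lookup from the data's input address to its output address; `DataMoves` and `patchedRelocOffset` are hypothetical names, not BOLT's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical: input address of each moved data object -> output address.
using DataMoves = std::map<uint64_t, uint64_t>;

// When patching a dynamic relocation, use the new location of the data
// found at its offset if that data was moved; otherwise keep the offset.
uint64_t patchedRelocOffset(const DataMoves &Moves, uint64_t RelocOffset) {
  auto It = Moves.find(RelocOffset);
  return It == Moves.end() ? RelocOffset : It->second;
}
```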
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei
Differential Revision: https://reviews.llvm.org/D143748
Some build bots have not been updated to the new minimum CMake version.
Revert for now and ping the buildbot owners.
This reverts commit 44c6b905f8.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
Reviewed By: mehdi_amini, MaskRay, ChuanqiXu, to268, thieta, tschuett, phosek, #libunwind, #libc_vendors, #libc, #libc_abi, sivachandra, philnik, zibi
Differential Revision: https://reviews.llvm.org/D144509
Remove the usage of StringMap in places where the iteration order
affects the output since the iteration over StringMap is
non-deterministic.
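One common way to get deterministic output from a hash map, sketched with `std::unordered_map` standing in for `llvm::StringMap`; the patch itself may instead switch to a different, ordered container. `sortedEntries` is a hypothetical helper: it copies the entries and sorts them by key before they are printed or otherwise consumed.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hash-map iteration order is unspecified, so output produced by iterating
// it directly may differ between runs or library versions. Sorting the
// entries by key first makes the order deterministic.
std::vector<std::pair<std::string, unsigned>>
sortedEntries(const std::unordered_map<std::string, unsigned> &Map) {
  std::vector<std::pair<std::string, unsigned>> Entries(Map.begin(), Map.end());
  std::sort(Entries.begin(), Entries.end()); // keys are unique
  return Entries;
}
```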
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D145194
In case of a function with unknown control flow but with a single jump
table and a single jump table site, we attempt to match the jump table
and a site and update block successors using jump table targets.
Restrict this behavior for split jump tables which have targets in a
fragment function.
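A sketch of the kind of restriction described, using hypothetical types and a hypothetical fragment name: jump-table targets are used to rebuild successors only when every target lies in the function being processed rather than in one of its split fragments.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct JumpTableTarget {
  std::string OwnerFunction; // function (or fragment) containing the target
  unsigned Offset;
};

// Only use the targets as block successors if none of them lives in a
// different fragment of the function.
bool canUseTargetsAsSuccessors(const std::string &Function,
                               const std::vector<JumpTableTarget> &Targets) {
  for (const JumpTableTarget &T : Targets)
    if (T.OwnerFunction != Function)
      return false;
  return true;
}
```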
Fixes https://github.com/llvm/llvm-project/issues/60795.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D144602
ICF optimization runs multiple passes, and the order in which functions
are folded can depend on the order in which they are processed.
This order is nondeterministic because functions are stored in an
intermediate std::unordered_map<>. Note that this order is mostly stable,
but it is not guaranteed to be and can change, e.g., after switching to a
different C++ library implementation.
Because the processing (and folding) order is nondeterministic, the
previous way of calculating the merged function call count could produce
different results.
Change the way we calculate the ICF call count to make it independent of
the function folding/processing order.
Mostly NFC, as the output binary should remain the same; the change
affects only the console output.
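One order-independent way to compute such a count, sketched with hypothetical types; this is not necessarily the exact metric BOLT reports. Identical functions are grouped by a key (here a precomputed `Hash` string), and the profile call counts of all functions in groups with at least two members are summed. Because the grouping and the addition do not depend on visiting order, any permutation of the input gives the same total.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Func {
  std::string Name;
  std::string Hash; // functions with equal Hash are considered identical
  uint64_t CallCount;
};

// Sum the call counts of all functions that take part in folding,
// i.e. that belong to a group of two or more identical functions.
uint64_t icfCallCount(const std::vector<Func> &Funcs) {
  std::map<std::string, std::pair<unsigned, uint64_t>> Groups; // members, calls
  for (const Func &F : Funcs) {
    auto &G = Groups[F.Hash];
    ++G.first;
    G.second += F.CallCount;
  }
  uint64_t Total = 0;
  for (const auto &KV : Groups)
    if (KV.second.first > 1)
      Total += KV.second.second;
  return Total;
}
```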
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D144807
When createInstrumentedIndirectCall() was invoked for tail calls, we
attached the annotation instruction to the new call instruction twice:
first in createDirectCall(), and then again while copying over the
metadata operands.
As a result, the annotations were not properly stripped from such calls
before the call to freeAnnotations() in the LowerAnnotations pass. That led
to a use-after-free while restoring the offsets with the setOffset() call.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D144806