1. Using ADT/Bitfields.h for hash computation; this is equivalent but shorter than the existing implementation
2. Getting rid of Layout indices for stale matching; using BB->getIndex for indexing
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D155748
Adding some logs related to stale profile matching. The new data can be helpful
to understand how "stale" the input profile is and how well the inference is
able to utilize the stale data.
Example of outputs on clang-10 built with LTO (profile collected on a year-old release):
```
BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654)
BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale)
```
LTO+AutoFDO:
```
BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313)
BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale)
```
Reviewed By: Amir, maksfb
Differential Revision: https://reviews.llvm.org/D154737
Jump tables may contain a function start address. One real-world example
is when a target basic block contains a recursive tail call that is
later optimized/folded into a jump table target.
While analyzing a jump table, we treat start address similar to an
address past the end of the containing function (a result of
__builtin_unreachable), i.e. we require another "regular" entry for the
heuristic to proceed.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D156206
Specify blocks order used in YAML profile. Needed to ensure profile backwards
compatibility with pre-D155514 DFS order by default.
Reviewed By: #bolt, maksfb
Differential Revision: https://reviews.llvm.org/D156176
A jump table in a split function may contain an entry matching a start
address of another fragment of the function. While converting addresses
to labels, we used to ignore such entries resulting in underpopulated
jump table. Change that, so we always create one label per address.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D156013
* Sort ORC entries in the internal table. Older Linux kernels did not
sort them in the file (only during boot time).
* Add an option to dump sorted ORC tables (--dump-orc).
* Associate entries in the internal ORC table with a BinaryFunction
even when we are not changing the function.
* If the function doesn't have ORC entry at the start, propagate ORC
state from a previous entry.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D155767
In one of the previous diffs LocBuffer was changed to pass by value. This lead to
performance regression running BOLT on binaries with DWARF4 split dwarf.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155763
Pass the revision to checkout to (cmp-rev) as nfc-check-setup option.
Simpifies the comparison against arbitrary commit, not just the previous one.
Reviewed By: #bolt, rafauler
Differential Revision: https://reviews.llvm.org/D155657
Use layout order in YAML profile reading/writing. Preserve old behavior (DFS order)
under `-profile-use-dfs` option.
Reviewed By: spupyrev
Differential Revision: https://reviews.llvm.org/D155514
Propagate Linux Kernel ORC information read from the file to the whole
function CFG once the graph has been built. We have a choice to either
attach ORC state annotation to every instruction, or to the first
instruction in the basic block to conserve processing memory. I chose to
attach to every instruction under --print-orc option which is currently
on by default.
Depends on D155153, D154815
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D155156
Read ORC (oops rewind capability) info used for unwinding the stack by
Linux Kernel. The info is stored in .orc_unwind and .orc_unwind_ip
sections. There is also a related .orc_lookup section that is being
populated by the kernel during runtime. Contents of the sections are
sorted for quicker lookup by a post-link objtool.
Unless we modify stack access instructions, we don't have to change ORC
info attributed to instructions in the binary. However, we need to
update instruction addresses and sort both sections based on the new
layout.
For pretty printing, we add "--print-orc" option that prints ORC info
next to instructions in code dumps.
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154815
There are cases in DWARF4 when Skeleton CU has ranges, but dwo CU doesn't.
Bug was introduced in new DWARFRewriter where for DWARF4 it would fall through
to DWARF5 case.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155033
The DWO Unit DIE, doesn't have low_pc/high_pc, so we were printing this error
for valid cases.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155032
Setting initial offset of DIE to input DIE. This is to make "printf" debugging
easier.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D155031
This is a preparatory patch for extending DWARFDebugLine to properly
parse line number programs with maximum_operations_per_instruction > 1
for VLIW targets.
Add some scaffolding for handling op-index in line number programs, and
add printouts for that in the table. As this affects a lot of tests,
this is done in a separate commit to get a cleaner review for the actual
op-index implementation.
Verbose printouts are not present in many tests, and adding op-index to
those will require a bit more code changes, so that is done in the
actual implementation patch.
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D152535
We need to explicitly mark DWARFUnitInfo as non-copyable since MSVC's
STL has a `noexcept(false)` move constructor for `unordered_map`; see
the added comment for more details.
An alternative might be using SmallVector instead of std::vector, since
that never tries to copy elements [1]. That would result in a bunch of
API changes though, so I figured a smaller targeted fix was better.
[1] https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h
Reviewed By: ayermolo, maksfb
Differential Revision: https://reviews.llvm.org/D154924
BOLT used `ToolOutputFile::keep` to make sure the intermediary object
file was written to disk for debugging purposes when `--keep-tmp` was
passed. However, since and intermediary `buffer_ostream` was used to
stream to, and this class only writes to its output stream in its
destructor, the object file was lost whenever its destructor wouldn't
run. This could happen, for example, if there is a crash while linking.
This patch makes sure the object file is written to disk immediately
after we're done creating it. This is very useful while debugging
JITLink crashes. This patch also gets rid of creating a temporary file
when `--keep-tmp` is not passed by streaming the object file directly to
a `SmallString`.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D154826
To reduce memory footprint changed so that we process and write out TUs first,
reset DIEBuilder and process CUs. CUs are processed in buckets. First bucket
contains all the CUs with cross CU references. Rest processd one at a time.
clang-17 build in debug mode, by clang-17.
before
8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem
8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem
7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem
after
7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem
7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem
7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D151909
* Some cleanup and minor fixes for the new debug information re-writer before moving on
to productatization.
* The new rewriter wasn't handling binary with DWARF5 and DWARF4 with
-fdebug-types-sections.
* Removed dead cross cu reference code.
* Added support for DW_AT_sibling.
* With the new re-writer abbrev number can change which can lead to offset of Type
Units changing. Before we would just copy raw data. Changed to write out Type
Unit List. This is generated by gdb-add-index.
* Fixed how bolt handles gdb-index generated by gdb-11 with types sections.
Simplified logic that handles variations of gdb-index.
* Clang can generate two type units with the same hash, but different content. LLD
does not de-duplicate when ThinLTO is involved. Changed so that TU hash and
offset are used to make TU's unique.
* It is possible to have references within location expression to another DIE.
Fixed it so that relative offset is updated correctly.
* Removed all the code related to patching.
* Removed dead code. Changed how we handling writting out TUs and TU Index. It now
should fully work for DWARF4 and DWARF5.
* Removed unused arguments from some APIs, changed return type to void, and other
small cleanups.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D151906
This revision implement new mechanism for DWARFRewriter.
In the new mechanism, we adopt the same way with DWARFLinker did.
By parsing Debug information into IR, we are allowed to handle debug information more flexible.
Now the debug information updating process relies on IR and IR will be written out to binary once the updating finished.
A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level.
This class is also used to write out the .debug_info and .debug_abbrev sections.
Since we output brand new Abbrev section we won't need to always convert low_pc/high_pc into ranges.
When conversion does happen we can also remove low_pc entry.
Reviewed By: maksfb, ayermolo
Differential Revision: https://reviews.llvm.org/D130315
The issue was caused by the absence of placement new definition. It
worked for clang and thus passed Phabricator checks, but broke when
compiled with GCC on buildbot.
Full problem description: https://reviews.llvm.org/D153771#4468239
Original patch description:
In absence of instrumentation-file-append-pid option,
global allocator uses shared pages for allocation. However, since it is a
global variable, it gets COW'd after fork if instrumentation-sleep-time
is used, or if a process forks by itself. This means it handles the same
pages to every process which causes hash table corruption. Thus, if we
want shared pages, we need to put the allocator itself in a shared page,
which we do in this commit in __bolt_instr_setup.
I also added a couple of assertions to sanity-check the hash table.
Reviewed By: rafauler, Amir
Differential Revision: https://reviews.llvm.org/D153771
This reverts commit 460a224443.
It breaks building on macOS, and it was landed with a review URL
pointing to some Facebook-internal service.
Also reverts a bunch of follow-ups:
Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit f9d6f48c8b.
Revert "[BOLT][DWARF] Change to process and write out TUs first then CUs in batches"
This reverts commit 88e95c1e4b.
Revert "[BOLT][DWARF] Output DWO files as they are being processed"
This reverts commit 46ca2e3fcd.
Revert "[BOLT][DWARF] Don't check string offsets"
This reverts commit cfe4a4b04f.
Revert "[BOLT][DWARF] Numerous fixes for a new DWARFRewriter"
This reverts commit 2701a661da.
Summary:
To reduce memory footprint changed so that we process and write out TUs first,
reset DIEBuilder and process CUs. CUs are processed in buckets. First bucket
contains all the CUs with cross CU references. Rest processd one at a time.
clang-17 build in debug mode, by clang-17.
before
8:25.81 real, 834.37 user, 86.03 sys, 0 amem, 79525064 mmem
8:02.20 real, 820.46 user, 81.81 sys, 0 amem, 79501616 mmem
7:52.69 real, 802.01 user, 83.99 sys, 0 amem, 79534392 mmem
after
7:49.35 real, 822.04 user, 66.19 sys, 0 amem, 34934260 mmem
7:42.16 real, 825.46 user, 63.52 sys, 0 amem, 34951660 mmem
7:46.71 real, 821.11 user, 63.14 sys, 0 amem, 34981164 mmem
Differential Revision: https://phabricator.intern.facebook.com/D45883198
Summary:
* Some cleanup and minor fixes for the new debug information re-writer before moving on
to productatization.
* The new rewriter wasn't handling binary with DWARF5 and DWARF4 with
-fdebug-types-sections.
* Removed dead cross cu reference code.
* Added support for DW_AT_sibling.
* With the new re-writer abbrev number can change which can lead to offset of Type
Units changing. Before we would just copy raw data. Changed to write out Type
Unit List. This is generated by gdb-add-index.
* Fixed how bolt handles gdb-index generated by gdb-11 with types sections.
Simplified logic that handles variations of gdb-index.
* Clang can generate two type units with the same hash, but different content. LLD
does not de-duplicate when ThinLTO is involved. Changed so that TU hash and
offset are used to make TU's unique.
* It is possible to have references within location expression to another DIE.
Fixed it so that relative offset is updated correctly.
* Removed all the code related to patching.
* Removed dead code. Changed how we handling writting out TUs and TU Index. It now
should fully work for DWARF4 and DWARF5.
* Removed unused arguments from some APIs, changed return type to void, and other
small cleanups.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags:
Differential Revision: https://phabricator.intern.facebook.com/D46168257
Summary:
This revision implement new mechanism for DWARFRewriter.
In the new mechanism, we adopt the same way with DWARFLinker did.
By parsing Debug information into IR, we are allowed to handle debug information more flexible.
Now the debug information updating process relies on IR and IR will be written out to binary once the updating finished.
A new class was added: DIEBuilder. This class is responsible for parsing debug information and raising it to the IR level.
This class is also used to write out the .debug_info and .debug_abbrev sections.
Since we output brand new Abbrev section we won't need to always convert low_pc/high_pc into ranges.
When conversion does happen we can also remove low_pc entry.
Differential Revision: https://phabricator.intern.facebook.com/D39484421
Tasks: T117448832
There was a bug in a code that pre-populated line string for a case where parts
of .debug_line are not processed by BOLT, but copied as raw data. We were not
switching sections. This resulted in parts of the binary being over-written with
debug data.
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D154544
Create LinuxKernelRewriter and move kernel-specific code to this class.
Depends on D154023
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154024
Use new MetdataRewriter interface to update pseudo probes and move
ProbeDecoder out of BinaryContext into new PseudoProbeRewriter class.
Depends on D154021
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154022
Differential Revision: https://reviews.llvm.org/D154023
Migrate SDT markers processing to the new MetadataRewriter interface.
Depends on D154020
Reviewed By: Amir
Differential Revision: https://reviews.llvm.org/D154021
Introduce the MetadataRewriter interface to handle updates for various
types of auxiliary data stored in a binary file.
To implement metadata processing using this new interface, all metadata
rewriters should derive from the RewriterBase class and implement
one or more of the following methods, depending on the timing of metadata
read and write operations:
* preCFGInitializer()
* postCFGInitializer() // TBD
* preEmitFinalizer() // TBD
* postEmitFinalizer()
By adopting this approach, we aim to simplify the RewriteInstance class
and improve its scalability to accommodate new extensions of file formats,
including various metadata types of the Linux Kernel.
Differential Revision: https://reviews.llvm.org/D154020
In a very rare case that mmap call fails, we'll at least get a message
instead of segfault.
Reviewed By: rafauler, Amir
Differential Revision: https://reviews.llvm.org/D154056