Commit Graph

1806 Commits

Author SHA1 Message Date
Amir Ayupov
7248e57a4b [BOLT][NFC] Fix duplicate word typo
Based on https://reviews.llvm.org/D137338
2023-09-01 13:29:01 -07:00
Elvina Yakubova
777e268b81 [BOLT][test] Enable exceptions_split tests for AArch64
Since the issue with the trap value was fixed in D158191, the tests should now
pass on both platforms.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D158899
2023-09-01 10:45:53 +03:00
Job Noorman
eafe4ee2e8 [BOLT] Rename isLoad/isStore to mayLoad/mayStore
As discussed in D159266, for some instructions it's impossible to know
statically if they will load/store (e.g., predicated instructions).
Therefore, mayLoad/mayStore are more appropriate names.
2023-09-01 09:36:05 +02:00
Job Noorman
76f040bda6 [BOLT] Provide generic implementations for isLoad/isStore
`MCInstrDesc` provides the `mayLoad` and `mayStore` flags that seem
appropriate to use as a target-independent way to implement `isLoad` and
`isStore`.

I believe this is currently good enough to use for the RISC-V target as
well. I've provided a test for this that checks the generated dyno
stats (which seems to be the only thing both `isLoad` and `isStore` are
used for).

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D159266
2023-09-01 09:36:05 +02:00
Amir Ayupov
8f9006bfa0 [BOLT][test] Move asm-dump.c to runtime/X86
Since the test executes an instrumented version of the binary, move it under
runtime/X86. Note that it can be adjusted to also run under AArch64 now that
instrumentation is supported.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D159298
2023-08-31 10:59:28 -07:00
spupyrev
1256ef274c [BOLT] Fine-tuning hash computation for stale matching
Fine-tuning hash computation for stale matching:
- introducing a new "loose" basic block hash that allows many more blocks to be matched than before;
- tweaking parameters of the inference algorithm so that it finds (slightly) better solutions;
- adding more meaningful tests for stale matching.

Tested the changes on several open-source benchmarks (clang, rocksdb, chrome)
and one prod workload using different compiler modes (LTO/PGO etc). There is
always an improvement in the quality of inferred profiles.
(The current implementation is still not optimal but the diff is a step forward;
I am open to further suggestions)

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D156278
2023-08-31 07:29:02 -07:00
Sinan Lin
9c99e9fd68 [BOLT] Fix a bug related to iterators in ReorderData pass
If `Itr` is the last element, then `std::next(Itr)` is `Range.end()`,
so the expression `std::next(Itr)->second` is undefined behavior (UB).
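A minimal sketch of the guarded pattern, using `std::map` in place of BOLT's actual data structures; the helper name `nextValueOrDefault` is hypothetical. The point is to compare `std::next(Itr)` against `Range.end()` before dereferencing it:

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical helper (not BOLT's ReorderData code): return the value stored
// after Itr, or Fallback when Itr is the last element. Without the check,
// std::next(Itr)->second dereferences Range.end(), which is undefined
// behavior.
template <typename MapT>
typename MapT::mapped_type
nextValueOrDefault(const MapT &Range, typename MapT::const_iterator Itr,
                   typename MapT::mapped_type Fallback) {
  assert(Itr != Range.end() && "Itr must point at an element");
  auto Next = std::next(Itr);
  if (Next == Range.end())
    return Fallback; // Itr is the last element: never dereference end()
  return Next->second;
}
```

Returning a caller-supplied fallback is only one option; a real fix might instead skip the last element entirely.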

Reviewed By: yota9, maksfb

Differential Revision: https://reviews.llvm.org/D159177
2023-08-31 11:10:25 +08:00
hezuoqiang
83f5497155 [BOLT] BL/BH are considered aliases in regreassign
The relationship of X86 registers is shown in the diagram. BL and BH do
not have a direct alias relationship. However, if the BH register cannot be
swapped, then the BX/EBX/RBX registers cannot be swapped either, which
in turn means that the BL register cannot be swapped. Therefore, in the presence
of the BX/EBX/RBX registers, BL and BH have an alias relationship.

    ┌────────────────┐
    │  RBX           │
    ├────┬───────────┤
    │    │ EBX       │
    ├────┴──┬────────┤
    │       │   BX   │
    ├───────┼───┬────┤
    │       │BH │BL  │
    └───────┴───┴────┘
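The argument can be illustrated with a toy model in which each register of the RBX family is a bitmask of the RBX bytes it occupies. This is a hypothetical sketch, not how BOLT or LLVM's `MCRegisterInfo` represent registers:

```cpp
#include <cassert>
#include <cstdint>

// Toy model: bit i set means the register covers byte i of RBX.
enum : uint8_t {
  BL = 0b00000001,
  BH = 0b00000010,
  BX = 0b00000011,
  EBX = 0b00001111,
  RBX = 0b11111111,
};

// Direct aliasing: the two registers share at least one byte.
constexpr bool directlyAlias(uint8_t A, uint8_t B) { return (A & B) != 0; }

// Aliasing through a super-register S: A and B both overlap S. BL and BH
// share no byte, but both overlap BX, so blocking BH blocks BX/EBX/RBX,
// which in turn blocks BL.
constexpr bool aliasVia(uint8_t A, uint8_t B, uint8_t S) {
  return directlyAlias(A, S) && directlyAlias(B, S);
}
```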

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D155098
2023-08-28 22:57:24 +08:00
Job Noorman
475a93a07a [BOLT] Calculate output values using BOLTLinker
BOLT uses `MCAsmLayout` to calculate the output values of functions and
basic blocks. This means output values are calculated based on a
pre-linking state and any changes to symbol values during linking will
cause incorrect values to be used.

This issue can be triggered by enabling linker relaxation on RISC-V.
Since linker relaxation can remove instructions, symbol values may
change. This causes, among other things, the symbol table created by
BOLT in the output executable to be incorrect.

This patch solves this issue by using `BOLTLinker` to get symbol values
instead of `MCAsmLayout`. This way, output values are calculated based
on a post-linking state. To make sure the linker can update all
necessary symbols, this patch also makes sure all these symbols are not
marked as temporary so that they end up in the object file's symbol
table.

Note that this patch only deals with symbols of binary functions
(`BinaryFunction::updateOutputValues`). The technique described above
turned out to be too expensive for basic block symbols so those are
handled differently in D155604.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D154604
2023-08-28 10:13:07 +02:00
Kazu Hirata
d791fa26a9 [BOLT] Use SmallPtrSet::contains (NFC) 2023-08-27 13:18:38 -07:00
Rafael Auler
b9deec1cd9 [BOLT] Fix cross-compilation build
Don't enable BOLT runtime when cross compiling as we don't
support this scenario yet.

Differential Revision: https://reviews.llvm.org/D158906
2023-08-25 17:33:04 -07:00
Rafael Auler
b59cf211a0 [BOLT] Don't choke on injected functions' IO map
AddressMap would fail lookup for injected functions and crash
BOLT. Fix that.

Reviewed By: #bolt, maksfb, jobnoorman

Differential Revision: https://reviews.llvm.org/D158685
2023-08-24 12:02:55 -07:00
Rafael Auler
b5ac1697c8 [BOLT] Give precedence to first AddressMap entries
When parsing AddressMap and there is a conflict in keys,
where two entries share the same key, consider the first entry as the
correct one, instead of the last. This matches the previous behavior in
BOLT and covers cases such as BOLT creating a new basic block that
shares the same input offset as the previous (or entry) basic
block. In this case, instead of translating debuginfo to use the newly
created BB, translate using the BB that was originally read from
input. This will increase our chances of getting debuginfo right.
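One way to get first-entry-wins semantics in C++ is `std::map::emplace`, which leaves an existing key untouched. This is an illustrative sketch with a hypothetical `buildFirstWinsMap` helper, not BOLT's AddressMap code:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using AddressPair = std::pair<uint64_t, uint64_t>; // (input, output)

// Build an input->output map in which the first entry for a given input
// address wins: emplace does nothing if the key is already present.
std::map<uint64_t, uint64_t>
buildFirstWinsMap(const std::vector<AddressPair> &Entries) {
  std::map<uint64_t, uint64_t> Map;
  for (const auto &[Input, Output] : Entries)
    Map.emplace(Input, Output);
  return Map;
}
```

Writing `Map[Input] = Output` instead would keep the last entry, which is the behavior this commit moves away from.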

Tested via binary comparison in tests:
X86/dwarf4-df-input-lowpc-ranges.test
X86/dwarf5-df-input-lowpc-ranges.test

Reviewed By: #bolt, maksfb, jobnoorman

Differential Revision: https://reviews.llvm.org/D158686
2023-08-24 11:59:43 -07:00
Eymen Ünay
d7add58cff [BOLT] Fix typo in comment
Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D157206
2023-08-24 09:37:48 -07:00
Elvina Yakubova
83cb541f80 [BOLT][Instrumentation][test] Fix tests
Extend tests for instrumentation

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D151920
2023-08-24 19:34:58 +03:00
Elvina Yakubova
87e9c42495 [BOLT][Instrumentation] AArch64 instrumentation support in runtime
This commit adds AArch64 support to the instrumentation runtime library,
including AArch64 system calls.
It also splits the system calls into target-specific files.

Reviewed By: rafauler, yota9

Differential Revision: https://reviews.llvm.org/D151942
2023-08-24 19:34:57 +03:00
Elvina Yakubova
70405a0bf7 [BOLT][Instrumentation] Add support for MacOS counters
This commit adds support for the generation of getter counters for AArch64 macOS.
Continuation of work in D151899.

Reviewed By: rafauler, yota9

Differential Revision: https://reviews.llvm.org/D151901
2023-08-24 19:34:57 +03:00
Elvina Yakubova
6e4c230525 [BOLT][Instrumentation] Initial instrumentation support for AArch64
This commit adds code generation for AArch64 instrumentation,
including direct and indirect calls support.

Reviewed By: rafauler, yota9

Differential Revision: https://reviews.llvm.org/D151899
2023-08-24 19:34:57 +03:00
Denis Revunov
82ed7896cf [BOLT] Add test for emitting trap value
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D158191
2023-08-24 01:30:02 +03:00
Denis Revunov
28fd2ca142 [BOLT] Fix trap value for non-X86
The trap value used by BOLT was assumed to be a single-byte instruction.
This made some functions unaligned on AArch64 (e.g., the exceptions-instrumentation
test) and caused emission failures. Fix that by changing the fill value to a StringRef.
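A hedged sketch of why a byte-string fill value is more general than a single byte: padding is filled with a repeated multi-byte pattern, and sizes that would split an instruction are rejected. The helper `fillWithTrap` is hypothetical and uses `std::string_view` in place of LLVM's `StringRef`. The AArch64 pattern used in the test assumes the little-endian encoding of `brk #0` (0xd4200000); the x86 pattern is `int3` (0xCC):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: append NumBytes bytes of a (possibly multi-byte)
// trap pattern to Buf. Returns false, leaving Buf unchanged, if the padding
// would end in the middle of a trap instruction.
bool fillWithTrap(std::vector<uint8_t> &Buf, size_t NumBytes,
                  std::string_view Trap) {
  if (Trap.empty() || NumBytes % Trap.size() != 0)
    return false;
  for (size_t I = 0; I < NumBytes; ++I)
    Buf.push_back(static_cast<uint8_t>(Trap[I % Trap.size()]));
  return true;
}
```

For example, filling 8 bytes with the 4-byte AArch64 pattern yields two complete trap instructions, while a 6-byte request is rejected.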

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D158191
2023-08-24 01:29:41 +03:00
Denis Revunov
dfc7599296 [BOLT][Instrumentation] Add test for append-pid option
Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D154121
2023-08-23 23:50:32 +03:00
Denis Revunov
a86dd9ae60 [BOLT][Instrumentation] Fix indirect call profile in PIE
Because indirect call tables use static addresses for call sites, while PC
values recorded by the runtime may be subject to ASLR in PIE, we couldn't
find indirect call descriptions by their runtime address in PIE. This
resulted in [unknown] entries in the profile for all indirect calls. We need
to subtract the base address of .text from runtime addresses to get the
corresponding static addresses. Here we create a getter for the base address
of .text and subtract its return value from the recorded PC values. This
converts them to static addresses, which can then be used to find the
corresponding indirect call descriptions.
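The address conversion amounts to removing the ASLR slide. A minimal sketch, assuming both the runtime and the link-time start addresses of .text are available; the function name and parameters are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: map a PC recorded at run time back to the static
// (link-time) address used by the indirect call tables.
// TextBaseRuntime: where .text was actually loaded (includes the ASLR slide).
// TextBaseStatic:  the link-time address of .text.
constexpr uint64_t toStaticAddress(uint64_t RuntimePC, uint64_t TextBaseRuntime,
                                   uint64_t TextBaseStatic) {
  return RuntimePC - TextBaseRuntime + TextBaseStatic;
}
```

This is equivalent to subtracting the load bias `TextBaseRuntime - TextBaseStatic`; unsigned wraparound keeps the result correct. With .text linked at 0x1000 and loaded at 0x555500001000, a recorded PC of 0x555500001234 maps back to 0x1234.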

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D154121
2023-08-23 23:50:31 +03:00
Denis Revunov
a799298152 [BOLT][Instrumentation] Keep profile open in WatchProcess
When a binary is instrumented with the --instrumentation-sleep-time and
--instrumentation-wait-forks options and launched, the profile is
periodically written until all the forks die. The problem is that we
cannot wait for the whole process tree, and we have no way to tell when
it's safe to read the profile. However, if we keep the profile open
throughout the life of the process tree, we can use fuser to determine
when writing is finished.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D154436
2023-08-23 23:50:31 +03:00
zhoujiapeng
62020a3a7e [BOLT] Implement createRelocation for AArch64
The implementation is based on the X86 version and uses the same code
for symbol and addend extraction. The differences are the added
support for RelType `R_AARCH64_CALL26` and the removal of 8-bit
relocations.

Reviewed By: rafauler

Differential Revision: https://reviews.llvm.org/D156018
2023-08-23 00:53:32 +08:00
zhoujiapeng
9fee2ac044 [BOLT][NFC] Split createRelocation in X86 and share the second part
This commit splits the createRelocation function for the X86 architecture
into two parts, retaining the first half and moving the second half to a
new function called extractFixupExpr. The purpose of this change is to make
extractFixupExpr a shared function between AArch64 and X86 architectures,
increasing code reusability and maintainability.

Child revision: https://reviews.llvm.org/D156018

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157217
2023-08-23 00:29:25 +08:00
Kazu Hirata
ff22d125a7 [BOLT] Fix an unused variable warning
This patch fixes:

  bolt/lib/Core/BinaryFunction.cpp:4117:20: error: unused variable
  'FragmentBaseAddress' [-Werror,-Wunused-variable]
2023-08-21 07:57:18 -07:00
Job Noorman
23c8d38258 [BOLT] Calculate input to output address map using BOLTLinker
BOLT uses MCAsmLayout to calculate the output values of basic blocks.
This means output values are calculated based on a pre-linking state and
any changes to symbol values during linking will cause incorrect values
to be used.

This issue was first addressed in D154604 by adding all basic block
symbols to the symbol table for the linker to resolve them. However, the
runtime overhead of handling this huge symbol table turned out to be
prohibitively large.

This patch solves the issue in a different way. First, a temporary
section containing [input address, output symbol] pairs is emitted to the
intermediary object file. The linker will resolve all these references
so we end up with a section of [input address, output address] pairs.
This section is then parsed and used to:
- Replace BinaryBasicBlock::OffsetTranslationTable
- Replace BinaryFunction::InputOffsetToAddressMap
- Update BinaryBasicBlock::OutputAddressRange

Note that the reason this is more performant than the previous attempt
is that these symbol references do not cause entries to be added to the
symbol table. Instead, section-relative references are used for the
relocations.
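A sketch of parsing such a resolved section, under the stated assumption that it is a flat array of 64-bit [input address, output address] pairs in host byte order (BOLT's actual encoding may differ); `parseAddressPairs` is a hypothetical name. A `std::multimap` keeps every pair, since more than one output address can share an input address:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical layout: after linking, the temporary section holds
// [input address, output address] pairs, each a 64-bit host-order value.
std::multimap<uint64_t, uint64_t>
parseAddressPairs(const std::vector<uint8_t> &Section) {
  std::multimap<uint64_t, uint64_t> Map;
  constexpr size_t EntrySize = 2 * sizeof(uint64_t);
  for (size_t Off = 0; Off + EntrySize <= Section.size(); Off += EntrySize) {
    uint64_t Input, Output;
    // memcpy avoids unaligned or type-punned reads from the byte buffer.
    std::memcpy(&Input, Section.data() + Off, sizeof(Input));
    std::memcpy(&Output, Section.data() + Off + sizeof(Input), sizeof(Output));
    Map.emplace(Input, Output);
  }
  return Map;
}
```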

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D155604
2023-08-21 10:36:20 +02:00
Hans Wennborg
d158ee576b bolt/test/X86/bug-function-layout-execount.s: Require x86 and asserts
Follow-up to D152959:
--debug-only= requires an asserts build.
The test also needs the x86 target.
2023-08-18 14:02:05 +02:00
hezuoqiang
a37e8a4bdc [BOLT] Consider Code Fragments during regreassign
During register swapping, the code fragments associated with a
function (which may be generated during PGO optimization) need to be
swapped together.

Fix https://github.com/llvm/llvm-project/issues/59730

Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D141931
2023-08-18 16:46:18 +08:00
spupyrev
9460ebd130 [BOLT] Fix sorting functions by execution count
I noticed that `-reorder-functions=exec-count` doesn't work as expected due to
a bug in the comparison function (which isn't symmetric). It is questionable
whether anyone would ever want to use this sorting method (as sorting by, say,
density is much better in all cases), but it is probably better to fix the bug.
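For reference, a comparator passed to `std::sort` must be a strict weak ordering. A hypothetical sketch of one for execution-count sorting follows; the `Func` struct and the tie-breaking rules are assumptions, not BOLT's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical function record; BOLT's BinaryFunction carries much more.
struct Func {
  uint64_t Index;     // original order, used to break ties
  uint64_t ExecCount; // profile execution count
  bool HasProfile;
};

// Lexicographic ordering: profiled before unprofiled, then hotter first,
// then original index. It never returns true for both (A, B) and (B, A).
bool hotterFirst(const Func &A, const Func &B) {
  if (A.HasProfile != B.HasProfile)
    return A.HasProfile;
  if (A.HasProfile && A.ExecCount != B.ExecCount)
    return A.ExecCount > B.ExecCount;
  return A.Index < B.Index;
}
```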

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D152959
2023-08-16 15:08:18 -07:00
Alexander Yermolovich
2c784f7d26 [BOLT][DWARF] Fix handling of invalid DIE references
The compiler can generate invalid DIE references. Previously, BOLT could
assert when writing out IR to .debug_info. Changed where DIE offsets are updated
so that the update always happens, which ensures the assert is not triggered.

Added more specific warnings, and the ability to print the offset of an invalid
referenced DIE when verbosity >= 1.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157746
2023-08-14 17:28:24 -07:00
Alexander Yermolovich
bce5743e21 [BOLT][DWARF] Fix location list order
This bug crept in when CU partitioning was introduced. It manifests itself when
CUs that use location lists come before CUs that are part of
thin-lto. BOLT processes CUs with cross-CU references first (these are produced
by thin-lto). When we wrote out all the location lists, we did so in the original
order. Since DWARF4 uses offsets directly into .debug_loc, those offsets in DIEs
became wrong.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157908
2023-08-14 17:27:22 -07:00
Kazu Hirata
363be89c7d [BOLT] Use static_assert (NFC) 2023-08-10 18:44:17 -07:00
Alexander Yermolovich
0807028d03 Update README.md
BOLT supports DWARF5.
2023-08-08 18:46:43 -07:00
Alexander Yermolovich
43fe9dcb71 [BOLT][DWARF][NFC] Remove addIndexAddress
Removed unused API DebugAddrWriter::addIndexAddress.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157357
2023-08-08 18:23:04 -07:00
Alexander Yermolovich
55a1d959a5 [BOLT][DWARF] Always use new low_pc for call_site
Changed to always creating a new index. This code was a legacy of the time when we
couldn't change the size of .debug_info, and it led to subtle bugs where the index for
new entries pointed to the wrong address.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157356
2023-08-08 18:21:24 -07:00
Alexander Yermolovich
96cfc5f840 [BOLT][DWARF] Always use new low_pc for exprloc
Changed to always creating a new index. This code was a legacy of the time when we
couldn't change the size of .debug_info, and it led to subtle bugs where the index for
new entries pointed to the wrong address.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157355
2023-08-08 18:20:06 -07:00
Alexander Yermolovich
9ffdc2b457 [BOLT][DWARF][NFC] Add function to print DIE
This is purely to make debugging easier for developers. Now that we have moved to IR,
printing out DIEs is lacking. This function lazily parses the DIE and uses the
DWARFDie dump function.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D157354
2023-08-08 18:18:24 -07:00
chenpeihao3
892305adb1 [BOLT] fix the endless loop of --iterative-guess
Solve the endless loop caused by --iterative-guess. The main function behind this option is guessEdgeByIterativeApproach, whose do-while loop calls guessPredEdgeCounts and guessSuccessEdgeCounts. In some scenarios, this do-while loop never terminates. The reason is that although guessPredEdgeCounts has guessed the predecessor edge counts, the corresponding basic block is not inserted into GuessedArcs, so the changed flag is always true.
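The termination argument can be sketched as a toy fixed-point loop: progress is reported only when a new count is actually recorded, so the number of iterations is bounded by the number of blocks and edges. Everything here, including the two inference rules, is a hypothetical illustration, not BOLT's `guessEdgeByIterativeApproach`:

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (source block, destination block)

struct FlowState {
  std::map<int, long> BlockCount; // known block execution counts
  std::map<Edge, long> EdgeCount; // known or guessed edge counts
};

// Toy rules: (1) a block with unknown count whose incoming edges are all
// known gets their sum; (2) a block with known count and exactly one unknown
// outgoing edge gives that edge the remainder. Changed is set only when a new
// entry is recorded in S and entries are never removed, so the loop reaches a
// fixed point. Reporting a guess without recording it would spin forever.
void guessIteratively(FlowState &S, const std::vector<int> &Blocks,
                      const std::vector<Edge> &Edges) {
  bool Changed;
  do {
    Changed = false;
    for (int B : Blocks) {
      // Rule 1: infer the block count from fully known incoming edges.
      if (!S.BlockCount.count(B)) {
        long Sum = 0;
        bool HasPred = false, AllKnown = true;
        for (const Edge &E : Edges) {
          if (E.second != B)
            continue;
          HasPred = true;
          auto It = S.EdgeCount.find(E);
          if (It == S.EdgeCount.end())
            AllKnown = false;
          else
            Sum += It->second;
        }
        if (HasPred && AllKnown) {
          S.BlockCount[B] = Sum;
          Changed = true; // a new block count was recorded
        }
      }
      // Rule 2: give the remainder to a single unknown outgoing edge.
      auto BC = S.BlockCount.find(B);
      if (BC == S.BlockCount.end())
        continue;
      long KnownSum = 0;
      const Edge *Unknown = nullptr;
      int NumUnknown = 0;
      for (const Edge &E : Edges) {
        if (E.first != B)
          continue;
        auto It = S.EdgeCount.find(E);
        if (It == S.EdgeCount.end()) {
          Unknown = &E;
          ++NumUnknown;
        } else {
          KnownSum += It->second;
        }
      }
      if (NumUnknown == 1) {
        S.EdgeCount[*Unknown] = BC->second - KnownSum;
        Changed = true; // a new edge count was recorded
      }
    }
  } while (Changed);
}
```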

Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D154922
2023-08-04 17:02:47 +08:00
Alexander Yermolovich
e1ceae4b60 [BOLT][DWARF] Fix setting DW_AT_ranges offset of Skeleton CU
Fixed a bug where, when the Skeleton CU had DW_AT_ranges, the output CU DW_AT_ranges
offset was relative rather than absolute.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D156958
2023-08-03 10:34:21 -07:00
Alexander Yermolovich
efb8a1c906 [BOLT][DWARF] Delete DW_AT_low_pc when converting to ranges
Now that we have the new DWARF rewriter, we can remove DW_AT_low_pc when converting
DW_AT_low_pc/DW_AT_high_pc to DW_AT_ranges, which more closely follows the DWARF spec.
The CU DW_AT_low_pc is left in place; from my reading of the spec, it is needed.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D156957
2023-08-03 10:33:04 -07:00
Alexander Yermolovich
1713f84983 [BOLT][DWARF] Opt out test from aarch64
Limiting the test to only X86.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D156765
2023-07-31 18:22:11 -07:00
Alexander Yermolovich
9eb0df3aa9 [BOLT][DWARF] Fix handling of inlined subroutine with no output PC
Clang can generate DW_TAG_inlined_subroutine with low_pc 0. With split DWARF,
this led to the range offset being a negative number.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D156742
2023-07-31 17:12:07 -07:00
Amir Ayupov
2dea832ef0 [BOLT][test] Add missing stderr redirections
BOLT-ERROR and BOLT-WARNING messages are output to stderr which is not captured
by piping to FileCheck. Redirect stderr to stdout to fix that in tests.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D156340
2023-07-31 16:17:09 -07:00
Amir Ayupov
d796f36fbc [BOLT][NFC] Simplify DataAggregator
Use a short loop instead of duplicating the code for setHasProfileAvailable.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D154749
2023-07-31 14:54:41 -07:00
spupyrev
299ec3c22a [BOLT] Fixing macOS build
Fixing build after https://reviews.llvm.org/D153039

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D156734
2023-07-31 13:55:46 -07:00
Amir Ayupov
70e76e0982 [BOLT] Fix instrumenting conditional tail calls
We identify instructions to be instrumented based on Offset annotation.

BOLT "expands" conditional tail calls into a conditional jump to a basic block
with an unconditional tail call. Move the Offset annotation from the former CTC to
the tail call.

For an expanded CTC, we keep Offset attached to the original instruction, which is
converted into a regular conditional jump, while leaving the newly created tail
call without an Offset annotation. This leads to attempting the instrumentation
of the conditional jump, which points to the basic block with an inherited input
offset, thus creating an invalid edge description. At the same time, the newly
created tail call is skipped entirely, which means we're not creating a call
description for it.

If we instead reassign the Offset annotation from the conditional jump to the tail
call, we fix both issues. The conditional jump will be skipped, so no invalid edge
description is created, while the tail call will be handled properly (uniformly
with regular calls).
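A toy sketch of the expansion, where only instructions carrying an input offset are instrumented; the `Inst` struct and `expandCTC` are hypothetical stand-ins for BOLT's MCInst annotations:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Toy instruction: only instructions with an Offset are instrumented.
struct Inst {
  std::string Kind;               // e.g. "jcc" or "tail-call"
  std::optional<uint32_t> Offset; // input offset annotation
};

// Expand a conditional tail call into a conditional jump plus an
// unconditional tail call. The Offset moves to the tail call, so the call
// gets a call description and the jump is skipped instead of producing an
// invalid edge description.
std::pair<Inst, Inst> expandCTC(const Inst &CTC) {
  Inst Jump{"jcc", std::nullopt};
  Inst TailCall{"tail-call", CTC.Offset};
  return {Jump, TailCall};
}
```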

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D156389
2023-07-31 13:52:50 -07:00
Amir Ayupov
b0b566b5da [BOLT][YAML] Only read first profile per function
Work around the issue of multiple profiles per function.
This can happen with a stale profile that has separate profiles for
functions that were merged in the new binary and became aliases.

Reviewed By: #bolt, maksfb

Differential Revision: https://reviews.llvm.org/D156644
2023-07-31 13:48:09 -07:00
spupyrev
b402487b74 [BOLT] A new code layout algorithm for function reordering [3b/3]
This is a new algorithm for function layout (reordering) based on the call graph
extracted from profile data; see diffs down the stack for more details.

This layout is very similar to the existing hfsort+, but perhaps a little better
on some benchmarks. The goals of the change are as follows:

(i) rename and replace hfsort+ with a newer (hopefully better) implementation.
I'd prefer to keep both algorithms together for some time to simplify evaluation and
transition, but do want to remove hfsort+ once we're confident that there are
no regressions.

(ii) unify the implementation of code layout algorithms across LLVM. Currently
Passes/HfsortPlus.cpp and Utils/CodeLayout.cpp share many implementation-specific
details; this diff unifies the code.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D153039
2023-07-31 10:49:06 -07:00
Alexander Yermolovich
75f770a68f [BOLT][DWARF] Update handling of size 1 ranges and fix sub-programs with ranges
When the output range has only one entry and the input uses low_pc/high_pc, do not
convert to ranges. This helps with the size of .debug_ranges/.debug_rnglists. It also
helps when either low_pc or high_pc is 0: we no longer generate potentially invalid
ranges that result in an LLDB error.

Also fixed handling of DW_TAG_subprogram with ranges. These can be created with
-fbasic-block-sections=all.

Reviewed By: maksfb

Differential Revision: https://reviews.llvm.org/D156374
2023-07-30 17:32:32 -07:00