Commit Graph

900 Commits

Author SHA1 Message Date
Vasily Leonenko
ad79d51778 [PR] Instrumentation: Generate and use _start and _fini trampolines
Summary:
This commit implements new method for _start & _fini functions hooking
which allows to use relative jumps for future PIE & .so library support.
Instead of using absolute address of _start & _fini functions known on
linking stage - we'll use dynamically created trampoline functions and
use corresponding symbols in instrumentation runtime library.

As we would like to use instrumentation for dynamically loaded binaries
(with PIE & .so), thus we need to compile instrumentation library with
"-fPIC" flag to support relative address resolution for functions and
data.

For shared libraries we need to handle initialization of instrumentation
library case by using DT_INIT section entry point.

Also this commit adds detection if the binary is executable or shared
library based on existence of PT_INTERP header. In case of shared
library we save information about real library init function address
for further usage for instrumentation library init trampoline function
creation and also update DT_INIT to point instrumentation library init
function.

Functions called from init/fini functions should be called with forced
stack alignment to avoid issues with instructions which relies on it.
E.g. optimized string operations.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD30092316)
2021-06-19 04:08:35 +08:00
Amir Ayupov
60b10a8ead [BOLT][NFC] Unify isTailCall interface across X86 and AArch64
Summary:
Move the common code into MCPlusBuilder.h.
Use group 1 `kTailCall` MCAnnotation instead of dynamically allocated
annotation.
This diff reduces the processing time overhead to 1.5% vs using
TAILJMP opcode.

(cherry picked from FBD30055585)
2021-07-29 17:28:51 -07:00
Maksim Panchenko
89a2e16037 [BOLT] Support PLT sections with variable entry sizes
Summary:
The linker can generate 8- or 16-byte entries in .plt.got and .plt.sec
sections. On X86, the main differentiator is the presence of endbr64
instruction at the beginning of the entry. Detect the instruction and
adjust the size accordingly.

(cherry picked from FBD29847639)
2021-07-14 01:35:34 -07:00
Amir Ayupov
c33f08e7df [BOLT] Update build instructions in README
Summary: Remove llvm.patch from build instructions.

(cherry picked from FBD29973395)
2021-07-28 14:45:10 -07:00
Joey Thaman
a7e2a8f946 [BOLT] Tail Duplication active pass
Summary:
Amended the Tail Duplication
analysis pass to do the tail duplication in question

(cherry picked from FBD29833794)
2021-07-16 11:45:44 -07:00
Vasily Leonenko
68be8caf3f RewriteInstance: account .stab and .stabstr as debug sections
Summary:
.stab and .stabstr are special sections containing debugging
information and strings associated with the debugging information.
This commit adds them to the list of debugging sections, so
these sections can be removed for output binary.

Vasily Leonenko,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD29746153)
2021-06-25 15:42:56 +08:00
James Luo
0df7bf7b8b [BOLT][CSSPGO] Handle indirect call promotion in Pseudo Probe Integration
Summary:
Match new direct call generated during ICP to correct pseudo probe

New call is matched to the probes of original call instruction.

(cherry picked from FBD29591662)
2021-07-16 16:05:18 -07:00
James Luo
3e55dea4dd [BOLT][CSSPGO] Encode pseudo probe section to binary
Summary:
Update .pseudo_probe section in input binary

DFS inline tree and emit pseudo probes with updated addresses

(cherry picked from FBD29522142)
2021-07-15 14:58:32 -07:00
Joey Thaman
2f46660559 [BOLT] Tail duplication analysis pass
Summary:
Created a binary pass that records how many
times tail duplication would be used and how many cache
misses it would theoretically stop

(cherry picked from FBD29619858)
2021-07-01 07:11:26 -07:00
Zino Benaissa
60b15062e1 [BOLT] Dump dynamic execution per instruction opcode
Summary:
We extended DynoStats to dump the histogram per instruction opcode. By
default the dump is turned off. Use '-print-dyno-opcode-stats' to enable
the dump.

BOLT also dumps for each instruction opcode the maximum execution count and
corresponding function name and basic block offsets where the instruction
occurs. Below is a sample of the dump:

                   Opcode,    Execution Count,      Max Exec Count, Function Name:Offset
                  SHR8rCL,                232,                 232, _ZNK5folly14AsyncSSLSocket4goodEv:53
                VPADDDYrr,              13956,                 388, chacha20_encrypt_bytes.part.0/3:736
               PMOVSXBWrr,                  4,                   2, ares_expand_name/1:264
                VMOVAPSmr,               1082,                  43, chacha20_encrypt_bytes.part.0/3:2864
                VPSHUFBrr,               9540,                1667, chacha20_encrypt_bytes.part.0/3:4416
            VPUNPCKLDQYrr,               1102,                 188, jsimd_ycc_rgb_convert_avx2/1:125
          VPBROADCASTQYrm,                 39,                  39, chacha20_encrypt_bytes.part.0/3:400
               PMOVSXWDrr,                  8,                   2, ares_expand_name/1:264
                   VPORrr,                817,                 129, jsimd_idct_islow_avx2/1:41
                  PSLLDri,            8690752,               65644, blockmix_salsa8_xor/1:1424

(cherry picked from FBD28859624)
2021-05-24 21:33:43 -07:00
Maksim Panchenko
c9f5f47b51 [BOLT] Add support for .plt.sec and refactor PLT-reading code
Summary:
A binary can contain multiple PLT sections with different name and
attributes (such as an entry size). Extend the support to .plt.sec and
refactor the code to make future extensions simpler.

(cherry picked from FBD29502107)
2021-06-30 14:41:41 -07:00
Joey Thaman
4c12afc1f4 [BOLT][NFC] Resolved all clang-12 warnings for bolt
Summary:
clang-12 now compiles bolt without warnings.
Some warnings were fixed if possible while others were suppressed by
doing (void)variable for unused variable warnings or moving code inside
assert statements of LLVM_DEBUG blocks.

(cherry picked from FBD29469054)
2021-06-29 12:11:56 -07:00
Maksim Panchenko
1de0746790 [BOLT] Read all dynamic relocations and refactor code
Summary:
Add code to read more dynamic relocations (DT_JMPREL) and enforce strict
checks that corresponding sections sizes match .dynamic entry
description.

(cherry picked from FBD29502109)
2021-06-30 14:38:50 -07:00
Alexander Yermolovich
f7499c6711 [BOLT][DWARF] Fix writing out dwo with DWP as input
Summary:
The code for writing out dwo files wasn't handling case where DWP is an input.
Because all the sections are part of the same binary.

One note with current implementation. .debug-str.dwo will have strings for all the dwo objects.
This is because llvm-dwp de-duplicates strings and combines them in to one section. It then re-writes .debug-str-offsets.dwo to point to new .debug-str.dwo section.

(cherry picked from FBD29244835)
2021-06-18 15:57:34 -07:00
Maksim Panchenko
3e5ce1f282 [BOLT][TESTS] Remove dynamic relocations from YAML tests
Summary:
Our YAML objects contain references to dynamic relocations via .dynamic,
but there are no corresponding relocation sections. Change .dynamic
contents to specify no dynamic relocations.

(cherry picked from FBD29502108)
2021-06-30 14:33:59 -07:00
Amir Ayupov
a07d24cc4b [BOLT][NFC] Un-inline checking AArch64 linker veneers out of disassemble loop
Summary:
Move the AArch64 `matchLinkerVeneer` check out of a for-loop
in `BinaryFunction::disassemble`

(cherry picked from FBD29411348)
2021-06-25 17:52:51 -07:00
Amir Ayupov
c7c0803b59 [BOLT][NFC] Un-inline indirect branch handling out of disassemble loop
Summary:
Move the `processIndirectBranch` switch statement out of a for-loop
in `BinaryFunction::disassemble`

(cherry picked from FBD29411346)
2021-06-25 17:49:43 -07:00
Amir Ayupov
8f751bc058 [BOLT][NFC] Un-inline adding external references out of disassemble loop
Summary:
Move the code that handles true external references (non-unreachable)
out of a for-loop in `BinaryFunction::disassemble`.

(cherry picked from FBD29411345)
2021-06-25 17:32:25 -07:00
Amir Ayupov
8f7a400629 [BOLT][NFC] Delete MoveRelocations entirely
Summary: MoveRelocations are unused. Remove interfaces and emission part.

(cherry picked from FBD29468409)
2021-06-25 17:06:21 -07:00
Maksim Panchenko
38c5887992 [BOLT][NFC] Always process runtime relocations
Summary:
Dynamic relocations applied at runtime should be processed even in
non-relocation mode.

(cherry picked from FBD29311906)
2021-06-22 13:46:06 -07:00
Amir Ayupov
ef1b1e7184 [BOLT][NFC] Refactor handlePCRelOperand
Summary: Move error logging to handlePCRelOperand, reduce code duplication

(cherry picked from FBD29309702)
2021-06-17 17:41:28 -07:00
Amir Ayupov
b964e852d5 [BOLT][NFC] Readability improvements in X86,Aarch64 MCPlusBuilder
Summary: Minor refactorings in target specific MCPlusBuilders to improve readability

(cherry picked from FBD29309701)
2021-06-17 18:22:32 -07:00
James Luo
dea6c247d9 [BOLT][CSSPGO] Relate decoded pseudo probe basic blocks
Summary:
Assign decoded pseudo probe to correlated output block

Pseudo probes can then be encoded to a proper address

(cherry picked from FBD29211688)
2021-06-25 11:42:58 -07:00
Amir Ayupov
521a61b056 [BOLT][NFC] Use MCPlusBuilder::isPseudo
Summary: Consistently use this interface across BOLT codebase

(cherry picked from FBD29171718)
2021-06-16 12:10:20 -07:00
Amir Ayupov
da276d73c7 [BOLT] Handle R_X86_64_64 in flushPendingRelocations
Summary:
Handle R_X86_64_64 the same way as R_X86_64_32;
`getSizeForType` takes care of the size:

```x86_64 ABI relocation types
Name        Value Field  Calculation
R_X86_64_64 1     word64 S + A
R_X86_64_32 10    word32 S + A
```

(cherry picked from FBD29370417)
2021-06-24 12:18:16 -07:00
Maksim Panchenko
f46af9e9bc [BOLT][TESTS] Fix ICF test case
Summary:
Host compiler may generate duplicate functions and as a result BOLT can
fold more than 1 function.

(cherry picked from FBD29347302)
2021-06-23 16:13:30 -07:00
Joey Thaman
be0da0fac2 Throw an error in instrument for dynamic libs
Summary:
In InstrumentatonRuntimeLibrary, throw an error

if the program uses dynamic libraries

(cherry picked from FBD29265147)
2021-06-21 07:45:52 -07:00
Maksim Panchenko
bbbd159ccb [BOLT] Fix undefined symbol warnings/errors
Summary:
When we fold a function in relocation mode, make sure to clear its state
to avoid emitting relocations against undefined symbols.

(cherry picked from FBD29245320)
2021-06-18 14:35:39 -07:00
Sameeran joshi
ba915af1cd [PR][BOLT] Print revision in perf2bolt and bolt-diff modes"
Summary:
Fix issue facebookincubator/BOLT#160
PR facebookincubator/BOLT#172

(cherry picked from FBD29139522)
2021-06-08 23:28:37 +05:30
Rafael Auler
e485a9830b Rebase: [BOLT][DebugFission] Fix reading support for DWP
Summary:
Dived more in to DWARF APIs and llvm-symbolizer this is a more streamline way of doing it, and address base gets set properly.
Writing out dwo files with dwp input will be separate patch.

(cherry picked from FBD31361529)
2021-06-16 09:52:03 -07:00
Vladislav Khmelevsky
a8b9319536 [PR] Patch allocatable relocations for AArch64
Summary: PR facebookincubator/BOLT#166

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28910060)
2021-06-02 00:03:56 +03:00
Vladislav Khmelevsky
2cf9008a60 [PR] Instrumentation: Disable signals on mutex lock
Summary:
When indirect call is instrmented it locks SimpleHashTable's mutex on get() call.
If while locked we we receive a signal and signal handler also will call
indirect function we will end up with deadlock.

PR facebookincubator/BOLT#167

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28909921)
2021-06-04 19:51:06 +03:00
Maksim Panchenko
1efadeedf2 [BOLT] Fix rodata load simplification pass
Summary:
If the target address has a runtime relocation against it, do not
perform the load simplification.

(cherry picked from FBD29091939)
2021-06-13 15:37:31 -07:00
Amir Ayupov
f7f0a571d7 [BOLT][NFC] Suppress addList override warning
Summary:
Suppresses the warning
```
src/DebugData.h:338:20: warning: 'addList' overrides a member function but is not marked 'override' [-Wsuggest-override]
```

(cherry picked from FBD28858201)
2021-06-02 19:12:13 -07:00
James Luo
8a919593c7 [BOLT][CSSPGO] Pseudo probe decoding
Summary:
Make bolt decode pseudo probe section in binary

For more detail of pseudo probe, check https://reviews.llvm.org/D86490.

(cherry picked from FBD28856316)
2021-06-11 13:06:12 -07:00
Alexander Yermolovich
226d1c3b0b [BOLT] Change how DF DWO logging is handled
Summary: Changing assert to a warning when DWO debug information can't be retrieved. Usually due to invalid path.

(cherry picked from FBD29005217)
2021-06-09 12:55:09 -07:00
Amir Ayupov
2da5b12a3d [BOLT] Hugify: check for THP support via sysfs
Summary:
Remove dependence on kernel version check, query sysfs directly
instead.

(cherry picked from FBD28858208)
2021-06-02 19:11:52 -07:00
Maksim Panchenko
7bccf8d25d [BOLT][NFC] Fix debug info printouts for inlined functions
Summary:
While printing debug info for instructions, we should use line tables
from the corresponding DWARF CU which could be different from the
containing function CU in case of inlined instructions.

(cherry picked from FBD28908324)
2021-06-04 12:31:31 -07:00
Amir Ayupov
65d227c035 [BOLT][TEST] Fix test case to conform to analyzePICJumpTable pattern matching
Summary:
Make sure that jump table is properly recognized in
`split_func_jump_table_fragment.s`.

(cherry picked from FBD28839976)
2021-06-02 10:50:47 -07:00
James Luo
1c06193d0f [BOLT] Resolve JumpTable namespace issue in pseudo probe decoder migration
Summary: This diff fixes the JumpTable namespace conflicts during the migration of pseudo probe decoder.

(cherry picked from FBD28859927)
2021-06-02 22:46:57 -07:00
Maksim Panchenko
a26370389a [BOLT][NFC] Disable ProcessAllSections in RuntimeDyld
Summary:
FBD55943 changed the way ProcessAllSections works in RuntimeDyld. After
the change, all sections, including symbol table, section table, etc.
are loaded into memory whenever ProcessAllSections is enabled.

In BOLT we rely on RuntimeDyld for processing sections with relocations.
These include most allocatable sections and additionally .debug_line.
The latter is skipped by RuntimeDyld without ProcessAllSections flag.
If we enable ProcessAllSections, we will have to deal with allocating
memory for more sections than we need (see above) and later to filter
them out.

The alternative is to mark all sections that we actually plan to use as
"required for execution" (using RuntimeDyld terminology). For
.debug_line section on ELF it means adding SHF_ALLOC flag. On MachO,
RuntimeDyld currently treats all sections as required.

(cherry picked from FBD28729398)
2021-05-26 16:23:34 -07:00
Vladislav Khmelevsky
5a6c379f5b [PR] Instrumentation: Emit paddings to preserve data alignment
Summary:
Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

facebookincubator/BOLT#156

(cherry picked from FBD28521843)
2021-05-14 14:09:05 +03:00
Vladislav Khmelevsky
79807d99fe [PR] Introduce loop inversion pass
Summary:
This patch introduces LoopInversionPass. Its main purpose is to ensure
that the loop layout is optimal depending on the profile information. So
if profile information shows that the loop is used, the unconditional
jump instruction must be executed only once and vice-versa. Please take
a look to the pass header file and test for more details.

Also change link_fdata script a bit, to be able to change FDATA prefix,
like FileCheck does.

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

PR facebookincubator/BOLT#153

(cherry picked from FBD28391811)
2021-05-11 20:59:13 +03:00
Amir Ayupov
12e9fec697 Rebase: [BOLT] DebugFission Support
Summary:
Implemented support for Debug Fission.
For the most part it doesn't impact Monolithic execution path.
One area that was changed is the DW_AT_low_pc/DW_AT_high_pc conversion. Before it was to DW_AT_ranges/DW_AT_low_pc, now DW_AT_low_pc is kept in same place.
Another more visible impact is in Skeleton CU the DW_AT_low_pc is replaced with DW_AT_ranges_base if it's not originally present and bolt converted ranges conversion inside the dwo units.

Output of this are multiple .dwo files with updated debug information.

(cherry picked from FBD29569788)
2021-04-01 11:43:00 -07:00
Amir Ayupov
99d7f90635 [BOLT][NFC][TEST] Added llvm-dwarfdump and llvm-mc to BOLT_TEST_DEPS
(cherry picked from FBD28427352)
2021-05-13 15:36:43 -07:00
Maksim Panchenko
ba6fdb8113 [BOLT] Preserve original jump table relocations
Summary:
Remove relocations against internal function labels, e.g. jump table
relocations, only when overwriting them.

While reading an input file with relocations, we create internal
relocations against code references (we skip PIC relocations).
Later, when we discover jump tables, we remove corresponding relocations
with the assumption that original relocations will either be ignored or
replaced by new relocations. However, it is possible to miss some
references to the jump table, in which case the original entries will
not be ignored. While such situation is abnormal, it is still a
better/safer approach to preserve relocations if we are not replacing
them with new ones.

(cherry picked from FBD28406628)
2021-05-12 23:35:10 -07:00
Maksim Panchenko
81c59d9a54 [BOLT][NFC] Change interface for searching relocations
(cherry picked from FBD28406629)
2021-05-12 23:29:04 -07:00
Amir Ayupov
500edf26c9 [BOLT][NFC] Address warning about ProgramPoint implicit copy constructor
Summary:
Explicit assignment operator can be replaced with an implicit one.
Remove it to allow an implicit copy constructor:
```
bolt/src/Passes/DataflowAnalysis.h:74:8: warning: definition of
implicit copy constructor for 'ProgramPoint' is deprecated because it
has a user-declared copy assignment operator [-Wdeprecated-copy]
  void operator=(const ProgramPoint &PP) {
       ^
bolt/src/Passes/DataflowAnalysis.h:62:14: note: in implicit copy
constructor for 'llvm::bolt::ProgramPoint' first required here
      return ProgramPoint(&*Last);
```

(cherry picked from FBD28335138)
2021-05-10 14:16:25 -07:00
Maksim Panchenko
fe37f1870e [BOLT][NFC] Follow LLVM variable initialization style
(cherry picked from FBD28417604)
2021-05-13 10:50:47 -07:00
Vladislav Khmelevsky
b728bfc70a [PR] Add missing includes
Summary:
Adds missing headers removed by IWYU.
NB: this caused build breakage on ubuntu-latest

Vladislav Khmelevsky,
Advanced Software Technology Lab, Huawei

(cherry picked from FBD28368185)
2021-05-11 15:55:57 +03:00