llvm-capstone

mirror of https://github.com/capstone-engine/llvm-capstone.git synced 2024-12-14 19:49:36 +00:00

Author	SHA1	Message	Date
Job Noorman	475a93a07a	[BOLT] Calculate output values using BOLTLinker BOLT uses `MCAsmLayout` to calculate the output values of functions and basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue can be triggered by enabling linker relaxation on RISC-V. Since linker relaxation can remove instructions, symbol values may change. This causes, among other things, the symbol table created by BOLT in the output executable to be incorrect. This patch solves this issue by using `BOLTLinker` to get symbol values instead of `MCAsmLayout`. This way, output values are calculated based on a post-linking state. To make sure the linker can update all necessary symbols, this patch also makes sure all these symbols are not marked as temporary so that they end up in the object file's symbol table. Note that this patch only deals with symbols of binary functions (`BinaryFunction::updateOutputValues`). The technique described above turned out to be too expensive for basic block symbols so those are handled differently in D155604. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D154604	2023-08-28 10:13:07 +02:00
Kazu Hirata	d791fa26a9	[BOLT] Use SmallPtrSet::contains (NFC)	2023-08-27 13:18:38 -07:00
Rafael Auler	b9deec1cd9	[BOLT] Fix cross-compilation build Don't enable BOLT runtime when cross compiling as we don't support this scenario yet. Differential Revision: https://reviews.llvm.org/D158906	2023-08-25 17:33:04 -07:00
Rafael Auler	b59cf211a0	[BOLT] Don't choke on injected functions' IO map AddressMap would fail lookup for injected functions and crash BOLT. Fix that. Reviewed By: #bolt, maksfb, jobnoorman Differential Revision: https://reviews.llvm.org/D158685	2023-08-24 12:02:55 -07:00
Rafael Auler	b5ac1697c8	[BOLT] Give precedence to first AddressMap entries When parsing AddressMap and there is a conflict in keys, where two entries share the same key, consider the first entry as the correct one, instead of the last. This matches previous behavior in BOLT and covers cases such as BOLT creating a new basic block that shares the same input offset as the previous (or entry) basic block. In this case, instead of translating debuginfo to use the newly created BB, translate using the BB that was originally read from input. This will increase our chances of getting debuginfo right. Tested via binary comparison in tests: X86/dwarf4-df-input-lowpc-ranges.test X86/dwarf5-df-input-lowpc-ranges.test Reviewed By: #bolt, maksfb, jobnoorman Differential Revision: https://reviews.llvm.org/D158686	2023-08-24 11:59:43 -07:00
Eymen Ünay	d7add58cff	[BOLT] Fix typo in comment Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D157206	2023-08-24 09:37:48 -07:00
Elvina Yakubova	83cb541f80	[BOLT][Instrumentation][test] Fix tests Extend tests for instrumentation Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D151920	2023-08-24 19:34:58 +03:00
Elvina Yakubova	87e9c42495	[BOLT][Instrumentation] AArch64 instrumentation support in runtime This commit adds support for AArch64 in instrumentation runtime library, including AArch64 system calls. Also this commit divides syscalls into target-specific files. Reviewed By: rafauler, yota9 Differential Revision: https://reviews.llvm.org/D151942	2023-08-24 19:34:57 +03:00
Elvina Yakubova	70405a0bf7	[BOLT][Instrumentation] Add support for MacOS counters This commit adds support for generation of getter counters for AArch64 MacOS. Continuation of work D151899 Reviewed By: rafauleir, yota9 Differential Revision: https://reviews.llvm.org/D151901	2023-08-24 19:34:57 +03:00
Elvina Yakubova	6e4c230525	[BOLT][Instrumentation] Initial instrumentation support for AArch64 This commit adds code generation for AArch64 instrumentation, including direct and indirect calls support. Reviewed By: rafauler, yota9 Differential Revision: https://reviews.llvm.org/D151899	2023-08-24 19:34:57 +03:00
Denis Revunov	82ed7896cf	[BOLT] Add test for emitting trap value Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D158191	2023-08-24 01:30:02 +03:00
Denis Revunov	28fd2ca142	[BOLT] Fix trap value for non-X86 The trap value used by BOLT was assumed to be a single-byte instruction. It made some functions unaligned on AArch64 (e.g. the exceptions-instrumentation test) and caused emission failures. Fix that by changing fill value to StringRef. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D158191	2023-08-24 01:29:41 +03:00
Denis Revunov	dfc7599296	[BOLT][Instrumentation] Add test for append-pid option Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154121	2023-08-23 23:50:32 +03:00
Denis Revunov	a86dd9ae60	[BOLT][Instrumentation] Fix indirect call profile in PIE Because indirect call tables use static addresses for call sites, but pc values recorded by runtime may be subject to ASLR in PIE, we couldn't find indirect call descriptions by their runtime address in PIE. It resulted in [unknown] entries in profile for all indirect calls. We need to subtract base address of .text from runtime addresses to get the corresponding static addresses. Here we create a getter for base address of .text and subtract its return value from recorded PC values. It converts them to static addresses, which then may be used to find the corresponding indirect call descriptions. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154121	2023-08-23 23:50:31 +03:00
Denis Revunov	a799298152	[BOLT][Instrumentation] Keep profile open in WatchProcess When a binary is instrumented with --instrumentation-sleep-time and instrumentation-wait-forks options and launched, the profile is periodically written until all the forks die. The problem is that we cannot wait for the whole process tree, and we have no way to tell when it's safe to read the profile. However, if we keep profile open throughout the life of the process tree, we can use fuser to determine when writing is finished. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154436	2023-08-23 23:50:31 +03:00
zhoujiapeng	62020a3a7e	[BOLT] Implement createRelocation for AArch64 The implementation is based on the X86 version, with the same code of symbol and addend extraction. The differences include the support for RelType `R_AARCH64_CALL26` and the deletion of 8-bit relocation. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D156018	2023-08-23 00:53:32 +08:00
zhoujiapeng	9fee2ac044	[BOLT][NFC] Split createRelocation in X86 and share the second part This commit splits the createRelocation function for the X86 architecture into two parts, retaining the first half and moving the second half to a new function called extractFixupExpr. The purpose of this change is to make extractFixupExpr a shared function between AArch64 and X86 architectures, increasing code reusability and maintainability. Child revision: https://reviews.llvm.org/D156018 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157217	2023-08-23 00:29:25 +08:00
Kazu Hirata	ff22d125a7	[BOLT] Fix an unused variable warning This patch fixes: bolt/lib/Core/BinaryFunction.cpp:4117:20: error: unused variable 'FragmentBaseAddress' [-Werror,-Wunused-variable]	2023-08-21 07:57:18 -07:00
Job Noorman	23c8d38258	[BOLT] Calculate input to output address map using BOLTLinker BOLT uses MCAsmLayout to calculate the output values of basic blocks. This means output values are calculated based on a pre-linking state and any changes to symbol values during linking will cause incorrect values to be used. This issue was first addressed in D154604 by adding all basic block symbols to the symbol table for the linker to resolve them. However, the runtime overhead of handling this huge symbol table turned out to be prohibitively large. This patch solves the issue in a different way. First, a temporary section containing [input address, output symbol] pairs is emitted to the intermediary object file. The linker will resolve all these references so we end up with a section of [input address, output address] pairs. This section is then parsed and used to: - Replace BinaryBasicBlock::OffsetTranslationTable - Replace BinaryFunction::InputOffsetToAddressMap - Update BinaryBasicBlock::OutputAddressRange Note that the reason this is more performant than the previous attempt is that these symbol references do not cause entries to be added to the symbol table. Instead, section-relative references are used for the relocations. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D155604	2023-08-21 10:36:20 +02:00
Hans Wennborg	d158ee576b	bolt/test/X86/bug-function-layout-execount.s: Require x86 and asserts Follow-up to D152959: --debug-only= requires an asserts build. The test also needs the x86 target.	2023-08-18 14:02:05 +02:00
hezuoqiang	a37e8a4bdc	[BOLT] Consider Code Fragments during regreassign During register swapping, the code fragments associated with the function need to be swapped together (which may be generated during PGO optimization). Fix https://github.com/llvm/llvm-project/issues/59730 Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D141931	2023-08-18 16:46:18 +08:00
spupyrev	9460ebd130	[BOLT] Fix sorting functions by execution count I noticed that `-reorder-functions=exec-count` doesn't work as expected due to a bug in the comparison function (which isn't symmetric). It is questionable whether anyone would want to ever use the sorting method (as sorting by say density is much better in all cases) but it is probably better to fix the bug. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D152959	2023-08-16 15:08:18 -07:00
Alexander Yermolovich	2c784f7d26	[BOLT][DWARF] Fix handling of invalid DIE references Compiler can generate DIE References that are invalid. Previously BOLT could assert when writing out IR to .debug_info. Changed where DIE offsets are updated so that it's always done, which makes sure that the assert is not triggered. Added more specific warnings, and ability to print out invalid referenced DIE offset when verbosity >=1. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157746	2023-08-14 17:28:24 -07:00
Alexander Yermolovich	bce5743e21	[BOLT][DWARF] Fix location list order This bug crept in when CU partitioning was introduced. It manifests itself when there are CUs that use location lists that come before CUs that are part of thin-lto. BOLT processes CUs with cross CU references first (these are produced by thin-lto). When we wrote out all the location lists we did it in original order. Since DWARF4 uses offsets directly in to .debug_loc those offsets in DIEs became wrong. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157908	2023-08-14 17:27:22 -07:00
Kazu Hirata	363be89c7d	[BOLT] Use static_assert (NFC)	2023-08-10 18:44:17 -07:00
Alexander Yermolovich	0807028d03	Update README.md BOLT supports DWARF5.	2023-08-08 18:46:43 -07:00
Alexander Yermolovich	43fe9dcb71	[BOLT][DWARF][NFC] Remove addIndexAddress Removed unused API DebugAddrWriter::addIndexAddress. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157357	2023-08-08 18:23:04 -07:00
Alexander Yermolovich	55a1d959a5	[BOLT][DWARF] Always use new low_pc for call_site Changed to creating a new index all the time. This code was legacy of when we couldn't change the size of .debug_info, and led to subtle bugs where index for new entries was pointing to a wrong address. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157356	2023-08-08 18:21:24 -07:00
Alexander Yermolovich	96cfc5f840	[BOLT][DWARF] Always use new low_pc for exprloc Changed to creating a new index all the time. This code was legacy of when we couldn't change the size of .debug_info, and led to subtle bugs where index for new entries was pointing to a wrong address. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157355	2023-08-08 18:20:06 -07:00
Alexander Yermolovich	9ffdc2b457	[BOLT][DWARF][NFC] Add function to print DIE This is purely to make debugging easier for developers. Now that we moved to IR the print out of DIEs is lacking. This function will lazily parse DIE and use DWARFDie dump function. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D157354	2023-08-08 18:18:24 -07:00
chenpeihao3	892305adb1	[BOLT] fix the endless loop of --iterative-guess Solve the endless loop caused by iterative guess. The main function of this option is guessEdgeByIterativeApproach, where the do while loop involves guessPredEdgeCounts and guessSuccessEdgeCounts. In some scenarios, the do while loop will fall into an endless loop. The reason is that although the GuessedPredEdgeCounts function has guessed the pred-edges counts, GuessedArcs does not insert the corresponding BB block, resulting in the changed variable always being true. Reviewed By: rafauler Differential Revision: https://reviews.llvm.org/D154922	2023-08-04 17:02:47 +08:00
Alexander Yermolovich	e1ceae4b60	[BOLT][DWARF] Fix setting DW_AT_ranges offset of Skeleton CU Fixed a bug where, when the Skeleton CU had DW_AT_ranges, the output CU DW_AT_ranges offset was relative, and not absolute. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D156958	2023-08-03 10:34:21 -07:00
Alexander Yermolovich	efb8a1c906	[BOLT][DWARF] Delete DW_AT_low_pc when converting to ranges Now that we have new DWARF Rewriter we can remove DW_AT_low_pc when converting DW_AT_low_pc/DW_AT_high_pc to DW_AT_ranges, which follows the DWARF spec more closely. Leaving CU DW_AT_low_pc in place. Reading the spec I think it's needed. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D156957	2023-08-03 10:33:04 -07:00
Alexander Yermolovich	1713f84983	[BOLT][DWARF] Opt out test from aarch64 Limiting the test to only X86. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156765	2023-07-31 18:22:11 -07:00
Alexander Yermolovich	9eb0df3aa9	[BOLT][DWARF] Fix handling of inlined subroutine with no output PC Clang can generate DW_TAG_inlined_subroutine with low_pc 0. With split dwarf this led to range offset being a negative number. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D156742	2023-07-31 17:12:07 -07:00
Amir Ayupov	2dea832ef0	[BOLT][test] Add missing stderr redirections BOLT-ERROR and BOLT-WARNING messages are output to stderr which is not captured by piping to FileCheck. Redirect stderr to stdout to fix that in tests. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156340	2023-07-31 16:17:09 -07:00
Amir Ayupov	d796f36fbc	[BOLT][NFC] Simplify DataAggregator Use short loop instead of duplicating the code for setHasProfileAvailable. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D154749	2023-07-31 14:54:41 -07:00
spupyrev	299ec3c22a	[BOLT] Fixing macOS build Fixing build after https://reviews.llvm.org/D153039 Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156734	2023-07-31 13:55:46 -07:00
Amir Ayupov	70e76e0982	[BOLT] Fix instrumenting conditional tail calls We identify instructions to be instrumented based on Offset annotation. BOLT "expands" conditional tail calls into a conditional jump to a basic block with unconditional tail call. Move Offset annotation from former CTC to the tail call. For expanded CTC we keep Offset attached to the original instruction which is converted into a regular conditional jump, while leaving the newly created tail call without an Offset annotation. This leads to attempting the instrumentation of the conditional jump which points to the basic block with an inherited input offset thus creating an invalid edge description. At the same time, the newly created tail call is skipped entirely which means we're not creating a call description for it. If we instead reassign Offset annotation from the conditional jump to the tail call we fix both issues. The conditional jump will be skipped not creating an invalid edge description, while tail call will be handled properly (uniformly with regular calls). Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156389	2023-07-31 13:52:50 -07:00
Amir Ayupov	b0b566b5da	[BOLT][YAML] Only read first profile per function Work around the issue of multiple profiles per function. Can happen with a stale profile which has separate profiles that in a new binary got merged and became aliases. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156644	2023-07-31 13:48:09 -07:00
spupyrev	b402487b74	[BOLT] A new code layout algorithm for function reordering [3b/3] This is a new algorithm for function layout (reordering) based on the call graph extracted from a profile data; see diffs down the stack for more details. This layout is very similar to the existing hfsort+, but perhaps a little better on some benchmarks. The goals of the change are as follows: (i) rename and replace hfsort+ with a newer (hopefully better) implementation. I'd prefer to keep both algs together for some time to simplify evaluation and transition, but do want to remove hfsort+ once we're confident that there are no regressions. (ii) unify the implementation of code layout algorithms across LLVM. Currently Passes/HfsortPlus.cpp and Utils/CodeLayout.cpp share many implementation-specific details; this diff unifies the code. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D153039	2023-07-31 10:49:06 -07:00
Alexander Yermolovich	75f770a68f	[BOLT][DWARF] Update handling of size 1 ranges and fix sub-programs with ranges When output range is only one entry, and input is low_pc/high_pc do not convert to ranges. This helps with size of .debug_ranges/.debug_rnglists. It also helps when either low_pc/high_pc is 0. We are not generating potentially invalid ranges that result in an LLDB error. Also fixed handling of DW_AT_subprogram with ranges. This can be created with -fbasic-block-sections=all. Reviewed By: maksfb Differential Revision: https://reviews.llvm.org/D156374	2023-07-30 17:32:32 -07:00
Job Noorman	fc395884de	[BOLT][RISCV] Recognize mapping symbols The RISC-V psABI [1] defines them similarly to AArch64. [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#mapping-symbol Reviewed By: yota9, Amir Differential Revision: https://reviews.llvm.org/D153277	2023-07-29 09:18:36 +02:00
Fangrui Song	1652205482	[BOLT][test] Add --show-all-symbols to llvm-objdump -d command llvm-objdump -d has been changed to not display mapping symbols by default.	2023-07-27 21:08:38 -07:00
spupyrev	6d1502c654	[BOLT] (Minor) Changes in stale inference 1. Using ADT/Bitfields.h for hash computation; this is equivalent but shorter than the existing implementation 2. Getting rid of Layout indices for stale matching; using BB->getIndex for indexing Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D155748	2023-07-27 15:29:03 -07:00
Amir Ayupov	2b642926b4	[BOLT][NFC] Format ReorderFunctions.cpp	2023-07-27 13:57:00 -07:00
spupyrev	31e8a9f4d9	[BOLT] Add stale-related logging Adding some logs related to stale profile matching. The new data can be helpful to understand how "stale" the input profile is and how well the inference is able to utilize the stale data. Example of outputs on clang-10 built with LTO (profile collected on a year-old release): ``` BOLT-INFO: inferred profile for 2101 (18.52% of profiled, 100.00% of stale) functions responsible for 30.95% samples (14754697 out of 47670654) BOLT-INFO: stale inference matched 89.42% of basic blocks (79052 out of 88402 stale) responsible for 76.99% samples (645737 out of 838719 stale) ``` LTO+AutoFDO: ``` BOLT-INFO: inferred profile for 6146 (57.57% of profiled, 100.00% of stale) functions responsible for 90.34% samples (50891403 out of 56330313) BOLT-INFO: stale inference matched 74.55% of basic blocks (191295 out of 256589 stale) responsible for 57.30% samples (1288632 out of 2248799 stale) ``` Reviewed By: Amir, maksfb Differential Revision: https://reviews.llvm.org/D154737	2023-07-27 08:56:57 -07:00
Maksim Panchenko	1e4ee588fb	[BOLT] Accept function start as valid jump table entry Jump tables may contain a function start address. One real-world example is when a target basic block contains a recursive tail call that is later optimized/folded into a jump table target. While analyzing a jump table, we treat start address similar to an address past the end of the containing function (a result of __builtin_unreachable), i.e. we require another "regular" entry for the heuristic to proceed. Reviewed By: Amir Differential Revision: https://reviews.llvm.org/D156206	2023-07-26 13:25:08 -07:00
Amir Ayupov	e8a75c3f6e	[BOLT][NFC] Simplify YAMLProfileReader - Add `FunctionSet` type alias. - Use any_of - Use ErrorOr handling pattern Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156043	2023-07-26 08:26:16 -07:00
Amir Ayupov	1e0d08e872	[BOLT] Add blocks order kind to YAML profile header Specify blocks order used in YAML profile. Needed to ensure profile backwards compatibility with pre-D155514 DFS order by default. Reviewed By: #bolt, maksfb Differential Revision: https://reviews.llvm.org/D156176	2023-07-24 21:33:05 -07:00
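
Commit 9460ebd130 above attributes the broken `-reorder-functions=exec-count` sorting to a comparison function that "isn't symmetric". The log does not show the actual change, so the following is only a minimal sketch of what a well-behaved comparator for this kind of sort could look like. `FuncInfo`, `hotterFirst`, and `sortedNames` are hypothetical names introduced for illustration and are not BOLT's types or functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a profiled function; NOT BOLT's BinaryFunction.
struct FuncInfo {
  std::string Name;
  bool HasProfile;
  uint64_t ExecCount;
};

// A strict weak ordering, as std::sort requires:
//  - profiled functions come before unprofiled ones,
//  - among equals, the higher execution count comes first,
//  - the name breaks remaining ties so the result is deterministic.
// Each step compares the same field on both sides, so hotterFirst(A, B)
// and hotterFirst(B, A) can never both be true.
bool hotterFirst(const FuncInfo &A, const FuncInfo &B) {
  if (A.HasProfile != B.HasProfile)
    return A.HasProfile;
  if (A.ExecCount != B.ExecCount)
    return A.ExecCount > B.ExecCount;
  return A.Name < B.Name;
}

// Sorts a copy of the input and returns the function names in layout order.
std::vector<std::string> sortedNames(std::vector<FuncInfo> Funcs) {
  std::sort(Funcs.begin(), Funcs.end(), hotterFirst);
  std::vector<std::string> Names;
  Names.reserve(Funcs.size());
  for (const FuncInfo &F : Funcs)
    Names.push_back(F.Name);
  return Names;
}
```

Passing a comparator that violates these rules to `std::sort` is undefined behavior, which is one way a sort can silently produce an unexpected order. Calling `sortedNames` on a mix of profiled and unprofiled entries should yield the profiled ones in descending count order, followed by the unprofiled ones.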

1798 Commits