lldb-dap instances managed by other extensions benefit from having a
welcome message with, for example, a basic user guide or a
troubleshooting message.
This PR adds a cmake variable for defining such message in a simple way.
This message appears upon initialization but before initCommands are
executed, as they might cause a failure and prevent the message from
being displayed.
Such cases show up in the middle of optimizations passes, e.g., after
some rewrites and then CSE. The current folder can fold such cases when
the inputs are constant; this patch improves it to fold even if the
inputs are non-constant.
Prior to this change, we wouldn't build headers that aren't referenced
by other parts of the libc which would result in a build error during
installation. To address this, we make the header target a dependency of
the libc archive. Additionally, we also redo the install targets, moving
the install targets closer to build targets and simplifying the
hierarchy and generally matching what we do for other runtimes.
It implements transformation to optimize accesses to shared memory.
Reference: https://reviews.llvm.org/D127457
_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses
common to loading matrix multiplication operands to/from shared memory._
When the MachODebugMapParser encounters an object file that cannot be
found on disk, it currently leaves the parser in an incoherent state,
resulting in spurious warnings that can in turn slow down dsymutil.
This fixes#78411.
rdar://117515153
We want to know the upper 33 bits of the And Input are zero. SExt
only guarantees they are the same.
We originally checked for SExt or ZExt when we were using
isImpliedByDomCondition because a ZExt may have been changed to SExt
before we visited the And.
We are no longer using isImpliedByDomCondition so we can only look
for zext with the nneg flag.
While here, switch to PatternMatch to simplify the code.
Fixes#78783
This patch adds in support for symbolizing PGO information contained
within the SHT_LLVM_BB_ADDR_MAP section in llvm-objdump. The outputs are
simply the raw values contained within the section.
We just need to check if the global is large or not.
In the kernel code model, globals are in the negative 2GB of the address
space, so globals can be a sign extended 32-bit immediate.
In other code models, small globals are in the low 2GB of the address
space, so sign extending them is equivalent to zero extending them.
`-fvisibility-from-dllstorageclass` allows for overriding the visibility
of globals from their DLL storage class. The visibility to apply can be
customised for the different classes of globals via a set of dependent
options that specify the mapping values:
- `-fvisibility-dllexport=<value>`
- `-fvisibility-nodllstorageclass=<value>`
- `-fvisibility-externs-dllimport=<value>`
- `-fvisibility-externs-nodllstorageclass=<value>`
Currently, one of the existing LLVM visibilities, `hidden`, `protected`,
`default`, can be used as a mapping value. This change adds a new
mapping value: `keep`, which specifies that the visibility should not be
overridden for that class of globals. The behaviour of
`-fvisibility-from-dllstorageclass` is otherwise unchanged and existing
uses of this set of options will be unaffected.
The background to this change is that currently the PS4 and PS5
compilers effectively ignore visibility - dllimport/export is the
supported method for export control in C/C++ source code. Now, we would
like to support visibility attributes and options in our frontend, in
addition to dllimport/export. To support this, we will override the
visibility of globals with explicit dllimport/export annotations but use
the `keep` setting for globals which do not have an explicit
dllimport/export.
There are also some minor improvements to the existing options:
- Make the `LANGOPS` `BENIGN` as they don't involve the AST.
- Correct/clarify the help text for the options.
This mirrors how the ELF linker works. I wasn't able to find anywhere
where this is currently tested.
Followup to #78640, which triggered a regression.
In the hardening modes that can be used in production (`fast` and
`extensive`), make a failed assertion invoke a trap instruction rather
than calling verbose abort. In the debug mode, still keep calling
verbose abort to provide a better user experience and to allow us to
keep our existing testing infrastructure for verifying assertion
messages. Since the debug mode by definition enables all assertions, we
can be sure that we still check all the assertion messages in the
library when running the test suite in the debug mode.
The main motivation to use trapping in production is to achieve better
code generation and reduce the binary size penalty. This way, the
assertion handler can compile to a single instruction, whereas the
existing mechanism with verbose abort results in generating a function
call that in general cannot be optimized away (made worse by the fact
that it's a variadic function, imposing an additional penalty). See the
[RFC](https://discourse.llvm.org/t/rfc-hardening-in-libc/73925) for more
details. Note that this mechanism can now be completely [overridden at
CMake configuration
time](https://github.com/llvm/llvm-project/pull/77883).
This patch also significantly refactors `check_assertion.h` and expands
its test coverage. The main changes:
- when overriding `verbose_abort`, don't do matching inside the function
-- just print the error message to `stderr`. This removes the need to
set a global matcher and allows to do matching in the parent process
after the child finishes;
- remove unused logic for matching source locations and for using
wildcards;
- make matchers simple functors;
- introduce `DeathTestResult` that keeps data about the test run,
primarily to make it easier to test.
In addition to the refactoring, `check_assertion.h` can now recognize
when a process exits due to a trap.
Adding weights to utility nodes in BP so that we can give more
importance to
certain utilities. This is useful when we optimize several objectives
jointly.
Closes#77638, #24186
Rebased from <https://reviews.llvm.org/D156032>, see there for more
information.
Implements wording change in [CWG2137](https://wg21.link/CWG2137) in the
first commit.
This also implements an approach to [CWG2311](https://wg21.link/CWG2311)
in the second commit, because too much code that relies on `T{ T_prvalue}`
being an elision would break. Because that issue is still open and
the CWG issue doesn't provide wording to fix the issue, there may be
different behaviours on other compilers.
Currently, the duplicate snippet repetitor will truncate snippets that
do not exactly divide the minimum number of instructions. This patch
corrects that behavior by making the duplicate snippet repetitor
duplicate the snippet in its entirety until the minimum number of
instructions has been reached.
This makes the behavior consistent with the loop snippet repetitor,
which will execute at least `--num-repetitions` (soon to be renamed
`--min-instructions`) instructions.
This patch adds an assertion to readBBAddrMapImpl to confirm that
PGOAnalyses and BBAddrMaps are of the same size when PGO information is
requested (part of the API contract). This patch also updates the
docstring for readBBAddrMap to better clarify what is guaranteed.
Adds `transform.func.cast_and_call` that takes a set of inputs and
outputs and replaces the uses of those outputs with a call to a function
at a specified insertion point.
The idea with this operation is to allow users to author independent IR
outside of a to-be-compiled module, and then match and replace a slice
of the program with a call to the external function.
Additionally adds a mechanism for populating a type converter with a set
of conversion materialization functions that allow insertion of
casts on the inputs/outputs to and from the types of the function
signature.
TSan's shadow mappings only support 30-bits of ASLR entropy on x86
Linux, and it is not practical to support the maximum of 32-bits (due to pointer compression and the overhead of shadow mappings). Instead, this patch changes TSan to re-exec without ASLR if it encounters an
incompatible memory layout, as suggested by Dmitry in
https://github.com/google/sanitizers/issues/1716.
If ASLR is already disabled but the memory layout is still incompatible,
it will abort.
This patch involves a bit of refactoring, because the old code is:
1. InitializePlatformEarly()
2. InitializeAllocator()
3. InitializePlatform(): CheckAndProtect()
but it may already segfault during InitializeAllocator() if the memory
layout is incompatible, before we get a chance to check in
CheckAndProtect().
This patch adds CheckAndProtect() during InitializePlatformEarly(), before the allocator is initialized. Naturally, it is necessary to ensure that CheckAndProtect() does *not* allow the heap regions to be occupied here, hence we generalize CheckAndProtect() to optionally check the heap
regions. We keep the original behavior of CheckAndProtect() in InitializePlatform() as a last line of defense.
We need to be careful not to prematurely abort if ASLR is disabled but TSan was going to re-exec for other reasons (e.g., unlimited stack size); we implement this by moving all the re-exec logic into ReExecIfNeeded().
When undefined functions exist in the final link we need to create
stub functions (otherwise direct calls to those functions could
not be generated). We were creating those stub when
`--unresolved-symbols=ignore-all` was passed but overlooked the fact
that `--warn-unresolved-symbols` essentially has the same effect (i.e.
undefined function can exist in the final link).
Fixes: #53987
This implements the ideas discussed in [1].
To summarize, this commit changes AsmPrinter so that it outputs
DW_IDX_parent information for debug_name entries. It will enable
debuggers to speed up queries for fully qualified types (based on a
DWARFDeclContext) significantly, as debuggers will no longer need to
parse the entire CU in order to inspect the parent chain of a DIE.
Instead, a debugger can simply take the parent DIE offset from the
accelerator table and peek at its name in the debug_info/debug_str
sections.
The implementation uses two types of DW_FORM for the DW_IDX_parent
attribute:
1. DW_FORM_ref4, which points to the accelerator table entry for the
parent.
2. DW_FORM_flag_present, when the entry has a parent that is not in the
table (that is, the parent doesn't have a name, or isn't allowed to be
in the table as per the DWARF spec). This is space-efficient, since it
takes 0 bytes.
The implementation works by:
1. Changing how abbreviations are encoded (so that they encode which
form, if
any, was used to encode IDX_Parent)
2. Creating an MCLabel per accelerator table entry, so that they may be
referred by IDX_parent references.
When all patches related to this are merged, we are able to show that
evaluating an expression such as:
```
lldb --batch -o 'b CodeGenFunction::GenerateCode' -o run -o 'expr Fn' -- \
clang++ -c -g test.cpp -o /dev/null
```
is far faster: from ~5000 ms to ~1500ms.
Building llvm-project + clang with and without this patch, and looking
at its impact on object file size:
```
ls -la $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5} END {printf "%\047d\n", s}'
11,507,327,592
-la $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | awk '{s+=$5} END {printf "%\047d\n", s}'
11,436,446,616
```
That is, an increase of 0.62% in total object file size.
Looking only at debug_names:
```
$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3} END {printf "%\047d\n", s}'
440,772,348
$stage1_build/bin/llvm-objdump --section-headers $(find build_stage2_Debug_no_idx_parent_assert_dwarf5 -name \*.cpp.o) | grep __debug_names | awk '{s+="0x"$3} END {printf "%\047d\n", s}'
369,867,920
```
That is an increase of 19%.
DWARF Linkers need to be changed in order to support this. This commit
already brings support to "base" linker, but it does not attempt to
modify the parallel linker. Accelerator entries refer to the
corresponding DIE offset, and this patch also requires the parent DIE
offset -- it's not clear how the parallel linker can access this. It may
be obvious to someone familiar with it, but it would be nice to get help
from its authors.
[1]:
https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151/
The @expectedFailureAll and @skipIf decorators will mark the test case
as xfail/skip if _all_ conditions passed in match, including debug_info.
* If debug_info is not one of the matching conditions, we can
immediately evaluate the check and decide if it should be decorated.
* If debug_info *is* present as a match condition, we need to defer
whether or not to decorate until when the `LLDBTestCaseFactory`
metaclass expands the test case into its potential variants. This is
still early enough that the standard `unittest` framework will recognize
the test as xfail/skip by the time the test actually runs.
TestDecorators exhibits the edge cases more thoroughly. With the
exception of `@expectedFailureIf` (added by this commit), all those test
cases pass prior to this commit.
This is a followup to 212a60ec37.
Summary:
We use `add_libc_test' now because it works for both hermetic and unit
tests. If the test needs to be unit test only you use `UNIT_TEST_ONLY`
as an argument.
Having it return a `std::optional<bool>` is unnecessarily confusing.
This patch changes it to a simple 'bool'.
This patch also removes the 'BodyOverridesInterface' operand because
there is only a single use for this which is easily rewritten.
The arm_sme.td file was still using `IsSharedZA` and `IsPreservesZA`,
which should be changed to match the new state attributes added in
#76971.
This patch adds `IsInZA`, `IsOutZA` and `IsInOutZA` as the state for the
Clang builtins and fixes up the code in SemaChecking and SveEmitter to
match.
Note that the code is written in such a way that it can be easily
extended with ZT0 state (to follow in a future patch).
Provides some context for failing to generate LLVM IR for `target
enter|exit|update` directives when `nowait` is provided. This is
directly helpful for flang users since they would get this error message
if they tried to use `nowait`. Before that we had a very generic
message.
This is a follow-up to https://github.com/llvm/llvm-project/pull/78269,
please only review the latest commit (the one with the same commit
message as the PR title).
Fixes#70221
Fix a bug in FileCheck that corrects the error message when multiple
prefixes are provided
through --check-prefixes and one of them is a PREFIX-NOT.
Earlier, only the first of the provided prefixes was displayed as the
erroneous prefix, while the
actual error might be on the prefix that occurred at the end of the
prefix list in the input file.
Now, the right NOT prefix is shown in the error message.
Add an `-allow-incomplete-ir` flag to the IR parser, which allows
reading IR with missing declarations. This is intended to produce a
best-effort interpretation of the IR, along the same lines of what we
would manually do when taking, for example, a function from
`-print-after-all` output and fixing it up to be valid IR.
This patch only supports dropping references to undeclared metadata,
either by dropping metadata attachments from instructions/functions, or
by dropping calls to certain intrinsics (like debug intrinsics). I will
implement support for inserting missing function/global declarations in
a followup patch.
We don't have real use lists for metadata, so the approach here is to
iterate over the whole IR and identify metadata that needs to be
dropped. This does not support all possible cases, but should handle
anything that's relevant for the function-only IR use case.