Right now we have one large AST for all types in LLDB. All ODR violations in
types we reconstruct are resolved by just letting the ASTImporter handle the
conflicts (either by merging types or somehow trying to introduce a duplicated
declaration in the AST). This works ok for the normal types we build from debug
information as most of them are just simple CXXRecordDecls or empty template
declarations.
However, with a loaded `std` C++ module we have alternative versions of pretty
much all declarations in the `std` namespace that are much more fleshed out than
the debug information declarations. They have all the information that is lost
when converting to DWARF, such as default arguments, template default arguments,
the actual uninstantiated template declarations and so on.
When we merge these C++ module types into the big scratch AST (that might
already contain debug information types) we give the ASTImporter the tricky task
of somehow creating a consistent AST out of all these declarations. Usually this
ends in a messy AST that contains a mostly broken mix of both module and debug
info declarations. The ASTImporter in LLDB is also importing types with the
MinimalImport setting, which usually means the only information we have when
merging two types is often just the name of the declaration and the information
that it contains some child declarations. This makes it pretty much impossible
to even implement a better merging logic (as the names of C++ module
declarations and debug info declarations are identical).
This patch works around this whole merging problem by separating C++ module
types from debug information types. This is done by splitting up the single
scratch AST into two: One default AST for debug information and a dedicated AST
for C++ module types.
The C++ module AST is implemented as a 'specialised AST' that lives within the
default ScratchTypeSystemClang. When we select the scratch AST we can explicitly
request that we want such a isolated sub-AST of the scratch AST. I kept the
infrastructure more general as we probably can use the same mechanism for other
features that introduce conflicting types (such as programs that are compiled
with a custom -wchar-size= option).
There are just two places where we explicitly have request the C++ module AST:
When we export persistent declarations (`$mytype`) and when we create our
persistent result variable (`$0`, `$1`, ...). There are a few formatters that
were previously assuming that there is only one scratch AST which I cleaned up
in a preparation revision here (D92757).
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D92759
There are a few things that I wanted to reorganize for a while:
- the loop that incrementally goes through classes on failure looked
horrible in assembly, mostly because of `LIKELY`/`UNLIKELY` within
the loop. So remove those, we are already in an unlikely scenario
- hooks are not used by default on Android/Fuchsia/etc so mark the
tests for the existence of the weak functions as unlikely
- mark of couple of conditions as likely/unlikely
- in `reallocate`, the old size was computed again while we already
have it in a variable. So just use the one we have.
- remove the bitwise AND trick and use a logical AND, that has one
less test by using a purposeful underflow when `Size` is 0 (I
actually looked at the assembly of the previous code to steal that
trick)
- move the read of the options closer to where they are used, mark them
as `const`
Overall this makes things a tiny bit faster, but cleaner.
Differential Revision: https://reviews.llvm.org/D92689
The test is reduced from the example in D82005.
Similar to 94f6d365e, the test here would assert in
the DomTree when we tried to convert a select to a
phi with an unreachable block operand.
We may want to add some kind of guard code in DomTree
itself to avoid this sort of problem.
This is just a small change in the Flang tool within libclangDriver.
Currently it passes `-triple` when calling `flang-new -fc1` for various
driver Jobs. As there is no support for code-generation, `-triple` is
not required and should remain unsupported. It is safe to remove it.
This hasn't been a problem as the affected driver Jobs are not yet
implemented or used. However, we will be adding support for them in the
near future and the fact `-triple` is added will become a problem.
Differential Revision: https://reviews.llvm.org/D93027
This change implements pseudo probe encoding and emission for CSSPGO. Please see RFC here for more context: https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s
Pseudo probes are in the form of intrinsic calls on IR/MIR but they do not turn into any machine instructions. Instead they are emitted into the binary as a piece of data in standalone sections. The probe-specific sections are not needed to be loaded into memory at execution time, thus they do not incur a runtime overhead.
**ELF object emission**
The binary data to emit are organized as two ELF sections, i.e, the `.pseudo_probe_desc` section and the `.pseudo_probe` section. The `.pseudo_probe_desc` section stores a function descriptor for each function and the `.pseudo_probe` section stores the actual probes, each fo which corresponds to an IR basic block or an IR function callsite. A function descriptor is stored as a module-level metadata during the compilation and is serialized into the object file during object emission.
Both the probe descriptors and pseudo probes can be emitted into a separate ELF section per function to leverage the linker for deduplication. A `.pseudo_probe` section shares the same COMDAT group with the function code so that when the function is dead, the probes are dead and disposed too. On the contrary, a `.pseudo_probe_desc` section has its own COMDAT group. This is because even if a function is dead, its probes may be inlined into other functions and its descriptor is still needed by the profile generation tool.
The format of `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :
```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```
To support building a context-sensitive profile, probes from inlinees are grouped by their inline contexts. An inline context is logically a call path through which a callee function lands in a caller function. The probe emitter builds an inline tree based on the debug metadata for each outlined function in the form of a trie tree. A tree root is the outlined function. Each tree edge stands for a callsite where inlining happens. Pseudo probes originating from an inlinee function are stored in a tree node and the tree path starting from the root all the way down to the tree node is the inline context of the probes. The emission happens on the whole tree top-down recursively. Probes of a tree node will be emitted altogether with their direct parent edge. Since a pseudo probe corresponds to a real code address, for size savings, the address is encoded as a delta from the previous probe except for the first probe. Variant-sized integer encoding, aka LEB128, is used for address delta and probe index.
**Assembling**
Pseudo probes can be printed as assembly directives alternatively. This allows for good assembly code readability and also provides a view of how optimizations and pseudo probes affect each other, especially helpful for diff time assembly analysis.
A pseudo probe directive has the following operands in order: function GUID, probe index, probe type, probe attributes and inline context. The directive is generated by the compiler and can be parsed by the assembler to form an encoded `.pseudoprobe` section in the object file.
A example assembly looks like:
```
foo2: # @foo2
# %bb.0: # %bb0
pushq %rax
testl %edi, %edi
.pseudoprobe 837061429793323041 1 0 0
je .LBB1_1
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 6 2 0
callq foo
.pseudoprobe 837061429793323041 3 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
.LBB1_1: # %bb1
.pseudoprobe 837061429793323041 5 1 0
callq *%rsi
.pseudoprobe 837061429793323041 2 0 0
.pseudoprobe 837061429793323041 4 0 0
popq %rax
retq
# -- End function
.section .pseudo_probe_desc,"",@progbits
.quad 6699318081062747564
.quad 72617220756
.byte 3
.ascii "foo"
.quad 837061429793323041
.quad 281547593931412
.byte 4
.ascii "foo2"
```
With inlining turned on, the assembly may look different around %bb2 with an inlined probe:
```
# %bb.2: # %bb2
.pseudoprobe 837061429793323041 3 0
.pseudoprobe 6699318081062747564 1 0 @ 837061429793323041:6
.pseudoprobe 837061429793323041 4 0
popq %rax
retq
```
**Disassembling**
We have a disassembling tool (llvm-profgen) that can display disassembly alongside with pseudo probes. So far it only supports ELF executable file.
An example disassembly looks like:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91878
The NPM runs loop passes over loops in forward program order, rather
than the legacy loop PM's reverse program order. This seems to produce
better results as shown here.
I verified that changing the loop order to reverse program order results
in the same IR with the NPM.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D92817
Prevent lldb from crashing when multiple threads are concurrently
accessing the SB API with reproducer capture enabled.
The API instrumentation records both the input arguments and the return
value, but it cannot block for the duration of the API call. Therefore
we introduce a sequence number that allows to to correlate the function
with its result and add locking to ensure those two parts are emitted
atomically.
Using the sequence number, we can detect situations where the return
value does not succeed the function call, in which case we print an
error saying that concurrency is not (currently) supported. In the
future we might attempt to be smarter and read ahead until we've found
the return value matching the current call.
Differential revision: https://reviews.llvm.org/D92820
- use ConvertOpToLLVMPattern to avoid explicit casting and in most cases the
constructor can be reused to save a few lines of code.
Differential Revision: https://reviews.llvm.org/D92989
If SETUNE isn't legal, UO can use the NOT of the SETO expansion.
Removes some complex isel patterns. Most of the test changes are
from using XORI instead of SEQZ.
Differential Revision: https://reviews.llvm.org/D92008
This makes it slightly easier to deal with custom attributes and
CallBase already provides hasFnAttr versions that support both AttrKind
and StringRef arguments in a similar fashion.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D92567
This patch adds vcmla and the rotated variants as defined in
"Arm Neon Intrinsics Reference for ACLE Q3 2020" [1]
The *_lane_* are still missing, but they can be added separately.
This patch only adds the builtin mapping for AArch64.
[1] https://developer.arm.com/documentation/ihi0073/latest
Reviewed By: t.p.northover
Differential Revision: https://reviews.llvm.org/D92930
Extended subgroups are library style extensions and therefore
they require no changes in the frontend. This commit:
1. Moves extension macro definitions to the internal headers.
2. Removes extension pragmas because they are not needed.
Tags: #clang
Differential Revision: https://reviews.llvm.org/D92231
Several data formatters assume their types are in the Target's scratch AST and
build new types from that scratch AST instance. However, types from different
ASTs shouldn't be mixed, so this (unchecked) assumption may lead to problems if
we ever have more than one scratch AST or someone somehow invokes data
formatters on a type that are not in the scratch AST.
Instead we can use in all the formatters just the TypeSystem of the type we're
formatting. That's much simpler and avoids all the headache of finding the right
TypeSystem that matches the one of the formatted type.
Right now LLDB only has one scratch TypeSystemClang instance and we format only
types that are in the scratch AST, so this doesn't change anything in the
current way LLDB works. The intention here is to allow follow up refactorings
that introduce multiple scratch ASTs with the same Target.
Differential Revision: https://reviews.llvm.org/D92757
The wrapper clears shadow for any bytes written to addr or addrlen.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D92964
The semantic analysis of index-names of FORALL statements looks up symbols with
the same name as the index-name. This is needed to exclude symbols that are
not objects. But if the symbol found is host-, use-, or construct-associated
with another entity, the check fails.
I fixed this by getting the root symbol of the symbol found and doing the check
on the root symbol. This required creating a non-const version of
"GetAssociationRoot()".
Differential Revision: https://reviews.llvm.org/D92970
Ports 6e42a417ba since it's now needed, and undo an accidental
deletion from d69762c404 while here (this part is not needed to fix
the build, it's just in the vicinity).
lldb-server tests are a very special subclass of "api" tests. As they
communicate with lldb-server directly, they don't actually need most of
facilities provided by our TestBase class. In particular, they don't
need the ability to fork debug info flavours of tests (but they could
use debug server flavours).
This makes them inherit from "Base" instead. This avoids the need to
explicitly mark these tests as NO_DEBUG_INFO_TEST_CASE. Two additional
necessary tweaks were:
- move run_platform_command to the base (Base) class. This is used in
one test, and can be generally useful when running tests remotely.
- add a "build" method, forwarding to buildDefault. This is to avoid
updating each test case to use buildDefault (also, "build" sounds
better). It might be interesting to refactor the (Test)Base classes so
that all debug info flavour handling happens in TestBase, and the Base
class provides a simple build method automatically.
Remove the OpenMP clause information from the OMPKinds.def file and use the
information from the new OMP.td file. There is now a single source of truth for the
directives and clauses.
To avoid generate lots of specific small code from tablegen, the macros previously
used in OMPKinds.def are generated almost as identical. This can be polished and
possibly removed in a further patch.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D92955
Using a MemoryBufferRef, If there is an error parsing, we can point the user to the location of the file.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D93024
This patch changes performMSCATTERCombine to also promote the indices of
masked gathers where the element type is i8 or i16, and adds various tests
for gathers with illegal types.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D91433
Remove target features crypto for Cortex-R82, because it doesn't have any, and
add LSE which was missing while we are at it.
This also removes crypto from the v8-R architecture description because that
aligns better with GCC and so far none of the R-cores have implemented crypto,
so is probably a more sensible default.
Differential Revision: https://reviews.llvm.org/D91994
... and give more guidance to users.
If specifying -msve-vector-bits on a non-SVE target, clang would say:
error: '-msve-vector-bits' is not supported without SVE enabled
1. The driver lacks logic for "implied features".
This would result in this error being raised for -march=...+sve2,
even though +sve2 implies +sve.
2. Feature implication is well modelled in LLVM, so push the error down
the stack.
3. Hint to the user what flag they need to consider setting.
Now clang fails later, when the feature is used, saying:
aarch64-sve-vector-bits.c:42:41: error: 'arm_sve_vector_bits' attribute is not supported on targets missing 'sve'; specify an appropriate -march= or -mcpu=
typedef svint32_t noflag __attribute__((arm_sve_vector_bits(256)));
Move clang/test/Sema/{neon => arm}-vector-types-support.c and put tests for
this warning together in one place.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D92487
LLDB is supposed to ask the Clang Driver what the default module cache path is
and then use that value as the default for the
`symbols.clang-modules-cache-path` setting. However, we use the property type
`String` to change `symbols.clang-modules-cache-path` even though the type of
that setting is `FileSpec`, so the setter will simply do nothing and return
`false`. We also don't check the return value of the setter, so this whole code
ends up not doing anything at all.
This changes the setter to use the correct property type and adds an assert that
we actually successfully set the default path. Also adds a test that checks that
the default value for this setting is never unset/empty path as this would
effectively disable the import-std-module feature from working by default.
Reviewed By: JDevlieghere, shafik
Differential Revision: https://reviews.llvm.org/D92772
SmallVector<T> with default size is now the recommended version (D92522).
Reviewed By: sammccall
Differential Revision: https://reviews.llvm.org/D92788
We currently have problems with the way that low overhead loops are
specified, with LR being spilled between the t2LoopDec and the t2LoopEnd
forcing the entire loop to be reverted late in the backend. As they will
eventually become a single instruction, this patch introduces a
t2LoopEndDec which is the combination of the two, combined before
registry allocation to make sure this does not fail.
Unfortunately this instruction is a terminator that produces a value
(and also branches - it only produces the value around the branching
edge). So this needs some adjustment to phi elimination and the register
allocator to make sure that we do not spill this LR def around the loop
(needing to put a spill after the terminator). We treat the loop very
carefully, making sure that there is nothing else like calls that would
break it's ability to use LR. For that, this adds a
isUnspillableTerminator to opt in the new behaviour.
There is a chance that this could cause problems, and so I have added an
escape option incase. But I have not seen any problems in the testing
that I've tried, and not reverting Low overhead loops is important for
our performance. If this does work then we can hopefully do the same for
t2WhileLoopStart and t2DoLoopStart instructions.
This patch also contains the code needed to convert or revert the
t2LoopEndDec in the backend (which just needs a subs; bne) and the code
pre-ra to create them.
Differential Revision: https://reviews.llvm.org/D91358
When llvm-rc loads an external file, it looks for it relative to
a number of include directories and the current working directory.
If the path is considered absolute, llvm-rc tries to open the
filename as such, and doesn't try to open it relative to other
paths.
On Windows, a path name like "\dir\file" isn't considered absolute
as it lacks the drive name, but by appending it on top of the search
dirs, it's not found.
LLVM's sys::path::append just appends such a path (same with a properly
absolute posix path) after the paths it's supposed to be relative to.
This fix doesn't handle the case if the resource script and the
external file are on a different drive than the current working
directory; to fix that, we'd have to make LLVM's sys::path::append
handle appending fully absolute and partially absolute paths (ones
lacking a drive prefix but containing a root directory), or switch
to C++17's std::filesystem.
Differential Revision: https://reviews.llvm.org/D92558
Current interface of AddressMap assumes that relocations exist.
That is correct for not-linked object file but is not correct
for linked executable. This patch changes interface in such way
that AddressMap could be used not only with not-linked object files:
hasValidRelocationAt()
replaced with:
hasLiveMemoryLocation()
hasLiveAddressRange()
Differential Revision: https://reviews.llvm.org/D87723
Both ds_read_b128 and ds_read2_b64 are valid for 128bit 16-byte aligned
loads but the one that will be selected is determined either by the order in
tablegen or by the AddedComplexity attribute. Currently ds_read_b128 has
priority.
While ds_read2_b64 has lower alignment requirements, we cannot always
restrict ds_read_b128 to 16-byte alignment because of unaligned-access-mode
option. This was causing ds_read_b128 to be selected for 8-byte aligned
loads regardles of chosen access mode.
To resolve this we use two patterns for selecting ds_read_b128. One
requires alignment of 16-byte and the other requires
unaligned-access-mode option.
Same goes for ds_write2_b64 and ds_write_b128.
Differential Revision: https://reviews.llvm.org/D92767
By now LLDB can import the 'std' C++ module to improve expression evaluation,
but there are still a few problems to solve before we can do this by default.
One is that importing the C++ module is slightly slower than normal expression
evaluation (mostly because the disk access and loading the initial lookup data
is quite slow in comparison to the barebone Clang setup the rest of the LLDB
expression evaluator is usually doing). Another problem is that some complicated
types in the standard library aren't fully supported yet by the ASTImporter, so
we end up types that fail to import (which usually appears to the user as if the
type is empty or there is just no result variable).
To still allow people to adopt this mode in their daily debugging, this patch
adds a setting that allows LLDB to automatically retry failed expression with a
loaded C++ module. All success expressions will behave exactly as they would do
before this patch. Failed expressions get a another parse attempt if we find a
usable C++ module in the current execution context. This way we shouldn't have
any performance/parsing regressions in normal debugging workflows, while the
debugging workflows involving STL containers benefit from the C++ module type
info.
This setting is off by default for now with the intention to enable it by
default on macOS soon-ish.
The implementation is mostly just extracting the existing parse logic into its
own function and then calling the parse function again if the first evaluation
failed and we have a C++ module to retry the parsing with.
Reviewed By: shafik, JDevlieghere, aprantl
Differential Revision: https://reviews.llvm.org/D92784
A quick search of github.com, shows one common scenario for excessive use of //clang-format off/on is the indentation of #pragma's, especially around the areas of loop optimization or OpenMP
This revision aims to help that by introducing an `IndentPragmas` style, the aim of which is to keep the pragma at the current level of scope
```
for (int i = 0; i < 5; i++) {
// clang-format off
#pragma HLS UNROLL
// clang-format on
for (int j = 0; j < 5; j++) {
// clang-format off
#pragma HLS UNROLL
// clang-format on
....
```
can become
```
for (int i = 0; i < 5; i++) {
#pragma HLS UNROLL
for (int j = 0; j < 5; j++) {
#pragma HLS UNROLL
....
```
This revision also support working alongside the `IndentPPDirective` of `BeforeHash` and `AfterHash` (see unit tests for examples)
Reviewed By: curdeius
Differential Revision: https://reviews.llvm.org/D92753
clang-format see the `disable:` in __pragma(warning(disable:)) as ObjectiveC method call
Remove any line starting with `#` or __pragma line from being part of the ObjectiveC guess
https://bugs.llvm.org/show_bug.cgi?id=42434
Reviewed By: curdeius, krasimir
Differential Revision: https://reviews.llvm.org/D92922