As discussed in [0], add a `weight` field to temporal profiling traces found in profiles. This allows users to use the `--weighted-input=` flag in the `llvm-profdata merge` command to weight traces from different scenarios differently.
Note that this is a breaking change, but since [1] landed very recently and there is no way to "use" this trace data, there should be no users of this feature. We believe it is acceptable to land this change without bumping the profile format version.
[0] https://reviews.llvm.org/D147812#4259507
[1] https://reviews.llvm.org/D147287
Reviewed By: snehasish
Differential Revision: https://reviews.llvm.org/D148150
The acc.serial operation models the OpenACC serial construct.
The serial construct defines a region of a program that is to be
executed sequentially on the current device.
The operation is modelled on the acc.parallel operation and will
receive similar updates once the data operand operations are
implemented.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D148250
Introduced the BoUpSLP::ShuffleCostEstimator::gather function as an initial
implementation of gather/buildvector cost estimation for buildvector
nodes. It allows the general codegen infrastructure to be used for cost
estimation and improves the cost estimates for gathers/buildvectors.
Improved part of D110978.
Differential Revision: https://reviews.llvm.org/D148174
Remove the custom parser and printer for the acc.parallel
operation and use the assembly format directly.
Reviewed By: PeteSteinfeld, razvanlupusoru
Differential Revision: https://reviews.llvm.org/D148183
Using the latest version of the script from D103695 to compare costmodel vs llvm-mca statistics.
Avoids using the default costs, which assumed libm calls.
Enums and structs declared inside typedefs have incorrect declaration fragments: the typedef keyword and other syntax are missing.
For the following struct:
typedef struct Test {
int hello;
} Test;
The produced declaration is:
"declarationFragments": [
{
"kind": "keyword",
"spelling": "struct"
},
{
"kind": "text",
"spelling": " "
},
{
"kind": "identifier",
"spelling": "Test"
}
],
Instead, the declaration fragments should represent the following:
typedef struct Test {
…
} Test;
This patch removes the condition in the SymbolGraphSerializer.cpp file and completes the declaration fragments.
Reviewed By: dang
Differential Revision: https://reviews.llvm.org/D146385
When flushing output to a non-positionable tty or socket file, reset the
left tab limit. Otherwise, non-advancing output to that file will contain
an increasing amount of leading spaces in each flush. Also, detect
newline characters in stream output, and treat them as record
advancement.
Differential Revision: https://reviews.llvm.org/D148157
Exp2 functions are pushed directly to libm. This is problematic for
situations where libm is not available. This patch expands the exp2
function to use exp with the input multiplied by ln2 (natural log of 2).
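The expansion rests on the identity 2^x = e^(x * ln 2). A minimal sketch of that identity in Python follows; the function name `exp2_via_exp` is hypothetical and this is not the code from the patch:

```python
import math


def exp2_via_exp(x: float) -> float:
    """Compute 2**x using only exp, via 2**x == exp(x * ln(2))."""
    ln2 = math.log(2.0)  # natural log of 2
    return math.exp(x * ln2)
```

The result can differ from a dedicated exp2 routine in the last few bits, so comparisons should use a tolerance.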
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D148064
By default these will expand back to cmp/sel, but some targets (X86) have optimized costs for scalar integer min/max patterns which are lower than the default expansion (pre-SSE41 is particularly weak for vector min/max support).
Differential Revision: [SLP] Compute min/max scalar reduction costs using min/max intrinsics instead of expanded cmp+sel
With architected SGPRs, workgroup IDs are passed into a compute shader
in TTMP registers. Allow for this in AMDGPUResourceUsageAnalysis instead
of failing an assertion.
Differential Revision: https://reviews.llvm.org/D148239
Adds a DAG combine check for vector comparisons followed by a bitcast to a
scalar value. Previously, this resulted in an expand. Now, this is done with a
constant number of instructions that take one bit per vector value (via an AND
mask) and perform a horizontal add to get a single value. This is especially
useful for Clang's __builtin_convertvector() to a bool vector.
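The mask-and-add idea can be sketched in plain Python. This is only an illustration of the arithmetic, not the DAG combine itself; the function name and the choice of 8-bit lanes with lane 0 mapped to bit 0 are assumptions:

```python
def compare_mask_to_scalar(lanes):
    """Pack per-lane comparison results into one integer.

    lanes: comparison results, lane 0 first, each either 0 (false)
    or 0xFF (all-ones, true). Assumes at most 8 lanes.
    """
    total = 0
    for index, lane in enumerate(lanes):
        # AND with a mask that keeps one distinct bit per lane ...
        kept_bit = lane & (1 << index)
        # ... then sum the lanes (the horizontal add).
        total += kept_bit
    return total
```

Because every lane contributes a different bit, the add never carries, so the sum equals the bitwise OR of the masked lanes.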
Issue: https://github.com/llvm/llvm-project/issues/59829
Differential Revision: https://reviews.llvm.org/D145301
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.
This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.
Differential Revision: https://reviews.llvm.org/D148221
This test previously relied on just segfaulting or not. This commit adds
a CHECK statement to the test.
Differential Revision: https://reviews.llvm.org/D148151
These test cases were fixed with AIX 7.3 TL1, and are currently passing on AIX machines with that fix. This fix has also been backported to the 7.2 service line. These were tested on a machine with AIX 7.2 TL 5 SP4 installed.
Differential Revision: https://reviews.llvm.org/D148040
Using FileEntry to retrieve filenames gives the name used for the last access. FileEntryRef gives the name used for the first access.
The last-access name is suspected to change with unrelated changes in clang or the underlying filesystem. The first-access name is more stable and easier to track.
Differential Revision: https://reviews.llvm.org/D148213
We were still seeing occasional crashes with inline assembly blocks
using fp16/bf16 after my previous patches:
- https://reviews.llvm.org/rGff4027d152d0
- https://reviews.llvm.org/rG7d15212b8c0c
- https://reviews.llvm.org/rG20b2d11896d9
It turns out:
- The original two commits were wrong, and we should have always been
choosing the SPR register class, not the HPR register class, so that
LLVM's SelectionDAGBuilder correctly did the right splits/joins.
- The `splitValueIntoRegisterParts`/`joinRegisterPartsIntoValue` changes
from rG20b2d11896d9 are still correct, even though they sometimes
result in inefficient codegen of casts between fp16/bf16 and i32/f32
(which is visible in these tests).
This patch fixes crashes in `getCopyToParts` and when trying to select
`(bf16 (bitconvert (fp16 ...)))` dags when Neon is enabled.
This patch also adds support for passing fp16/bf16 values using the 'x'
constraint that is LLVM-specific. This should broadly match how we pass
with 't' and 'w', but with a different set of valid S registers.
Differential Revision: https://reviews.llvm.org/D147715
just because we're being told to evaluate it twice. This sometimes
happens when a variable is evaluated again during codegen.
Differential Revision: https://reviews.llvm.org/D147535
Otherwise, we run into an assertion when trying to use the current
variable scope while creating temporaries for constructor initializers.
Differential Revision: https://reviews.llvm.org/D147534
This change uses the information from target.xml sent by
the GDB stub to produce C types that we can use to print
register fields.
lldb-server *does not* produce this information yet. This will
only work with GDB stubs that do; gdbserver and QEMU are two that
I know of. Testing is added that uses a mocked lldb-server.
```
(lldb) register read cpsr x0 fpcr fpsr x1
cpsr = 0x60001000
= (N = 0, Z = 1, C = 1, V = 0, TCO = 0, DIT = 0, UAO = 0, PAN = 0, SS = 0, IL = 0, SSBS = 1, BTYPE = 0, D = 0, A = 0, I = 0, F = 0, nRW = 0, EL = 0, SP = 0)
```
Only "register read" will display fields, and only when
we are not printing a register block.
For example, cpsr is a 32 bit register. Using the target's scratch type
system we construct a type:
```
struct __attribute__((__packed__)) cpsr {
uint32_t N : 1;
uint32_t Z : 1;
...
uint32_t EL : 2;
uint32_t SP : 1;
};
```
If this register had unallocated bits in it, those would
have been filled in by RegisterFlags as anonymous fields.
A new option "SetChildPrintingDecider" is added so we
can disable printing those.
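As a rough illustration of how a raw value maps onto named fields, the sketch below extracts fields from an integer and lists them most significant first. It is not lldb code: `decode_fields` is a hypothetical helper, and only the N, Z, C and V bits (31 down to 28) are modelled.

```python
def decode_fields(value, fields):
    """Return [(name, field_value)] ordered most significant field first.

    fields: iterable of (name, msb, lsb) with zero-based bit positions.
    """
    decoded = []
    for name, msb, lsb in sorted(fields, key=lambda f: f[1], reverse=True):
        width = msb - lsb + 1
        decoded.append((name, (value >> lsb) & ((1 << width) - 1)))
    return decoded


# Simplified layout: only the top four condition flags.
CPSR_TOP_FIELDS = [("V", 28, 28), ("N", 31, 31), ("C", 29, 29), ("Z", 30, 30)]
```

For the value 0x60001000 shown earlier, `decode_fields(0x60001000, CPSR_TOP_FIELDS)` gives N = 0, Z = 1, C = 1, V = 0, which agrees with the `register read` output.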
Important things about this type:
* It is packed so that sizeof(struct cpsr) == sizeof(the real register).
(this will hold for all flags types we create)
* Each field has the same storage type, which is the same as the type
of the raw register value. This prevents fields being spilt over
into more storage units, as is allowed by most ABIs.
* Each bitfield size matches that of its register field.
* The most significant field is first.
The last point is required because the most significant bit (MSB)
being on the left/top of a print out matches what you'd expect to
see in an architecture manual. In addition, having lldb print a
different field order on big/little endian hosts is not acceptable.
As a consequence, if the target is little endian we have to
reverse the order of the fields in the value. The value of each field
remains the same. For example 0b01 doesn't become 0b10, it just shifts
up or down.
This is needed because clang's type system assumes that for a struct
like the one above, the least significant bit (LSB) will be first
for a little endian target. We need the MSB to be first.
Finally, if lldb's host has a different endianness from the target, we have
to byte swap the host endian value to match the endianness of the target's
typesystem.
| Host Endian | Target Endian | Field Order Swap | Byte Order Swap |
|-------------|---------------|------------------|-----------------|
| Little | Little | Yes | No |
| Big | Little | Yes | Yes |
| Little | Big | No | Yes |
| Big | Big | No | No |
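The table and the field reversal can be sketched as follows. This is a simplified model with hypothetical helper names, not the lldb implementation; it treats the register as a plain integer and takes field widths listed most significant first.

```python
def swaps_needed(host_endian, target_endian):
    """Return (field_order_swap, byte_order_swap) as in the table above."""
    field_order_swap = target_endian == "little"
    byte_order_swap = host_endian != target_endian
    return field_order_swap, byte_order_swap


def reverse_field_order(value, widths_msb_first):
    """Move the most significant field to the least significant position,
    and so on, keeping the bits inside each field unchanged."""
    total_bits = sum(widths_msb_first)
    result = 0
    consumed = 0   # bits already taken from the top of `value`
    out_shift = 0  # position of the next field in `result`
    for width in widths_msb_first:
        consumed += width
        field = (value >> (total_bits - consumed)) & ((1 << width) - 1)
        result |= field << out_shift
        out_shift += width
    return result
```

With widths `[1, 2, 1]`, the value `0b1010` becomes `0b0011`: the two-bit field is still `0b01`, it has only moved.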
Testing was done as follows:
* Little -> Little
* LE AArch64 native debug.
* Big -> Little
* s390x lldb running under QEMU, connected to LE AArch64 target.
* Little -> Big
* LE AArch64 lldb connected to QEMU's GDB stub, which is running
an s390x program.
* Big -> Big
* s390x lldb running under QEMU, connected to another QEMU's GDB
stub, which is running an s390x program.
As we are not allowed to link core code to plugins directly,
I have added a new plugin RegisterTypeBuilder. There is one implementation
of this, RegisterTypeBuilderClang, which uses TypeSystemClang to build
the CompilerType from the register fields.
Reviewed By: jasonmolenda
Differential Revision: https://reviews.llvm.org/D145580
The main change here is to add a `widenScalarToNextPow2` before the
`clampScalar` so that non-power-of-two sizes between 32 and 64 are
turned into s64 count trailing zeroes.
However, if you make the legalisation rules depend on TypeIdx 0 (the
output), then you still get crashes for the s65 testcase, which I solved
by instead flipping the rules around to be about TypeIdx 1 (the input),
with a `scalarSameSizeAs` at the end to tie index 0 to index 1. This,
incidentally, is how things are written for `G_CTLZ`.
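The effect of ordering the two rules can be sketched with plain integers. This models only the size computation, not the legalizer API; the helper names are hypothetical and the 32/64 clamp bounds are an assumption based on the sizes mentioned above.

```python
def next_pow2(bits):
    """Smallest power of two that is >= bits (bits >= 1)."""
    return 1 << (bits - 1).bit_length()


def legalized_size(bits, lo=32, hi=64):
    """Widen to the next power of two first, then clamp to [lo, hi]."""
    widened = next_pow2(bits)
    return max(lo, min(widened, hi))
```

Clamping alone would leave a size such as 48 unchanged, since it already lies between 32 and 64. Widening first turns it into 64.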
Differential Revision: https://reviews.llvm.org/D147602
Add a set of transform operations into the "structured" extension of the
Transform dialect that allow one to select transformation targets more
specifically than the currently available matching. In particular, add
the mechanism for identifying the producers of operands (input and init
in destination-passing style) and users of results, as well as
mechanisms for reasoning about the shape of the iteration space.
Additionally, add several transform operations to manipulate parameters
that could be useful to implement more advanced selectors. Specifically,
new operations let one produce and compare parameter values to implement
shape-driven transformations.
New operations are placed in separate files to decrease compilation
time. Some relayering of the extension is necessary to avoid repeated
generation of enums.
Depends on D148013
Depends on D148014
Depends on D148015
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D148017
This teaches ProcessGDBRemote to look for "flags" nodes
in the target XML that tell you what fields a register has.
https://sourceware.org/gdb/onlinedocs/gdb/Target-Description-Format.html
It will check for various invalid inputs like:
* Flags nodes with 0 fields in them.
* Start or end being > the register size.
* Fields that overlap.
* Required properties not being present (e.g. no name).
* Flag sets being redefined.
If anything untoward is found, we'll just drop the field or the
flag set altogether. Register fields are a "nice to have", so LLDB
shouldn't crash because of them. Instead it just logs anything
we throw away, so the user can fix their XML or file a bug with their
vendor.
Once that is done it will sort the fields and pass them to
the RegisterFields class I added previously.
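A minimal sketch of this kind of validation is shown below. It is not the ProcessGDBRemote code: the dictionary layout, the helper name, zero-based bit indices, the `start > end` check and the most-significant-first sort order are all assumptions made for illustration.

```python
def validate_fields(fields, register_bits):
    """Drop invalid fields; return the rest sorted, or None if none remain.

    fields: list of dicts that should have 'name', 'start' and 'end' keys.
    """
    kept = []
    for field in fields:
        name, start, end = field.get("name"), field.get("start"), field.get("end")
        if name is None or start is None or end is None:
            continue  # a required property is missing
        if start > end or end >= register_bits:
            continue  # outside the register (assumes zero-based bit indices)
        if any(start <= k["end"] and k["start"] <= end for k in kept):
            continue  # overlaps a field that was already accepted
        kept.append(field)
    if not kept:
        return None  # a flag set with no usable fields is dropped entirely
    # Assumed order: most significant field first.
    return sorted(kept, key=lambda f: f["start"], reverse=True)
```

A real implementation would also log each dropped field so the user can see what was discarded.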
There is no way to see these fields yet, so tests for this code
will come later when the formatting code is added.
The fields are stored in a map of unique pointers on the
ProcessGDBRemote class. It will give out raw pointers on the
assumption that the GDB process lives longer than the users
of those pointers do. This means RegisterInfo is still a trivial struct,
but we properly destroy the fields when the GDB process ends.
We can't store the fields directly in the map because adding new
items may cause its storage to be reallocated, which would invalidate
pointers we've already given out.
Reviewed By: jasonmolenda, JDevlieghere
Differential Revision: https://reviews.llvm.org/D145574
ObjectFileELF::ApplyRelocations() considered all 32-bit input objects to be i386 and didn't provide good error messages for AArch32 objects. Please find an example in https://github.com/llvm/llvm-project/issues/61948
While we are here, let's improve the situation for unsupported architectures as well. I think we should report the error here too and not silently fail (or crash with assertions enabled).
Reviewed By: SixWeining
Differential Revision: https://reviews.llvm.org/D147627
The following test fails when enabling UBSan due to a left shift of a
negative value:
> runtime error: left shift of negative value -2
BOLT :: AArch64/ext-island-ref.s
This patch fixes this by using a multiplication instead of a shift.
Reviewed By: yota9
Differential Revision: https://reviews.llvm.org/D148218
Previously, hoisting through an iter_arg would mistakenly yield the unpadded value and
cast it to the padded value.
This was incorrect and resulted in out-of-bounds accesses.
The correct formulation is to yield the padded value and extract a smaller dynamic slice
out of it.
Differential Revision: https://reviews.llvm.org/D148173
These are used to store new state added by the Scalable Matrix
Extension which is documented in
https://developer.arm.com/documentation/ddi0616/aa/.
The values match those defined by Linux, see:
e62252bc55/include/uapi/linux/elf.h (L435)
The ZT register(s) are added by SME2 which is not yet publicly
documented but has support in LLVM and Linux already.
Also added descriptions for SVE and PAC_MASK notes since those
were missing.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D148126
The instantiation pattern is null for incomplete template types, and using
the specialization decl results in not seeing re-declarations.
Differential Revision: https://reviews.llvm.org/D148158