Add new operations to the gpu dialect to represent device side
asynchronous copies. This also add the lowering of those operations to
nvvm dialect.
Those ops are meant to be low level and map directly to llvm dialects
like nvvm or rocdl.
We can further add higher level of abstraction by building on top of
those operations.
This has been discuss here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924
Differential Revision: https://reviews.llvm.org/D119191
Note that this doesn't actually cause the top level predicate to become a non-union just yet.
The * above comes from a case in the LoopVectorizer where a predicate which is later proven no longer blocks vectorization due to a change from checking if predicates exists to whether the predicate is possibly false.
The superclass method handles a bunch of useful things. For example
it applies flags such as `-fnew-alignment` which doesn't work without
this patch.
Differential Revision: https://reviews.llvm.org/D118573
Due to the way type units work, this would lead to a declaration in a
type unit of a local type in a CU - which is ambiguous. Rather than
trying to resolve that relative to the CU that references the type unit,
let's just not try to simplify these names.
Longer term this should be fixed by not putting the template
instantiation in a type unit to begin with - since it references an
internal linkage type, it can't legitimately be duplicated/in more than
one translation unit, so skip the type unit overhead. (but the right fix
for that is to move type unit management into a DICompositeType flag
(dropping the "identifier" field is not a perfect solution since it
breaks LLVM IR linking decl/def merging during IR linking))
`opt -analyze` is legacy PM-specific. Show better ways of doing the same
thing, generally with some sort of `-passes=print<foo>`.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D119486
`bug49334.cpp` directly uses `!=` to compare two floating point values,
which is almost wrong.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D119485
This relands commit b380a31de084a540cfa38b72e609b25ea0569bb7.
Restrict the tests to Windows only since the flag symbol hash depends on
system-dependent path normalization.
Currently we have a hard team limit, which is set to 65536. It says no matter whether the device can support more teams, or users set more teams, as long as it is larger than that hard limit, the final number to launch the kernel will always be that hard limit. It is way less than the actual hardware limit. For example, my workstation has GTX2080, and the hardware limit of grid size is 2147483647, which is exactly the largest number a `int32_t` can represent. There is no limitation mentioned in the spec. This patch simply removes it.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D119313
Lambda names aren't entirely canonical (as demonstrated by the
cross-project-test added here) at the moment (we should fix that for a
bunch of reasons) - even if the template referencing them is
non-simplified, other names referencing /that/ template can't be
simplified either because type units might cause a different template to
be picked up that would conflict with the expected name.
(other than for roundtripping precision, it'd be OK to simplify types
that reference types that reference lambdas - but best be consistent
between the roundtrip/verify mode and the actual simplified template
names mode)
Use same MSSA clobbering checks as in the AMDGPUAnnotateUniformValues.
Kernel argument promotion needs exactly the same information so factor
out utility function isClobberedInFunction.
Differential Revision: https://reviews.llvm.org/D119480
This change adds support for dsymutil to be able to dump out the new swift5 reflection sections called swift5_proto and swift5_protos. The test is also updated to check for this.
Differential Revision: https://reviews.llvm.org/D119310
The test sometimes fails on Windows due to a warning emitted by bash about not
being able to find the /tmp directory causing this test to randomly fail. This
update makes the test more flexible to account for this possibility and should
hopefully make it more reliable.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D118691
Minor efficiency fix. There is no reason to perform the same set lookup
repeatedly in the inner loop as it is invariant there.
Differential Revision: https://reviews.llvm.org/D119474
A significant number of our tests in C accidentally use functions
without prototypes. This patch converts the function signatures to have
a prototype for the situations where the test is not specific to K&R C
declarations. e.g.,
void func();
becomes
void func(void);
This is the seventh batch of tests being updated (there are a
significant number of other tests left to be updated).
These functions allow for defining pattern fragments usable within the `match` and `rewrite` sections of a pattern. The main structure of Constraints and Rewrites functions are the same, and are similar to functions in other languages; they contain a signature (i.e. name, argument list, result list) and a body:
```pdll
// Constraint that takes a value as an input, and produces a value:
Constraint Cst(arg: Value) -> Value { ... }
// Constraint that returns multiple values:
Constraint Cst() -> (result1: Value, result2: ValueRange);
```
When returning multiple results, each result can be optionally be named (the result of a Constraint/Rewrite in the case of multiple results is a tuple).
These body of a Constraint/Rewrite functions can be specified in several ways:
* Externally
In this case we are importing an external function (registered by the user outside of PDLL):
```pdll
Constraint Foo(op: Op);
Rewrite Bar();
```
* In PDLL (using PDLL constructs)
In this case, the body is defined using PDLL constructs:
```pdll
Rewrite BuildFooOp() {
// The result type of the Rewrite is inferred from the return.
return op<my_dialect.foo>;
}
// Constraints/Rewrites can also implement a lambda/expression
// body for simple one line bodies.
Rewrite BuildFooOp() => op<my_dialect.foo>;
```
* In PDLL (using a native/C++ code block)
In this case the body is specified using a C++(or potentially other language at some point) code block. When building PDLL in AOT mode this will generate a native constraint/rewrite and register it with the PDL bytecode.
```pdll
Rewrite BuildFooOp() -> Op<my_dialect.foo> [{
return rewriter.create<my_dialect::FooOp>(...);
}];
```
Differential Revision: https://reviews.llvm.org/D115836
This allows for defining simple patterns in a single line. The lambda
body of a Pattern expects a single operation rewrite statement:
```
Pattern => replace op<my_dialect.foo>(operands: ValueRange) with operands;
```
Differential Revision: https://reviews.llvm.org/D115835