# Built-in Function and MemRef Calling Convention
This document describes the calling convention implemented in the conversion of
the built-in function operation, standard `call` operations, and the handling of
`memref` type equivalents in the LLVM dialect. The conversion assumes the default
convention was used when converting built-in types to LLVM dialect types.
## Function Result Packing
For multi-result functions, the returned values are inserted into a structure-typed value before being returned and are extracted from it at the call site. This transformation is part of the conversion and is transparent to the definitions and uses of the values being returned.
Example:
func @foo(%arg0: i32, %arg1: i64) -> (i32, i64) {
return %arg0, %arg1 : i32, i64
}
func @bar() {
%0 = constant 42 : i32
%1 = constant 17 : i64
%2:2 = call @foo(%0, %1) : (i32, i64) -> (i32, i64)
"use_i32"(%2#0) : (i32) -> ()
"use_i64"(%2#1) : (i64) -> ()
}
// is transformed into
llvm.func @foo(%arg0: i32, %arg1: i64) -> !llvm.struct<(i32, i64)> {
// insert the values into a structure
%0 = llvm.mlir.undef : !llvm.struct<(i32, i64)>
%1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i32, i64)>
%2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i32, i64)>
// return the structure value
llvm.return %2 : !llvm.struct<(i32, i64)>
}
llvm.func @bar() {
%0 = llvm.mlir.constant(42 : i32) : i32
%1 = llvm.mlir.constant(17 : i64) : i64
// call and extract the values from the structure
%2 = llvm.call @foo(%0, %1)
: (i32, i64) -> !llvm.struct<(i32, i64)>
%3 = llvm.extractvalue %2[0] : !llvm.struct<(i32, i64)>
%4 = llvm.extractvalue %2[1] : !llvm.struct<(i32, i64)>
// use as before
"use_i32"(%3) : (i32) -> ()
"use_i64"(%4) : (i64) -> ()
}
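For readers more familiar with C++ than with the LLVM dialect, the packing above can be pictured as returning a small struct by value. The following is only an analogy, not what the conversion emits: the names `FooResults`, `result0`, and `result1` are assumptions introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ counterpart of !llvm.struct<(i32, i64)>.
struct FooResults {
  int32_t result0;
  int64_t result1;
};

// Counterpart of the converted @foo: both results travel in one struct.
inline FooResults foo(int32_t arg0, int64_t arg1) {
  FooResults packed{};    // plays the role of llvm.mlir.undef
  packed.result0 = arg0;  // llvm.insertvalue ..., [0]
  packed.result1 = arg1;  // llvm.insertvalue ..., [1]
  return packed;          // llvm.return of the structure value
}

// Counterpart of the converted @bar: the call site unpacks the struct.
inline int64_t bar() {
  FooResults packed = foo(42, 17);
  int32_t first = packed.result0;   // llvm.extractvalue ..., [0]
  int64_t second = packed.result1;  // llvm.extractvalue ..., [1]
  return static_cast<int64_t>(first) + second;
}
```

The callee and the caller agree on the struct layout, so neither the definition nor the uses of the individual results need to know that packing took place.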
## Calling Convention for Ranked `memref`
Function arguments of `memref` type, ranked or unranked, are expanded into a
list of arguments of non-aggregate types that the memref descriptor defined
above comprises. That is, the outer struct type and the inner array types are
replaced with individual arguments.
This convention is implemented in the conversion of `std.func` and `std.call` to
the LLVM dialect, with the former unpacking the descriptor into a set of
individual values and the latter packing those values back into a descriptor so
as to make it transparently usable by other operations. Conversions from other
dialects should take this convention into account.
This specific convention is motivated by the necessity to specify alignment and aliasing attributes on the raw pointers underpinning the memref.
Examples:
func @foo(%arg0: memref<?xf32>) -> () {
"use"(%arg0) : (memref<?xf32>) -> ()
return
}
// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<1xi64>, array<1xi64>)>
llvm.func @foo(%arg0: !llvm.ptr<f32>, // Allocated pointer.
%arg1: !llvm.ptr<f32>, // Aligned pointer.
%arg2: i64, // Offset.
%arg3: i64, // Size in dim 0.
%arg4: i64) { // Stride in dim 0.
// Populate memref descriptor structure.
%0 = llvm.mlir.undef : !llvm.memref_1d
%1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_1d
%2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_1d
%3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_1d
%4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_1d
%5 = llvm.insertvalue %arg4, %4[4, 0] : !llvm.memref_1d
// Descriptor is now usable as a single value.
"use"(%5) : (!llvm.memref_1d) -> ()
llvm.return
}
func @bar() {
%0 = "get"() : () -> (memref<?xf32>)
call @foo(%0) : (memref<?xf32>) -> ()
return
}
// Gets converted to the following
// (using type alias for brevity):
!llvm.memref_1d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<1xi64>, array<1xi64>)>
llvm.func @bar() {
%0 = "get"() : () -> !llvm.memref_1d
// Unpack the memref descriptor.
%1 = llvm.extractvalue %0[0] : !llvm.memref_1d
%2 = llvm.extractvalue %0[1] : !llvm.memref_1d
%3 = llvm.extractvalue %0[2] : !llvm.memref_1d
%4 = llvm.extractvalue %0[3, 0] : !llvm.memref_1d
%5 = llvm.extractvalue %0[4, 0] : !llvm.memref_1d
// Pass individual values to the callee.
llvm.call @foo(%1, %2, %3, %4, %5)
: (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64) -> ()
llvm.return
}
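The same expansion can be sketched in C++ for a rank-1 memref of `f32`. This is a rough analogy under stated assumptions: `MemRef1D`, `sumElements`, and `callSumElements` are hypothetical names, and the element traversal is added only to make the sketch observable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the rank-1 descriptor
// !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1xi64>, array<1xi64>)>.
struct MemRef1D {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[1];
  int64_t strides[1];
};

// Callee after conversion: five non-aggregate arguments that are
// repacked into a descriptor on entry.
inline float sumElements(float *allocated, float *aligned, int64_t offset,
                         int64_t size0, int64_t stride0) {
  MemRef1D desc{allocated, aligned, offset, {size0}, {stride0}};
  float sum = 0.0f;
  for (int64_t i = 0; i < desc.sizes[0]; ++i)
    sum += desc.aligned[desc.offset + i * desc.strides[0]];
  return sum;
}

// Caller after conversion: the descriptor is unpacked into individual
// values before the call.
inline float callSumElements(const MemRef1D &desc) {
  return sumElements(desc.allocated, desc.aligned, desc.offset,
                     desc.sizes[0], desc.strides[0]);
}
```

Because the two pointers are plain arguments in this form, attributes such as alignment or aliasing information can be attached to them directly, which is the motivation stated above.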
## Calling Convention for Unranked `memref`
For unranked memrefs, the list of function arguments always contains two
elements, same as the unranked memref descriptor: an integer rank, and a
type-erased (`!llvm.ptr<i8>`) pointer to the ranked memref descriptor. Note that
while the calling convention does not require stack allocation, casting to
unranked memref does since one cannot take an address of an SSA value containing
the ranked memref. The caller is in charge of ensuring the thread safety and
eventually removing unnecessary stack allocations in cast operations.
Example
func @foo(%arg0: memref<*xf32>) -> () {
"use"(%arg0) : (memref<*xf32>) -> ()
return
}
// Gets converted to the following.
llvm.func @foo(%arg0: i64, // Rank.
%arg1: !llvm.ptr<i8>) { // Type-erased pointer to descriptor.
// Pack the unranked memref descriptor.
%0 = llvm.mlir.undef : !llvm.struct<(i64, ptr<i8>)>
%1 = llvm.insertvalue %arg0, %0[0] : !llvm.struct<(i64, ptr<i8>)>
%2 = llvm.insertvalue %arg1, %1[1] : !llvm.struct<(i64, ptr<i8>)>
"use"(%2) : (!llvm.struct<(i64, ptr<i8>)>) -> ()
llvm.return
}
func @bar() {
%0 = "get"() : () -> (memref<*xf32>)
call @foo(%0): (memref<*xf32>) -> ()
return
}
// Gets converted to the following.
llvm.func @bar() {
%0 = "get"() : () -> (!llvm.struct<(i64, ptr<i8>)>)
// Unpack the memref descriptor.
%1 = llvm.extractvalue %0[0] : !llvm.struct<(i64, ptr<i8>)>
%2 = llvm.extractvalue %0[1] : !llvm.struct<(i64, ptr<i8>)>
// Pass individual values to the callee.
llvm.call @foo(%1, %2) : (i64, !llvm.ptr<i8>) -> ()
llvm.return
}
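A minimal C++ sketch of the unranked convention follows. It assumes a rank-1 `f32` memref; `MemRef1D`, `UnrankedMemRef`, `firstDimSize`, and `callWithUnranked` are hypothetical names chosen for illustration. The sketch also shows why casting to an unranked memref needs addressable storage: the ranked descriptor must live somewhere its address can be taken.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of a rank-1 ranked descriptor.
struct MemRef1D {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[1];
  int64_t strides[1];
};

// Hypothetical mirror of !llvm.struct<(i64, ptr<i8>)>.
struct UnrankedMemRef {
  int64_t rank;
  void *descriptor;  // type-erased pointer to the ranked descriptor
};

// Callee: receives the rank and the type-erased pointer as two arguments
// and repacks them into the unranked descriptor.
inline int64_t firstDimSize(int64_t rank, void *descriptor) {
  UnrankedMemRef unranked{rank, descriptor};
  if (unranked.rank != 1)
    return -1;  // this sketch only understands rank-1 descriptors
  auto *ranked = static_cast<MemRef1D *>(unranked.descriptor);
  return ranked->sizes[0];
}

// Caller: `ranked` is a by-value parameter, so it occupies a stack slot
// whose address can be taken (the analog of an llvm.alloca plus store).
inline int64_t callWithUnranked(MemRef1D ranked) {
  UnrankedMemRef unranked{1, &ranked};
  return firstDimSize(unranked.rank, unranked.descriptor);
}
```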
Lifetime. The second element of the unranked memref descriptor points to
some memory in which the ranked memref descriptor is stored. By convention, this
memory is allocated on stack and has the lifetime of the function. (Note: due
to function-length lifetime, creation of multiple unranked memref descriptors,
e.g., in a loop, may lead to stack overflows.) If an unranked descriptor has to
be returned from a function, the ranked descriptor it points to is copied into
dynamically allocated memory, and the pointer in the unranked descriptor is
updated accordingly. The allocation happens immediately before returning. It is
the responsibility of the caller to free the dynamically allocated memory. The
default conversion of `std.call` and `std.call_indirect` copies the ranked
descriptor to newly allocated memory on the caller's stack. Thus, the convention
of the ranked memref descriptor pointed to by an unranked memref descriptor
being stored on stack is respected.
This convention may or may not apply if the conversion of MemRef types is overridden by the user.
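The copy-on-return protocol described above can be sketched in C++ as follows. This is an illustration under assumptions, not the emitted IR: the helper names are hypothetical, the index type is assumed to be 64 bits wide, and the descriptor is assumed to have no padding between its fields.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical mirror of !llvm.struct<(i64, ptr<i8>)>.
struct UnrankedMemRef {
  int64_t rank;
  void *descriptor;
};

// Size in bytes of a ranked descriptor: two pointers, one offset,
// `rank` sizes and `rank` strides (assumes 64-bit index, no padding).
inline std::size_t rankedDescriptorSize(int64_t rank) {
  return 2 * sizeof(void *) +
         (1 + 2 * static_cast<std::size_t>(rank)) * sizeof(int64_t);
}

// Callee side, immediately before returning: copy the stack-allocated
// ranked descriptor into dynamically allocated memory.
inline UnrankedMemRef copyForReturn(UnrankedMemRef onStack) {
  std::size_t bytes = rankedDescriptorSize(onStack.rank);
  void *heap = std::malloc(bytes);
  if (!heap)
    std::abort();
  std::memcpy(heap, onStack.descriptor, bytes);
  return {onStack.rank, heap};
}

// Caller side: copy the descriptor into storage on the caller's stack
// and free the dynamically allocated memory.
inline void adoptReturned(UnrankedMemRef returned, void *callerStackStorage) {
  std::memcpy(callerStackStorage, returned.descriptor,
              rankedDescriptorSize(returned.rank));
  std::free(returned.descriptor);
}
```

After `adoptReturned`, the ranked descriptor again lives on a stack, so the convention that unranked descriptors point to stack memory still holds in the caller.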
## C-compatible wrapper emission
In practical cases, it may be desirable to have externally-facing functions with
a single argument corresponding to each MemRef argument. When interfacing with
LLVM IR produced from C, the code needs to respect the corresponding calling
convention. The conversion to the LLVM dialect provides an option to generate
wrapper functions that take memref descriptors as pointers-to-struct compatible
with data types produced by Clang when compiling C sources. The generation of
such wrapper functions can additionally be controlled at a function granularity
by setting the `llvm.emit_c_interface` unit attribute.
More specifically, a memref argument is converted into a pointer-to-struct
argument of type `{T*, T*, i64, i64[N], i64[N]}*` in the wrapper function, where
`T` is the converted element type and `N` is the memref rank. This type is
compatible with that produced by Clang for the following C++ structure template
instantiations or their equivalents in C.
template<typename T, size_t N>
struct MemRefDescriptor {
T *allocated;
T *aligned;
intptr_t offset;
intptr_t sizes[N];
intptr_t strides[N];
};
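As an illustration of how C++ code might populate such a structure before calling into MLIR-generated code, the sketch below fills a descriptor for a contiguous row-major buffer. The helper `makeContiguous` is hypothetical and not part of MLIR; it assumes a zero offset and that the allocated and aligned pointers coincide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Hypothetical helper: describe a contiguous row-major buffer.
template <typename T, std::size_t N>
MemRefDescriptor<T, N> makeContiguous(T *data,
                                      const std::intptr_t (&shape)[N]) {
  MemRefDescriptor<T, N> desc;
  desc.allocated = data;  // assumption: no separate aligned allocation
  desc.aligned = data;
  desc.offset = 0;
  // The stride of a dimension is the product of all sizes that follow it.
  std::intptr_t running = 1;
  for (std::size_t i = N; i-- > 0;) {
    desc.sizes[i] = shape[i];
    desc.strides[i] = running;
    running *= shape[i];
  }
  return desc;
}
```

A pointer to the resulting object is what a C-compatible wrapper function would expect for the corresponding memref argument.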
Furthermore, we also rewrite function results to pointer parameters if the rewritten function result has a struct type. The special result parameter is added as the first parameter and is of pointer-to-struct type.
If enabled, the option will do the following. For external functions declared in the MLIR module:
- Declare a new function `_mlir_ciface_<original name>` where memref arguments are converted to pointer-to-struct and the remaining arguments are converted as usual. Results are converted to a special argument if they are of struct type.
- Add a body to the original function (making it non-external) that
  - allocates memref descriptors,
  - populates them,
  - potentially allocates space for the result struct, and
  - passes the pointers to these into the newly declared interface function, then
  - collects the result of the call (potentially from the result struct), and
  - returns it to the caller.
For (non-external) functions defined in the MLIR module:
- Define a new function `_mlir_ciface_<original name>` where memref arguments are converted to pointer-to-struct and the remaining arguments are converted as usual. Results are converted to a special argument if they are of struct type.
- Populate the body of the newly defined function with IR that
  - loads descriptors from pointers;
  - unpacks descriptor into individual non-aggregate values;
  - passes these values into the original function;
  - collects the results of the call and
  - either copies the results into the result struct or returns them to the caller.
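The steps for a function defined in the module can be pictured with the following C++ sketch. It is an analogy only: `MemRef1D` is a hypothetical rank-1 descriptor, `foo` stands in for the original function using the expanded-argument convention, and `ciface_foo` plays the role of `_mlir_ciface_foo` (renamed here because identifiers with a leading underscore are reserved at global scope in C++).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of a rank-1 ranked descriptor.
struct MemRef1D {
  float *allocated;
  float *aligned;
  int64_t offset;
  int64_t sizes[1];
  int64_t strides[1];
};

// Stand-in for the original function: takes the unpacked descriptor
// fields and returns a descriptor by value (here simply its argument).
inline MemRef1D foo(float *allocated, float *aligned, int64_t offset,
                    int64_t size0, int64_t stride0) {
  return MemRef1D{allocated, aligned, offset, {size0}, {stride0}};
}

// Stand-in for the interface function: the struct result becomes the
// first pointer parameter, and the memref argument is a pointer-to-struct.
inline void ciface_foo(MemRef1D *result, MemRef1D *arg) {
  MemRef1D loaded = *arg;  // load the descriptor from the pointer
  // Unpack into non-aggregate values and call the original function.
  MemRef1D returned = foo(loaded.allocated, loaded.aligned, loaded.offset,
                          loaded.sizes[0], loaded.strides[0]);
  *result = returned;  // copy the result into the result struct
}
```

A C or C++ caller only ever deals with pointers to descriptors, while calls between MLIR-generated functions keep using the expanded form.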
Examples:
func @qux(%arg0: memref<?x?xf32>)
// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>
// Function with unpacked arguments.
llvm.func @qux(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
%arg2: i64, %arg3: i64, %arg4: i64,
%arg5: i64, %arg6: i64) {
// Populate memref descriptor (as per calling convention).
%0 = llvm.mlir.undef : !llvm.memref_2d
%1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
%2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
%3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
%4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
%5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
%6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
%7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
// Store the descriptor in a stack-allocated space.
%8 = llvm.mlir.constant(1 : index) : i64
%9 = llvm.alloca %8 x !llvm.memref_2d
: (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>>
llvm.store %7, %9 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>>
// Call the interface function.
llvm.call @_mlir_ciface_qux(%9)
: (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>>) -> ()
// The stored descriptor will be freed on return.
llvm.return
}
// Interface function.
llvm.func @_mlir_ciface_qux(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>>)
func @foo(%arg0: memref<?x?xf32>) {
return
}
// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>>
// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>,
%arg2: i64, %arg3: i64, %arg4: i64,
%arg5: i64, %arg6: i64) {
llvm.return
}
// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr) {
// Load the descriptor.
%0 = llvm.load %arg0 : !llvm.memref_2d_ptr
// Unpack the descriptor as per calling convention.
%1 = llvm.extractvalue %0[0] : !llvm.memref_2d
%2 = llvm.extractvalue %0[1] : !llvm.memref_2d
%3 = llvm.extractvalue %0[2] : !llvm.memref_2d
%4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
%5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
%6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
%7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
: (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64,
i64, i64) -> ()
llvm.return
}
func @foo(%arg0: memref<?x?xf32>) -> memref<?x?xf32> {
return %arg0 : memref<?x?xf32>
}
// Gets converted into the following
// (using type alias for brevity):
!llvm.memref_2d = type !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>
!llvm.memref_2d_ptr = type !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64,
array<2xi64>, array<2xi64>)>>
// Function with unpacked arguments.
llvm.func @foo(%arg0: !llvm.ptr<f32>, %arg1: !llvm.ptr<f32>, %arg2: i64,
%arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64)
-> !llvm.memref_2d {
%0 = llvm.mlir.undef : !llvm.memref_2d
%1 = llvm.insertvalue %arg0, %0[0] : !llvm.memref_2d
%2 = llvm.insertvalue %arg1, %1[1] : !llvm.memref_2d
%3 = llvm.insertvalue %arg2, %2[2] : !llvm.memref_2d
%4 = llvm.insertvalue %arg3, %3[3, 0] : !llvm.memref_2d
%5 = llvm.insertvalue %arg5, %4[4, 0] : !llvm.memref_2d
%6 = llvm.insertvalue %arg4, %5[3, 1] : !llvm.memref_2d
%7 = llvm.insertvalue %arg6, %6[4, 1] : !llvm.memref_2d
llvm.return %7 : !llvm.memref_2d
}
// Interface function callable from C.
llvm.func @_mlir_ciface_foo(%arg0: !llvm.memref_2d_ptr, %arg1: !llvm.memref_2d_ptr) {
%0 = llvm.load %arg1 : !llvm.memref_2d_ptr
%1 = llvm.extractvalue %0[0] : !llvm.memref_2d
%2 = llvm.extractvalue %0[1] : !llvm.memref_2d
%3 = llvm.extractvalue %0[2] : !llvm.memref_2d
%4 = llvm.extractvalue %0[3, 0] : !llvm.memref_2d
%5 = llvm.extractvalue %0[3, 1] : !llvm.memref_2d
%6 = llvm.extractvalue %0[4, 0] : !llvm.memref_2d
%7 = llvm.extractvalue %0[4, 1] : !llvm.memref_2d
%8 = llvm.call @foo(%1, %2, %3, %4, %5, %6, %7)
: (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i64, i64, i64, i64) -> !llvm.memref_2d
llvm.store %8, %arg0 : !llvm.memref_2d_ptr
llvm.return
}
Rationale: Introducing auxiliary functions for C-compatible interfaces is
preferred to modifying the calling convention since it will minimize the effect
of C compatibility on intra-module calls or calls between MLIR-generated
functions. In particular, when calling external functions from an MLIR module in
a (parallel) loop, storing a memref descriptor on the stack can lead to
stack exhaustion and/or concurrent access to the same address. The auxiliary
interface function serves as an allocation scope in this case. Furthermore, when
targeting accelerators with separate memory spaces such as GPUs, stack-allocated
descriptors passed by pointer would have to be transferred to the device memory,
which introduces significant overhead. In such situations, auxiliary interface
functions are executed on the host and only pass the values through the device
function invocation mechanism.
## Default Memref Model
### Memref Descriptor
Within a converted function, a `memref`-typed value is represented by a memref
_descriptor_, the type of which is the structure type obtained by converting
from the memref type. This descriptor holds all the necessary information to
produce an address of a specific element. In particular, it holds dynamic values
for static sizes, and they are expected to match at all times.
It is created by the allocation operation and is updated by the conversion
operations that may change static dimensions into dynamic dimensions and vice
versa.
**Note**: LLVM IR conversion does not support `memref`s with layouts that are
not amenable to the strided form.
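The expectation that the dynamic values stored for static sizes match the type can be expressed as a simple check. The sketch below is illustrative only: `sizesMatchStaticShape` and the `kDynamicDim` marker (standing for a `?` dimension) are assumptions made for this example, not names used by the conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Hypothetical marker for a dynamic dimension, e.g. the `?` in
// memref<?x4xf32>.
constexpr std::intptr_t kDynamicDim = -1;

// Returns true if every statically known size in `staticShape` equals
// the value stored in the descriptor; dynamic dimensions match anything.
template <typename T, std::size_t N>
bool sizesMatchStaticShape(const MemRefDescriptor<T, N> &desc,
                           const std::intptr_t (&staticShape)[N]) {
  for (std::size_t i = 0; i < N; ++i) {
    if (staticShape[i] != kDynamicDim && desc.sizes[i] != staticShape[i])
      return false;
  }
  return true;
}
```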
### Index Linearization
Accesses to a memref element are transformed into an access to an element of the
buffer pointed to by the descriptor. The position of the element in the buffer
is calculated by linearizing memref indices in row-major order (lexically first
index is the slowest varying, similar to C, but accounting for strides). The
computation of the linear address is emitted as arithmetic operations in the LLVM
dialect. Strides are extracted from the memref descriptor.
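In C++ terms, the linearization amounts to adding the offset to the sum of index-stride products and indexing from the aligned pointer. The sketch below assumes a descriptor shaped like the `MemRefDescriptor` template used for C-compatible wrappers; `elementAddress` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Computes the address of the element at `indices`:
//   aligned + offset + sum_i indices[i] * strides[i]
template <typename T, std::size_t N>
T *elementAddress(const MemRefDescriptor<T, N> &desc,
                  const std::intptr_t (&indices)[N]) {
  std::intptr_t linear = desc.offset;
  for (std::size_t i = 0; i < N; ++i)
    linear += indices[i] * desc.strides[i];
  // All accesses are based on the aligned pointer.
  return desc.aligned + linear;
}
```

For a contiguous 2x3 buffer with strides 3 and 1, the indices (1, 2) map to linear position 1 * 3 + 2 * 1 = 5.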
Examples:
An access to a memref with indices:
```mlir
%0 = load %m[%1,%2,%3,%4] : memref<?x?x4x8xf32, offset: ?>
```
is transformed into the equivalent of the following code:
```mlir
// Here, %m denotes the memref descriptor obtained by converting the memref.
// Compute the linearized index from strides.
// When strides or, in absence of explicit strides, the corresponding sizes are
// dynamic, extract the stride value from the descriptor.
%stride1 = llvm.extractvalue %m[4, 0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<4xi64>, array<4xi64>)>
%addr1 = llvm.mul %stride1, %1 : i64
// When the stride or, in absence of explicit strides, the trailing sizes are
// known statically, this value is used as a constant. The natural value of
// strides is the product of all sizes following the current dimension.
%stride2 = llvm.mlir.constant(32 : index) : i64
%addr2 = llvm.mul %stride2, %2 : i64
%addr3 = llvm.add %addr1, %addr2 : i64
%stride3 = llvm.mlir.constant(8 : index) : i64
%addr4 = llvm.mul %stride3, %3 : i64
%addr5 = llvm.add %addr3, %addr4 : i64
// Multiplication with the known unit stride can be omitted.
%addr6 = llvm.add %addr5, %4 : i64
// If the linear offset is known to be zero, it can also be omitted. If it is
// dynamic, it is extracted from the descriptor.
%offset = llvm.extractvalue %m[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<4xi64>, array<4xi64>)>
%addr7 = llvm.add %addr6, %offset : i64
// All accesses are based on the aligned pointer.
%aligned = llvm.extractvalue %m[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64,
array<4xi64>, array<4xi64>)>
// Get the address of the accessed element.
%ptr = llvm.getelementptr %aligned[%addr7]
: (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
// Perform the actual load.
%0 = llvm.load %ptr : !llvm.ptr<f32>
```
For stores, the address computation code is identical and only the actual store operation is different.
Note: the conversion does not perform any sort of common subexpression elimination when emitting memref accesses.