Commit Graph

17205 Commits

Author SHA1 Message Date
Nicolas Vasilache
bf7c490ab7
[mlir][Vector] Add a rewrite pattern for better low-precision bitcast… (#66387)
…(trunci) expansion

This revision adds a rewrite for sequences of vector `bitcast(trunci)`
to use a more efficient sequence of vector operations comprising
`shuffle` and `bitwise` ops.

Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.

The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the pre-trunci vector.
The enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` followed by an optional final `trunci`/`extui`.

The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the final vector element type is not a multiple of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
2023-09-18 15:08:18 +02:00
Matthias Springer
08d2ea372f
[mlir][Interfaces][NFC] LoopLikeOpInterface: Consistent TD formatting (#66097)
Format the interface methods consistently in the TableGen file. Also
mention more details about this interface in the description.
2023-09-18 12:45:33 +02:00
Martin Erhart
1a4dd8d362
[mlir][bufferization] Switch tests to new deallocation pass pipeline (#66517)
Use the new ownership based deallocation pass pipeline in the regression
and integration tests. Some one-shot bufferization tests tested one-shot
bufferize and deallocation at the same time. I removed the deallocation
pass there because the deallocation pass is already thoroughly tested by
itself.

Fixed version of #66471
2023-09-18 12:00:27 +02:00
Matthias Springer
0f952cfe24
[mlir][IR] Change MutableOperandRange::operator[] to return an OpOperand & (#66515)
`operator[]` returns `OpOperand &` instead of `Value`.

* This allows users to get OpOperands by name instead of "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.

Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
2023-09-18 09:43:03 +02:00
Matthias Springer
5cf714bb2f
[mlir][SCF] scf.for: Consistent API around initArgs (#66512)
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
2023-09-18 09:13:43 +02:00
Dudeldu
ed8bd7176d
[MLIR] Allow comparison of opaque properties (#66378)
Add capabilities for comparing opaque properties. This is useful when
dealing with arbitrary operations which can be compare based on their
OperationName. Now you can furthermore compare their properties without
the need to determine their actual type.
2023-09-17 23:46:31 -07:00
Bharathi Ramana Joshi
ccf194b845
[MLIR][Presburger] Implement convertVarKind for PresburgerRelation 2023-09-17 20:02:16 +05:30
Christian Sigg
1c8c365de2
[mlir][bytecode] Check that bytecode source buffer is sufficiently aligned. (#66380)
Before this change, the `ByteCode` test failed on CentOS 7 with
devtoolset-9, because strings happen to be only 8 byte aligned. In
general though, strings have no alignment guarantee.

Increase resource alignment in test to 32 bytes. 
Adjust test to sufficiently align buffer.
Add test to check error when buffer is insufficiently aligned.
2023-09-17 13:46:01 +02:00
Tai Ly
3323702bff
[TOSA] Change EightK MaxScale to 256 (#66536)
This patch changes the MaxScale value for level EightK to 256. Also
updated affected level check tests


Change-Id: Id9cffd5eb9053bb688196cd5b3b55b3ddd2a359c

Signed-off-by: Tai Ly <tai.ly@arm.com>
2023-09-15 13:57:10 -07:00
Yinying Li
3dc621124f
[mlir][sparse] Migrate tests to use new syntax (#66543)
**COO**
`lvlTypes = [ "compressed_nu", "singleton" ]` to `map = (d0, d1) -> (d0
: compressed(nonunique), d1 : singleton)`
`lvlTypes = [ "compressed_nu_no", "singleton_no" ]` to `map = (d0, d1)
-> (d0 : compressed(nonunique, nonordered), d1 : singleton(nonordered))`

**SortedCOO**
`lvlTypes = [ "compressed_nu", "singleton" ]` to `map = (d0, d1) -> (d0
: compressed(nonunique), d1 : singleton)`

**BCOO**
`lvlTypes = [ "dense", "compressed_hi_nu", "singleton" ]` to `map = (d0,
d1, d2) -> (d0 : dense, d1 : compressed(nonunique, high), d2 :
singleton)`

**BCSR**
`lvlTypes = [ "compressed", "compressed", "dense", "dense" ], dimToLvl =
affine_map<(d0, d1) -> (d0 floordiv 2, d1 floordiv 3, d0 mod 2, d1 mod
3)>` to
`map = ( i, j ) ->
      ( i floordiv 2 : compressed,
        j floordiv 3 : compressed,
        i mod 2 : dense,
        j mod 3 : dense
      )`

**Tensor and other supported formats(e.g. CCC, CDC, CCCC)**

Currently, ELL and slice are not supported yet in the new syntax and the
CHECK tests will be updated once printing is set to output the new
syntax.

Previous PRs: #66146, #66309, #66443
2023-09-15 16:12:20 -04:00
Daniil Dudkin
01e80a0f41
[mlir] Add maxnumf and minnumf to AtomicRMWKind (#66442)
This commit adds the mentioned kinds of `AtomicRMWKind`
as well as code generation for them.
2023-09-15 22:41:51 +03:00
Ingo Müller
68033aaac5
[mlir][transform] Fix crash in transform.get_parent_op. (#66492)
The previous implementation crashed if run on a `builtin.module` using
an `op_name` filter (because the initial value of `parent` in the while
loop was a `nullptr`). This PR fixes the crash and adds a test.
2023-09-15 21:32:17 +02:00
Aart Bik
d2e787d5d7
[mlir][sparse][tensor] replace bufferization with empty tensor (#66450)
Rationale:
    A bufferization.alloc_tensor can be directly replaced
    with tensor.empty since these are more or less semantically
    equivalent. The latter is considered a bit more "pure"
    with respect to SSA semantics.
2023-09-15 11:45:42 -07:00
Anton Korobeynikov
51d5d7bbae
Extend retcon.once coroutines lowering to optionally produce a normal result (#66333)
One of the main user of these kind of coroutines is swift. There yield-once (`retcon.once`) coroutines are used to temporary "expose" pointers to internal fields of various objects creating borrow scopes.

However, in some cases it might be useful also to allow these coroutines to produce a normal result, but there is no convenient way to represent this (as compared to switched-resume kind of coroutines where C++ `co_return`
is transformed to a member / callback call on promise object).

The extension is simple: we allow continuation function to have a non-void result and accept optional extra arguments via a special `llvm.coro.end.result` intrinsic that would essentially forward them as normal results.
2023-09-15 09:54:38 -07:00
Prabhdeep Singh Soni
9b57b167bb [OMPIRBuilder] Fix shared clause for task construct
This patch fixes the shared clause for the task construct with multiple
shared variables. The shareds field in the kmp_task_t is not an inline
array in the struct, rather it is a pointer to an array. With an inline
array, the pointer dereference to the outlined function body of the task
would segmentation fault when accessed by the runtime.

Reviewed By: kiranchandramohan, jdoerfert

Differential Revision: https://reviews.llvm.org/D158462
2023-09-15 12:19:47 -04:00
Andrzej Warzyński
57cf6896cd
[mlir][vector] Fix vector.broadcast lowering for scalable vectors (#66344)
This patch makes sure that the following case is lowered correctly
("duplication"):
```
  func.func @broadcast_scalable_duplication(%arg0: vector<[32]xf32>) -> vector<1x[32]xf32> {
    %res = vector.broadcast %arg0 : vector<[32]xf32> to vector<1x[32]xf32>
    return %res : vector<1x[32]xf32>
  }
```
2023-09-15 16:35:47 +01:00
Andrzej Warzynski
cadabb58f1 [mlir][linalg][nfc] Fix typo in a file name
Follow-up of D158427
2023-09-15 15:31:36 +00:00
Andrzej Warzynski
c92124765e [mlir][Linalg] Move test
To avoid confusion with vectorization-masked.mlir, move

  test/Dialect/Linalg/masked_vectorization.mlir

to:

  test/Dialect/Linalg/transpose-compose-masked-vectorize-and-cleanups.mlir

The updated name better reflects what's being tested.

Differential Revision: https://reviews.llvm.org/D158427
2023-09-15 15:21:32 +00:00
vic
87d77d3cfb
[mlir][IR] Insert operations before SingleBlock's terminator (#65959)
Change `SingleBlock::{insert,push_back}` to avoid inserting the argument
operation after the block's terminator. This allows removing
`SingleBlockImplicitTerminator`'s functions with the same name.

Define `Block::hasTerminator` checking whether the block has a
terminator operation.

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2023-09-15 16:17:35 +02:00
Benjamin Maxwell
665995b918 [mlir][Conversion] Allow lowering to fixed arrays of scalable vectors
This allows lowering vector types like: vector<3x[4]> or vector<3x2x[4]>
to LLVM IR, i.e. vectors where the trailing dim is scalable.

This is contingent on:
https://discourse.llvm.org/t/rfc-enable-arrays-of-scalable-vector-types/72935

More tests will be added in later patches, however, some MLIR fixes are
needed first.

Depends on: D158517

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D158752
2023-09-15 09:33:18 +00:00
Martin Erhart
3d51010a33 Revert "[mlir][bufferization] Switch tests to new deallocation pass pipeline (#66471)"
This reverts commit ea42b49f10.

Some GPU integration tests are failing that I didn't observe locally.
Reverting until I have a fix.
2023-09-15 09:19:54 +00:00
Martin Erhart
ea42b49f10
[mlir][bufferization] Switch tests to new deallocation pass pipeline (#66471)
Use the new ownership based deallocation pass pipeline in the regression
and integration tests. Some one-shot bufferization tests tested one-shot
bufferize and deallocation at the same time. I removed the deallocation
pass there because the deallocation pass is already thoroughly tested by
itself.
2023-09-15 11:08:53 +02:00
Ingo Müller
f167dc4fa9
[mlir][linalg][transform][python] Clean up _ext.py test. (#66469)
This PR cleans up the test of the mix-ins of this dialect. Most of the
character diff is due to factoring out the creation of the the top-level
sequence into a decorator. This decorator siginficantly shortens the
definition of the individual tests and can be used in all but one test,
where the top-level op is a PDL op. The only functional diff is due to
the fact that the decator uses `transform.any_op` instead of
`pdl.operation` for the type of the root handle. The only remaining
usages of the PDL dialects is now in the test a PDL-related op.
2023-09-15 10:52:34 +02:00
Martin Erhart
08b7a71bcc
[mlir][bufferization] Define a pipeline for buffer deallocation (#66352)
Since ownership based buffer deallocation requires a few passes to be run in a somewhat fixed sequence, it makes sense to have a pipeline for convenience (and to reduce the number of transform ops to represent default deallocation).
2023-09-15 09:39:17 +02:00
Guray Ozen
9420fc4956
[MLIR] SM_90 integratation test of TMA 128x64xf16 and 64x64xf16 with 128b Swizzling (#65954)
The #65953 added a test `128x64xf16` that does a single TMA load. This
PR adds more complex test that does 2 additional TMA loads with 128B
Swizzling:
```
    TMA Load: Matrix-A[0:128][0:64]
    TMA Load: Matrix-B[0:64][0:64]
    TMA Load: Matrix-B[64:128][0:64]
```

The program tests the loaded data for Matrix-B.
2023-09-15 09:30:15 +02:00
Ingo Müller
5d3489e940
[mlir][transform][lingalg][python] Replace pdl.operation => transform.any_op. (#66392)
For some reason, the mix-ins of the Python bindings of this dialect used
the PDL type for "any op". However, PDL isn't involved here, so it makes
more sense to use the corresponding type of the transform dialect. This
PR changes that.
2023-09-15 09:06:07 +02:00
Kunwar Grover
62dbcc46bf
[IRDL] Fix CMake to generate IRDL dialect docs (#66371)
The CMake file for IRDL was not generating documentation on the mlir
website. This patch fixes this.
2023-09-15 11:27:13 +05:30
Christopher Bate
831041be79 [mlir][vector] Cleanup VectorUnroll and create a generic tile iteration utility
This change refactors some of the utilities used to unroll larger vector
computations into smaller vector computations. In fact, the indexing
computations used here are rather generic and are useful in other dialects or
downstream projects. Therefore, a utility for iterating over all possible tile
offsets for a particular pair of static (shape, tiled shape) is introduced in
IndexingUtils and replaces the existing computations in the vector unrolling
transformations. This builds off of the refactoring of IndexingUtils introduced
in 203fad476b.

Reviewed By: nicolasvasilache

Differential Revision: https://reviews.llvm.org/D150000
2023-09-14 20:34:44 -06:00
Jakub Kuderski
ed4daeaa13
[mlir][spirv][gpu] Add conversion for load/store/mad coop matrix ops (#66311)
This is plugged in as an alternative lowering path in the gpu to spirv
dialect conversion. Add custom op builders for coop matrix ops to make
the create functions nicer to work with and less error-prone. The latter
is accomplished by following the op syntax and also requiring stride to
be a constant op to avoid confusion around the order of arguments.

The remaining lowering patterns will be added in a future patch.
2023-09-14 22:16:27 -04:00
Stella Laurenzo
f66cd9e955
[mlir] Add Python bindings for DenseResourceElementsAttr. (#66319)
Only construction and type casting are implemented. The method to create
is explicitly named "unsafe" and the documentation calls out what the
caller is responsible for. There really isn't a better way to do this
and retain the power-user feature this represents.
2023-09-14 18:45:29 -07:00
Yinying Li
2a07f0fd40
[mlir][sparse] Migrate more tests to use new syntax (#66443)
**Dense**
`lvlTypes = [ "dense", "dense" ]` to `map = (d0, d1) -> (d0 : dense, d1
: dense)`
`lvlTypes = [ "dense", "dense" ], dimToLvl = affine_map<(i,j) -> (j,i)>`
to `map = (d0, d1) -> (d1 : dense, d0 : dense)`

**DCSR**
`lvlTypes = [ "compressed", "compressed" ]` to `map = (d0, d1) -> (d0 :
compressed, d1 : compressed)`

**DCSC**
`lvlTypes = [ "compressed", "compressed" ], dimToLvl = affine_map<(i,j)
-> (j,i)>` to `map = (d0, d1) -> (d1 : compressed, d0 : compressed)`

**Block Row**
`lvlTypes = [ "compressed", "dense" ]` to `map = (d0, d1) -> (d0 :
compressed, d1 : dense)`

**Block Column**
`lvlTypes = [ "compressed", "dense" ], dimToLvl = affine_map<(i,j) ->
(j,i)>` to `map = (d0, d1) -> (d1 : compressed, d0 : dense)`

This is an ongoing effort: #66146, #66309
2023-09-14 23:19:57 +00:00
Aart Bik
8998bcfbce
[mlir][sparse][gpu] refine type of workspace size variables (#66438)
Rationale:
Some compiler settings don't like the size_t vs uint64_t setup.
2023-09-14 15:49:52 -07:00
Daniil Dudkin
331ebb0783
[mlir][arith] Add LLVM lowering for maxnumf, minnumf ops (#66431)
This patch is part of a larger initiative aimed at fixing floating-point
`max` and `min` operations in MLIR:
https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
    
The commit addresses the task 1.4 of the RFC by adding LLVM lowering to
the corresponding LLVM intrinsics.
    
Please **note**: this PR is part of a stack of patches and depends on
#66429.
2023-09-15 01:17:05 +03:00
Fabian Mora
5093413a50
[mlir][gpu][NVPTX] Enable NVIDIA GPU JIT compilation path (#66220)
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module. Likewise, a dictionary attribute was added to add any possible
extra options.

2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.

3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.

4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
2023-09-14 18:00:27 -04:00
Daniil Dudkin
6f4a528698
[mlir][memref] Use dedicated ops in AtomicRMWOpConverter (#66437)
This patch refactors the `AtomicRMWOpConverter` class to use
the dedicated operations from Arith dialect instead of using
`cmpf` + `select` pattern.
Also, a test for `minimumf` kind of `atomic_rmw` has been added.
2023-09-15 00:52:35 +03:00
Daniil Dudkin
ca8cba76f9
[mlir][arith] Introduce minnumf and maxnumf operations (#66429)
This patch is part of a larger initiative aimed at fixing floating-point
`max` and `min` operations in MLIR:
https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

Here we introduce new operations for floating-point numbers: `minnum`
and `maxnum`.
These operations have different semantics than `minumumf` and `maximumf`
ops.
They follow the eponymous LLVM intrinsics semantics, which differs
in the handling positive and negative zeros and NaNs.

This patch addresses the 1.3 task from the RFC.
2023-09-15 00:49:58 +03:00
Arthur Eubanks
0a1aa6cda2
[NFC][CodeGen] Change CodeGenOpt::Level/CodeGenFileType into enum classes (#66295)
This will make it easy for callers to see issues with and fix up calls
to createTargetMachine after a future change to the params of
TargetMachine.

This matches other nearby enums.

For downstream users, this should be a fairly straightforward
replacement,
e.g. s/CodeGenOpt::Aggressive/CodeGenOptLevel::Aggressive
or s/CGFT_/CodeGenFileType::
2023-09-14 14:10:14 -07:00
Jakub Kuderski
12175bcbce
[mlir][spirv] Support coop matrix in spirv.CompositeConstruct (#66399)
Also improve the documentation (code and website).
2023-09-14 16:57:59 -04:00
Christopher Bate
cafb6284d1 [mlir][VectorToGPU] Update memref stride preconditions on nvgpu.mma.sync path
This change removes the requirement that the row stride be statically known when
converting `vector.transfer_read` and `vector.transfer_write` to distributed
SIMT operations in the `nvgpu` lowering path. It also adds a check to verify
that the last dimension of the source memref is statically known to have stride
1 since this is assumed in the conversion logic.  No other change should be
required since the generated `vector.load` operations are never created across
dimensions other than the last. The routines for checking preconditions on
`vector.transfer_read/write` are moved to under nvgpu utilities.

The change is NFC with respect to the GPU dialect lowering path.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D155753
2023-09-14 13:51:42 -06:00
Christopher Bate
e2d39f799b [mlir][Transform] Add updateConversionTarget to ConversionPatternDescriptorOpInterface
This change adds a method to modify the ConversionTarget used during
`transform.apply_conversion_patterns` to the
`ConversionPatternDescriptorOpInterface`. This is needed when the TypeConverter
is used to dictate the dynamic legality of operations, as in "structural"
conversion patterns present in, for example, the SCF and func dialects.

As a first use case/test, this change also adds a
`transform.apply_patterns.scf.structural_conversions` operation to the SCF
dialect.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D158672
2023-09-14 11:39:47 -06:00
Aart Bik
156a4ba9b4
[mlir][sparse] deprecate the convert{To,From}MLIRSparseTensor methods (#66304)
Rationale:
These libraries provided COO input and output at external boundaries
which, since then, has been generalized to the much more powerful pack
and unpack operations of the sparse tensor dialect.
2023-09-14 10:02:29 -07:00
Yinying Li
e2e429d994
[mlir][sparse] Migrate more tests to new syntax (#66309)
CSR:
`lvlTypes = [ "dense", "compressed" ]` to `map = (d0, d1) -> (d0 :
dense, d1 : compressed)`

CSC:
`lvlTypes = [ "dense", "compressed" ], dimToLvl = affine_map<(d0, d1) ->
(d1, d0)>` to `map = (d0, d1) -> (d1 : dense, d0 : compressed)`

This is an ongoing effort: #66146
2023-09-14 12:21:13 -04:00
Ingo Müller
360c629024
[mlir][linalg][transform][python] Drop _get_op_result... from mix-ins. (#65726)
`_get_op_result_or_value` was used in mix-ins to unify the handling of
op results and values. However, that function is now called in the
generated constructors, such that doing so in the mix-ins is not
necessary anymore.
2023-09-14 17:24:16 +02:00
Matthias Springer
aca9019be0
[mlir][transform] Check for invalidated iterators on payload IR mappings (#66369)
Add extra error checking (in debug mode) to detect cases where an
iterator on "direct" payload IR mappings is invalidated (due to elements
being removed). Such errors are hard to debug: they are often
non-deterministic; sometimes the program crashes, sometimes it produces
wrong results. Even when it crashes, the stack trace often points to
completely unrelated code locations.

Store a timestamp with each "direct" mapping. The timestamp is increased
whenever an operation is performed that invaldiates an iterator on that
mapping. A debug iterator is added that checks the timestamp as payload
IR is enumerated.
2023-09-14 16:34:32 +02:00
Martin Erhart
66aa9a2517
[mlir][bufferization] Implement BufferDeallocationopInterface for scf.forall.in_parallel (#66351)
The scf.forall.in_parallel terminator operation has a nested graph region with the NoTerminator trait. Such regions are not supported by the default implementations. Therefore, this commit adds a specialized implementation for
this operation which only covers the case where the nested region is empty.
This is because after bufferization, ops like tensor.parallel_insert_slice were already converted to memref operations residing int the scf.forall only and the nested region of scf.forall.in_parallel ends up empty.
2023-09-14 16:20:24 +02:00
Nicolas Vasilache
02d27eac0f [mlir][Linalg] NFC - Simplify transform script 2023-09-14 15:32:35 +02:00
Matthias Springer
4f63252d5d
[mlir][transform] Fix crash when op is erased during transform.foreach (#66357)
Fixes a crash when an op, that is mapped to handle that a
`transform.foreach` iterates over, was erased (through the
`TrackingRewriter`). Erasing an op removes it from all mappings and
invalidates iterators. This is already taken care of when an op is
iterating over payload ops in its `apply` method, but not when another
transform op is erasing a tracked payload op.
2023-09-14 14:59:36 +02:00
Martin Erhart
942ce31985
[mlir][bufferization] BufferDeallocationOpInterface: support custom ownership update logic (#66350)
Add a method to the BufferDeallocationOpInterface that allows operations to implement the interface and provide custom logic to compute the ownership indicators of values it defines. As a demonstrating example, this new method is implemented by the `arith.select` operation.
2023-09-14 14:34:04 +02:00
Martin Erhart
8160bce969
[mlir][bufferization][NFC] Introduce BufferDeallocationOpInterface (#66349)
This new interface allows operations to implement custom handling of ownership values and insertion of dealloc operations which is useful when an op cannot implement the interfaces supported by default by the buffer deallocation pass (e.g., because they are not exactly compatible or because there are some additional semantics to it that would render the default implementations in buffer deallocation invalid, or because no interfaces exist for this
kind of behavior and it's not worth introducing one plus a default implementation in buffer deallocation). Additionally, it can also be used to provide more efficient handling for a specific op than the interface based default
implementations can.
2023-09-14 13:58:30 +02:00
Felix Schneider
c336a06144 [mlir] [memref] Fix alignment bug in memref.copy lowering
memref.copy gets lowered to a function call sometimes, this function
is passed the element size of the memref in bytes as an argument.
The element size passed to the copyMemRef() function call can be
miscalculated if the LLVM IR uses aligned access to the memory.

This can be fixed by using llvm.getelementptr to calculate the element
size natively. This is also done in the other lowering path that lowers
to an intrinsic.

Fix https://github.com/llvm/llvm-project/issues/64072

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D156126
2023-09-14 13:18:12 +02:00