The method implementations remain in the .cpp file; explicit instantiations have been added for these two types.
makeMatrix has been duplicated to makeIntMatrix and makeFracMatrix.
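A minimal sketch of the pattern, with illustrative stand-in names rather than
the actual classes: the template member definitions stay in the .cpp file, and
explicit instantiations are listed for the two element types callers need.

```cpp
// Matrix.h -- users of the template only see the declaration.
template <typename T> class Matrix {
public:
  T at(unsigned row, unsigned col) const;
  // ...
};

// Matrix.cpp -- member definitions stay here, out of the header.
template <typename T>
T Matrix<T>::at(unsigned row, unsigned col) const {
  return T(); // placeholder body for the sketch
}

// Explicit instantiations: only these two element types are usable by callers.
template class Matrix<long>;    // stand-in for the integer element type
template class Matrix<double>;  // stand-in for the fractional element type
```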
…cast) expansion
This revision adds a rewrite for sequences of vector `ext(bitcast)` to
use a more efficient sequence of vector operations comprising `shuffle`
and `bitwise` ops.
Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.
The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the source vector. The
enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` with shifts.
The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the final vector element type is not a multiple of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
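As a standalone illustration of the enumeration step (plain C++, not the
actual rewrite code), mapping a bit of the pre-extension result vector back to
its source element and bit position is simple index arithmetic:

```cpp
#include <cstdio>

// For a 1-D bitcast, bit `resultBit` of element `resultElement` in the result
// vector originates from a specific element/bit of the source vector; the
// rewrite enumerates these positions to build the shuffle/andi/ori sequence.
struct BitProvenance {
  unsigned srcElement; // index into the source vector
  unsigned srcBit;     // bit position inside that source element
};

static BitProvenance provenanceOf(unsigned resultElement, unsigned resultBit,
                                  unsigned resultElemBits,
                                  unsigned srcElemBits) {
  unsigned flatBit = resultElement * resultElemBits + resultBit;
  return {flatBit / srcElemBits, flatBit % srcElemBits};
}

int main() {
  // vector<8xi4> viewed as a bitcast of vector<4xi8>: element 3, bit 2 of the
  // i4 vector comes from element 1, bit 6 of the i8 source vector.
  BitProvenance p = provenanceOf(3, 2, /*resultElemBits=*/4, /*srcElemBits=*/8);
  std::printf("source element %u, bit %u\n", p.srcElement, p.srcBit);
  return 0;
}
```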
This patch adds a `host_assoc` attribute for operations that implement
FortranVariableInterface (e.g. `hlfir.declare`). The attribute is used
by the alias analysis to make better conclusions about memory overlap.
For example, a dummy argument of an inner subroutine and a host's
variable used inside the inner subroutine cannot refer to the same
object (if the dummy argument does not satisfy the exceptions in F2018
15.5.2.13).
This closes a performance gap between HLFIR optimization pipeline
and FIR ArrayValueCopy for Polyhedron/nf.
…lues
Emit an error at runtime when a list-directed input value is not
followed by a value separator or end of record. Previously, the runtime
I/O library was consuming as many input characters as were valid for
the type of the value, and leaving any remaining characters for the next
input edit, if any.
As suggested by Greg in https://github.com/llvm/llvm-project/pull/66534,
I'm adding a setting at the Target level that controls whether to show
leading zeroes in hex ValueObject values.
This has the benefit of reducing the number of characters displayed in
certain interfaces, like VSCode.
When checking to see if our index expressions can be converted into strided
operations, we previously gave up if the index type wasn't an exact match for
the intptrty for the address. Per gep semantics, this mismatch implies a sext
or trunc cast to the respective index type. For constants, go ahead and
evaluate that cast instead of giving up.
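A hedged sketch of the constant case, assuming the usual APInt facilities: the
cast implied by gep semantics is just a sign-extend or truncate of the constant
index to the pointer's index width.

```cpp
#include "llvm/ADT/APInt.h"

// Sketch only: when a constant gep index does not match intptrty, evaluate
// the implied sext/trunc and keep analyzing the adjusted constant instead of
// giving up on the strided lowering.
static llvm::APInt indexAsIntPtr(const llvm::APInt &Idx, unsigned IntPtrBits) {
  return Idx.sextOrTrunc(IntPtrBits);
}
```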
Note that the motivation for this is mostly test cleanup. We canonicalize at
the IR level such that the gep index will match the intptrty. This is mostly
useful so that we can write both RV32 and RV64 tests from the same source.
It's also helpful in preventing confusion - I've stumbled across this at least
four times now and wasted time each time.
Note: The test change for the scatter unit-stride cases contains a minor
regression for rv32 with 64-bit indices. This is an artifact of the order in
which changes are landing. It will be addressed in a near-future change for all
configurations.
When the old extension of D debug lines in fixed form source is enabled,
recognize continuation lines that begin with a D.
(No test is added, since the community's driver doesn't support the GNU
Fortran option -fd-lines-as-code.)
The rewriting of the extension intrinsic function SIZEOF was producing
results that would reference symbols that were not available in the
current scope, leading to crashes in lowering. The symbols could be
function result variables, for SIZEOF(func()), or bare derived type
component names, for SIZEOF(array(n)%component). Fixing this without
regressing on a current test case involved careful threading of some
state through the TypeAndShape characterization code and the
shape/bounds analyzer, and some clean-up was done along the way.
The Z#_HI register definitions were created during the very early SVE
enablement work and before the SVE calling convention was locked in.
As they look entirely unused, they need to go.
If the SRL for Hi constant folds, but we don't remove those bits from
the Lo, we can end up with strange constant folding through DAGCombine later.
I've only seen this with constants being lowered to constant pools
during lowering on RISC-V.
On the first split we create two i32 truncstores and an srl to shift
the high part down. The srl gets constant folded, producing a new i32
constant, but the truncstore for the low half still uses the original
constant.
This original constant then gets converted to a constant pool
before we revisit the stores to further split them. The constant
pool prevents further constant folding of the additional srls.
After legalization is done, we run DAGCombiner and get some constant
folding of srl via computeKnownBits which can peek through the constant
pool load. This can create new constants that also need a constant pool.
When running multiple shards in parallel, one shard might write to the
cache while another one is reading this cache. Instead of updating the
file in place, write to a temporary file and swap the cache file using
os.replace(). This is an atomic operation and means shards will either
see the old state or the new one.
This is a minor step towards deprecating bufferization.alloc_tensor().
It replaces the examples with tensor.empty() and adjusts the underlying
rewriting logic to prepare for this upcoming change.
The intrinsic uses ImmArg so TargetConstant would be consistent
with how other intrinsics are handled.
This hides the constants from type legalization so we can remove
the promotion support.
isel patterns are updated accordingly.
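A hedged sketch of the change in custom lowering terms (generic names, not the
actual RISC-V code): the immediate operand is emitted with getTargetConstant
rather than getConstant, so legalization treats it as opaque.

```cpp
#include "llvm/CodeGen/SelectionDAG.h"

using namespace llvm;

// Sketch only: for an intrinsic operand marked ImmArg, build a TargetConstant
// (instead of DAG.getConstant) so type legalization never sees or promotes it.
static SDValue lowerImmArgOperand(SelectionDAG &DAG, const SDLoc &DL,
                                  uint64_t Imm, EVT VT) {
  return DAG.getTargetConstant(Imm, DL, VT);
}
```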
Fold references to the (relatively new) intrinsic function NORM2 at
compilation time when the argument(s) are all constants. (Getting this
done right involved some changes to the API of the accumulator function
objects used by the DoReduction<> template, which rippled through some
other reduction function folding code.)
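For illustration only (this is not the actual flang API), the shape of such an
accumulator object is roughly as follows: the reduction template feeds it one
constant element at a time and then asks for the finished value.

```cpp
#include <cmath>
#include <vector>

// Toy stand-in for a NORM2 accumulator driven by a DoReduction<>-style
// template; the real folding code is more careful (scaling to avoid overflow,
// handling multiple kinds and DIM=).
class Norm2Accumulator {
public:
  void Accumulate(double x) { sumOfSquares_ += x * x; }
  double Result() const { return std::sqrt(sumOfSquares_); }

private:
  double sumOfSquares_{0.0};
};

static double FoldNorm2(const std::vector<double> &constantArg) {
  Norm2Accumulator acc;
  for (double x : constantArg)
    acc.Accumulate(x);
  return acc.Result();
}
```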
Use the recently introduced llvm.mlir.zero operation for values with
LLVM target extension type. Replaces the previous workaround that uses a
single zero-valued integer attribute constant operation.
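A hedged sketch of the replacement in builder terms (the builder call is
assumed from the MLIR LLVM dialect):

```cpp
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/Builders.h"

// Sketch only: materialize a zero of an LLVM target extension type with
// llvm.mlir.zero instead of the old zero-valued integer constant workaround.
static mlir::Value createZeroOfTargetExtType(mlir::OpBuilder &builder,
                                             mlir::Location loc,
                                             mlir::Type targetExtType) {
  return builder.create<mlir::LLVM::ZeroOp>(loc, targetExtType);
}
```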
Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
This is a fix for a subset of legalization problems around 64 bit indices on
rv32 targets. For RV32+V, we were using the wrong mask type for the manual
truncation lowering for fixed length vectors. Instead, just use the generic
TRUNCATE node, and let it be lowered as needed.
Note that legalization is still broken for rv32+zve32. That appears to be
a different issue.
The POINTER= and TARGET= arguments to the intrinsic function
ASSOCIATED() can be the results of references to functions that return
object pointers or procedure pointers. NULL() was handled correctly, but
program-defined pointer-valued functions were not. Correct the validation of
ASSOCIATED() and extend the infrastructure used to detect and
characterize procedures and pointers.
Some VP intrinsic definitions were missing the
VP_PROPERTY_FUNCTIONAL_INTRINSIC property. This patch fills them in, and
adds a static_assert that all VP intrinsics have an equivalent opcode or
intrinsic defined so we don't forget them in future.
Some VP intrinsics don't have an equivalent, namely merge and strided
load/store. For those, a new property was added to mark that they don't
have a non-VP equivalent.
This adds a helper method to get the ID of the functionally equivalent
intrinsic, similar to the existing getFunctionalOpcodeForVP and
getConstrainedIntrinsicIDForVP methods.
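A hedged usage sketch (the name of the new helper is assumed here, by analogy
with the existing queries): a client that wants any scalar equivalent can try
the functional opcode first and fall back to the functional intrinsic ID.

```cpp
#include "llvm/IR/IntrinsicInst.h"

using namespace llvm;

// Sketch only: getFunctionalIntrinsicIDForVP is an assumed name, mirroring
// the existing getFunctionalOpcodeForVP / getConstrainedIntrinsicIDForVP.
static bool hasNonVPEquivalent(Intrinsic::ID VPID) {
  if (VPIntrinsic::getFunctionalOpcodeForVP(VPID))
    return true; // e.g. vp.add -> Instruction::Add
  if (VPIntrinsic::getFunctionalIntrinsicIDForVP(VPID))
    return true; // e.g. vp.smax -> llvm.smax
  return false;  // e.g. vp.merge and the strided loads/stores have neither
}
```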
Summary:
The GPU build has a lot of magic around how we package the output.
Generally, the GPU needs to exist as a secondary fatbinary image for
offloading languages. This is because offloading languages pretend that
offloading to an accelerator is a single file. This then needs to be put
into a single file to make it mesh with the existing build
infrastructure. To work with this, the `libc` makes an installed version
of the library that simply embeds the GPU code into an empty stub file.
This wasn't being updated correctly, which led to the installed `libc`
static library not being updated correctly when the underlying file was
changed. The previous behaviour only updated when the entrypoint itself
was modified, but not any of its headers. By adding a dependency on the
actual *object* file we should now capture the regular CMake semantics.
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option now defaults to true and `create-deallocs`
defaults to false; both options, as well as the escape attribute indicating
whether a memref escapes the current region, will be removed. A new
`allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
We'll need to generalize this fold to check for any zero upper bits to address some of the D155472 regressions, but this exposes a number of issues. For now, just use the general MaskedValueIsZero test instead of the assertzext.
Thanks to Giuseppe D'Angelo for pointing this out on the cpplang Slack!
The example implementation in https://eel.is/c++draft/string.view.comparison#example-1
was necessary when it was written, in C++17, but in C++20 we don't need that
complexity anymore, because of the reversed candidates that are
synthesized by the compiler.
Update LiveIntervals after rewriting:
%reg = INSERT_SUBREG undef %reg, %subreg, subidx
to:
undef %reg:subidx = COPY %subreg
D113044 implemented this for the non-undef case.
This gets rid of the separate parameter enable_modules_lsv in favor of
adding a named option to the enable_modules parameter. The patch also
removes the getModuleFlag helper, which was just a really complicated
way of hardcoding "none".
This revision adds support for empty tensor elimination to
"bufferization.materialize_in_destination" by implementing the
`SubsetInsertionOpInterface`.
Furthermore, the One-Shot Bufferize conflict detection is improved for
"bufferization.materialize_in_destination".
This patch simplifies the pattern `icmp X and/or C1, X and/or C2` when
one constant mask is the subset of the other.
If `C1 & C2 == C1`, `A = X and/or C1`, `B = X and/or C2`, we can do the
following folds:
`icmp ule A, B -> true`
`icmp ugt A, B -> false`
We can apply similar folds for signed predicates when `C1` and `C2` are
the same sign:
`icmp sle A, B -> true`
`icmp sgt A, B -> false`
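As a quick standalone sanity check of the unsigned case (not the InstCombine
code itself), pick C1 = 0x08 and C2 = 0x0C so that `C1 & C2 == C1` and verify
exhaustively over i8:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const uint8_t C1 = 0x08, C2 = 0x0C; // C1 & C2 == C1
  for (unsigned x = 0; x < 256; ++x) {
    uint8_t a = uint8_t(x) | C1; // A = X or C1
    uint8_t b = uint8_t(x) | C2; // B = X or C2
    assert(a <= b);              // icmp ule A, B -> true
  }
  return 0;
}
```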
Alive2: https://alive2.llvm.org/ce/z/Q4ekP5
Fixes #65833.
This patch moves the group of OpenMP MLIR passes used after lowering of
Fortran to MLIR into a pipeline shared by `flang-new` and `bbc`.
Currently, the `bbc` tool does not produce the expected FIR for offloading-
enabled OpenMP codes due to not running these passes.
Unit tests exercising these passes are updated to check `bbc` output as well.
…(trunci) expansion
This revision adds a rewrite for sequences of vector `bitcast(trunci)`
to use a more efficient sequence of vector operations comprising
`shuffle` and `bitwise` ops.
Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.
The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the pre-trunci vector.
The enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` followed by an optional final `trunci`/`extui`.
The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the final vector element type is not a multiple of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
Summary:
This patch copies a config file for the GPU similar to the
baremetal/embedded implementation. This will configure the
implementations of functions like `sprintf` and `snprintf` to be
compiled into simpler versions that can be run on the GPU. These
functions cannot be enabled yet, as Vararg support hasn't landed, but this
configuration will be used once it does.