This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
We'll need to generalize this fold to check for any zero upperbits to address some of the D155472 regressions, but this exposes a number of issues. For now, just use the general MaskedValueIsZero test instead of the assertzext.
Thanks to Giuseppe D'Angelo for pointing this out on the cpplang Slack!
The example implementation in https://eel.is/c++draft/string.view.comparison#example-1
was necessary when it was written, in C++17, but in C++20 we don't need that
complexity anymore, because of the reversed candidates that are
synthesized by the compiler.
Update LiveIntervals after rewriting:
%reg = INSERT_SUBREG undef %reg, %subreg, subidx
to:
undef %reg:subidx = COPY %subreg
D113044 implemented this for the non-undef case.
This gets rid of the separate parameter enable_modules_lsv in favor of
adding a named option to the enable_modules parameter. The patch also
removes the getModuleFlag helper, which was just a really complicated
way of hardcoding "none".
This revision adds support for empty tensor elimination to
"bufferization.materialize_in_destination" by implementing the
`SubsetInsertionOpInterface`.
Furthermore, the One-Shot Bufferize conflict detection is improved for
"bufferization.materialize_in_destination".
This patch simplifies the pattern `icmp X and/or C1, X and/or C2` when
one constant mask is the subset of the other.
If `C1 & C2 == C1`, `A = X and/or C1`, `B = X and/or C2`, we can do the
following folds:
`icmp ule A, B -> true`
`icmp ugt A, B -> false`
We can apply similar folds for signed predicates when `C1` and `C2` are
the same sign:
`icmp sle A, B -> true`
`icmp sgt A, B -> false`
Alive2: https://alive2.llvm.org/ce/z/Q4ekP5Fixes#65833.
This patch moves the group of OpenMP MLIR passes using after lowering of
Fortran to MLIR into a pipeline to be shared by `flang-new` and `bbc`.
Currently, the `bbc` tool does not produce the expected FIR for offloading-
enabled OpenMP codes due to not running these passes.
Unit tests exercising these passes are updated to check `bbc` output as well.
…(trunci) expansion
This revision adds a rewrite for sequences of vector `bitcast(trunci)`
to use a more efficient sequence of vector operations comprising
`shuffle` and `bitwise` ops.
Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.
The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the pre-trunci vector.
The enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` followed by an optional final `trunci`/`extui`.
The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the final vector element type is not a multiple of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
Summary:
This patch copies a config file for the GPU similar to the
baremetal/embedded implementation. This will configure the
implementations of functions like `sprintf` and `snprintf` to be
compiled into more simple versions that can be run on the GPU. These
functions cannot be enabled yet as Vararg support hasn't landed, but it
will be used then.
It is possible for a derived type extending a type with private
components to define components with the same name as the private
components.
This was not properly handled by lowering where several fir.record type
component names could end-up being the same, leading to bad generated
code (only the first component was accessed via fir.field_index, leading
to bad generated code).
This patch handles the situation by adding the derived type mangled name
to private component.
Stack-move optimization, the optimization that merges src and dest
alloca of the full-size copy, replaces all uses of the dest alloca with
src alloca. For safety, we needed to check all uses of the dest alloca
locations are dominated by src alloca, to be replaced. This PR adds the
check for that.
Fixes#65225
If the R_AARCH64_CALL26 against a symbol that has a lower address, then
encodeValueAArch64 will return a wrong value.
Reviewed By: Kepontry, yota9
Differential Revision: https://reviews.llvm.org/D159513
Following on from D139466 which added support for dead vreg defs, this
patch adds support for "undef" defs of subregs.
Use this to regenerate checks for amx-greedy-ra-spill-shape.ll which
previously required manual tweaks to the autogenerated checks to fix an
EXPENSIVE_CHECKS failure; see commit
8b7c1fbd96
Use the new ownership based deallocation pass pipeline in the regression
and integration tests. Some one-shot bufferization tests tested one-shot
bufferize and deallocation at the same time. I removed the deallocation
pass there because the deallocation pass is already thoroughly tested by
itself.
Fixed version of #66471
BEFORE this patch, when clang handles constraints like C1 || C2 where C1 evaluates to false and C2 evaluates to true, it emitted irrelevant diagnostics about the falsity of C1.
This patch removes the irrelevant diagnostic information generated during the evaluation of C1 if C2 evaluates to true.
Fixes https://github.com/llvm/llvm-project/issues/54678
Reviewed By: erichkeane
Differential Revision: https://reviews.llvm.org/D157526
So that the values that are accessed via such accessors can be analyzed
as a limited version of context-sensitive analysis. We can potentially
do this only when some option is set, but doing additional modeling like
this won't be expensive and intrusive, so we do it by default for now.
`operator[]` returns `OpOperand &` instead of `Value`.
* This allows users to get OpOperands by name instead of "magic" number.
E.g., `extractSliceOp->getOpOperand(0)` can be written as
`extractSliceOp.getSourceMutable()[0]`.
* `OperandRange` provides a read-only API to operands: `operator[]`
returns `Value`. `MutableOperandRange` now provides a mutable API:
`operator[]` returns `OpOperand &`, which can be used to set operands.
Note: The TableGen code generator could be changed to return `OpOperand
&` (instead of `MutableOperandRange`) for non-variadic and non-optional
arguments in a subsequent change. Then the `[0]` part in the above
example would no longer be necessary.
It maybe emit the .option directive without any follow up. Only emit the .option push/pop when there are supported extension difference between function and module.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D159399
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
Source allocation with a zero sized array is legal, and the resulting
allocatable/pointer should be allocated/associated.
The current code skipped the actual allocation, leading the allocatable
or pointer to look unallocated/disassociated.
Add capabilities for comparing opaque properties. This is useful when
dealing with arbitrary operations which can be compare based on their
OperationName. Now you can furthermore compare their properties without
the need to determine their actual type.
/data/llvm-project/llvm/lib/ExecutionEngine/Orc/TargetProcess/JITLoaderPerf.cpp:118:10: error: variable 'Written' set but not used [-Werror,-Wunused-but-set-variable]
size_t Written = 0;
^
1 error generated.
This patch ports PerfJITEventListener to a JITLink plugin, but adds unwind
record support and drops debuginfo support temporarily. Debuginfo can be
enabled in the future by providing a way to obtain a DWARFContext from a
LinkGraph.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D146169
We can get `BlockAddress` in user code via `Labels as Values` so
we should be able to merge the access to `BlockAddress`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D159429