Summary:
This is much cleaner, and fits the same structure as many other tablegen backends. This was not done originally as the CRTP in the pass classes made it overly verbose/complex.
Differential Revision: https://reviews.llvm.org/D77367
This revision removes all of the CRTP from the pass hierarchy in preparation for using the tablegen backend instead. This creates a much cleaner interface in the C++ code, and naturally fits with the rest of the infrastructure. A new utility class, PassWrapper, is added to replicate the existing behavior for passes not suitable for using the tablegen backend.
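As a rough illustration of the idea (not the actual MLIR classes), the sketch below keeps the base pass class free of templates and moves the derived-type-dependent boilerplate into a small CRTP wrapper. All names (`Pass`, `PassWrapper`, `CountingPass`, `clone`, `name`) are simplified stand-ins chosen for this example; the real interfaces may differ.

```cpp
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for a non-templated pass base class.
class Pass {
public:
  virtual ~Pass() = default;
  virtual void run() = 0;
  virtual std::string name() const = 0;
  virtual std::unique_ptr<Pass> clone() const = 0;
};

// CRTP wrapper: only passes that want this boilerplate derive from it,
// so the base hierarchy itself stays template-free.
template <typename DerivedT, typename BaseT = Pass>
class PassWrapper : public BaseT {
public:
  // Copy-constructs the concrete pass; this needs the derived type,
  // which is why it lives in the wrapper and not in the base class.
  std::unique_ptr<Pass> clone() const override {
    return std::make_unique<DerivedT>(static_cast<const DerivedT &>(*this));
  }
};

// Example pass written against the wrapper.
struct CountingPass : PassWrapper<CountingPass> {
  int runs = 0;
  void run() override { ++runs; }
  std::string name() const override { return "counting"; }
};
```

A pass defined this way only implements `run` and `name`; cloning comes from the wrapper.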
Differential Revision: https://reviews.llvm.org/D77350
ModulePass doesn't provide any special utilities and thus doesn't give enough benefit to warrant a special pass class. This revision replaces all usages with the more general OperationPass.
Differential Revision: https://reviews.llvm.org/D77339
Summary:
Add a directive to indicate the location to give to the op being created. This
directive is optional; if it is unused, the location will still be the fused
location of all source operations.
Currently this directive only works with other op locations, reusing an
existing op location or a fusion of op locations. It doesn't yet support
supplying metadata for the FusedLoc.
Based on an initial revision by antiagainst@; effectively mirrors the GlobalISel
debug_locations directive.
Differential Revision: https://reviews.llvm.org/D77649
Summary:
* Removal of FxpMathOps was discussed on the mailing list.
* Will send a courtesy note about also removing the Quantizer (which had some dependencies on FxpMathOps).
* These were only ever used for experimental purposes and we know how to get them back from history as needed.
* There is a new proposal for more generalized quantization tooling, so moving these older experiments out of the way helps clean things up.
Subscribers: mgorny, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, grosul1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77479
Summary: Diagnostics may be cached in the parallel diagnostic handler to preserve proper ordering. Storing the Operation as a DiagnosticArgument is problematic as the operation may be erased or changed before it finally gets printed.
Differential Revision: https://reviews.llvm.org/D77675
If we have two back-to-back loops with block arguments, the OpPhi
instructions generated for the second loop's block arguments should
use the merge block of the first SPIR-V loop structure as
their incoming parent block.
Differential Revision: https://reviews.llvm.org/D77543
Introduce the alloca op for stack memory allocation. When converting to the
LLVM dialect, this is lowered to an llvm.alloca. Refactor the std to
llvm conversion of the alloc op so that it can be reused for alloca. Drop the
useAlloca option from the alloc op lowering.
Differential Revision: https://reviews.llvm.org/D76602
Fix point-wise copy generation to work with bounds that have max/min.
Change the structure of the copy loop nest to use absolute loop indices and
subtract the base from the indices of the fast buffers. Update supporting
utilities: Fix FlatAffineConstraints::getLowerAndUpperBound to look at
equalities as well and to account for a missing division. Update unionBoundingBox
to not discard common constraints (leads to a tighter system). Update
MemRefRegion::getConstantBoundingSizeAndShape to add memref dimension
constraints. Run removeTrivialRedundancy at the end of
MemRefRegion::compute. Run single iteration loop promotion and
load/store canonicalization after affine data copy (in its test pass as
well).
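A minimal 1-D sketch of the copy loop structure described above, under the assumption that the region bounds are the max of two lower bounds and the min of two upper bounds. The function and parameter names are hypothetical and not taken from the patch; the point is only that the loop runs over absolute indices and the fast buffer is indexed relative to the base.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Copies slow[lb, ub) into a freshly allocated fast buffer, where
// lb = max(lbA, lbB) and ub = min(ubA, ubB).
std::vector<int> copyRegionToFastBuffer(const std::vector<int> &slow,
                                        int lbA, int lbB, int ubA, int ubB) {
  const int lb = std::max(lbA, lbB);
  const int ub = std::min(ubA, ubB);
  assert(0 <= lb && lb <= ub && ub <= static_cast<int>(slow.size()));

  std::vector<int> fast(ub - lb);
  // The loop index `i` is the absolute index into the original buffer;
  // the base `lb` is subtracted only when addressing the fast buffer.
  for (int i = lb; i < ub; ++i)
    fast[i - lb] = slow[i];
  return fast;
}
```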
Differential Revision: https://reviews.llvm.org/D77320
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.
In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there are GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.
In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)
Differential Revision: https://reviews.llvm.org/D75661
Summary: This revision updates the value numbering when printing to number from the next parent operation that is isolated from above. This is the highest level to number from that still ensures thread-safety. This revision also changes the behavior of Operator::operator<< to use local scope to avoid thread races when numbering operations.
Differential Revision: https://reviews.llvm.org/D77525
Summary:
This revision adds a tensor_reshape operation that operates on tensors.
In the tensor world the constraints are less stringent and we can allow more
arbitrary dynamic reshapes, as long as they are contractions.
The expansion of a dynamic dimension into multiple dynamic dimensions is under-specified and is punted on for now.
Differential Revision: https://reviews.llvm.org/D77360
The current return type sometimes leads to code like
to_vector<2>(ValueRange(loop.getInductionIvs())). It would be nice to
shorten it. Users who need access to Block::BlockArgListType (if there
are any) can always call getBody()->getArguments() if needed.
Also remove getNumInductionVars(), since there is getNumLoops().
Differential Revision: https://reviews.llvm.org/D77526
Summary:
This revision performs several cleanups on the translation infra:
* Removes the TranslateCLParser library and consolidates into Translation
- This was a weird library that existed in Support, and didn't really justify being a standalone library.
* Cleans up the internal registration and consolidates all of the translation functions within one registry.
Differential Revision: https://reviews.llvm.org/D77514
Summary: Blocks are numbered locally within a region, so numbering above the parent region is unnecessary.
Differential Revision: https://reviews.llvm.org/D77510
Even if this generally indicates a problem at the call site, the printer
is used for debugging, and avoiding a crash is friendlier, for example
when it is used in diagnostics or in other printers.
Differential Revision: https://reviews.llvm.org/D77481
Add a pattern rewriter utility to erase blocks (while notifying the
pattern rewriting driver of the erased ops). Use this to remove trivial
else blocks in affine.if ops.
Differential Revision: https://reviews.llvm.org/D77083
Removing dead ops should make the outer loop of the pattern rewriting
driver run again. Although the operands of a removed op are added to the
worklist, if no changes happened to them or to the remaining ops in the
worklist, the driver would not run again, but it should.
Differential Revision: https://reviews.llvm.org/D77483
The ForOp::build ensures that there is a block terminator which is great for
the default use case when there are no iter_args and loop.for returns no
results. In the non-zero results case we always need to call replaceOpWithNewOp
which is not the nicest thing in the world. We can stop inserting YieldOp when
iter_args is non-empty. IfOp::build already behaves similarly.
Summary: This revision adds support for marking the last region as variadic in the ODS region list with the VariadicRegion directive.
Differential Revision: https://reviews.llvm.org/D77455
Summary: It is a very common user trap to think that the location printed along with the diagnostic is the same as the current operation that caused the error. This revision changes the behavior to always print the current operation, except for when diagnostics are being verified. This is achieved by moving the command line flags in IR/ to be options on the MLIRContext.
Differential Revision: https://reviews.llvm.org/D77095
Summary:
A recent extension allowed the `loop.if` operation to return results yielded by
its regions. However, such operations could not be lowered to a CFG of standard
operations because it would have required modifying the argument list of a
block, which is not allowed in a conversion pattern. Now that the conversion
infrastructure supports block creation, use it to create a block with an
argument list that dominates the operations following the `loop.if` and forward
the results as arguments of this block.
Depends On D77416
Differential Revision: https://reviews.llvm.org/D77418
Summary:
Linalg makes it possible to interface codegen with externally precompiled HPC libraries. The mechanism to allow such interop uses a normalized ABI and the emission of C interface wrappers.
The mechanism controlling this C interface emission is too aggressive and makes it very easy to obtain undefined symbols for external functions (e.g. the ones coming from libm).
This revision uses the newly introduced llvm.emit_c_interface function attribute which allows controlling this behavior at a function granularity. As a consequence LinalgToLLVM does not need to activate the C wrapper emission when adding the StdToLLVM patterns.
Differential Revision: https://reviews.llvm.org/D77364
PatternRewriter and derived classes provide a set of virtual methods to
manipulate blocks, which ConversionPatternRewriter overrides to keep track of
the manipulations and undo them in case the conversion fails. However, one can
currently create a block only by splitting another block into two. This not
only makes the API inconsistent (`splitBlock` is allowed in conversion
patterns, but `createBlock` is not), but it also makes it impossible for one to
create blocks with argument lists different from those of already existing
blocks since in-place block updates are not supported either. The lack of such
functionality precludes the dialect conversion infrastructure from being used more
extensively on region-containing ops, for example, for value-returning "if"
operations. At the same time, ConversionPatternRewriter already allows one to
undo block creation as block creation is one of the primitive operations in
already supported region inlining.
Support block creation in conversion patterns by hooking `createBlock` on the
block action undo mechanism. This requires making `Builder::createBlock`
virtual, similarly to Op insertion. This is a minimal change to the Builder
infrastructure that will later help support additional use cases such as block
signature changes. `createBlock` now additionally takes the types of the block
arguments that are added immediately so as to avoid in-place argument list
manipulation that would be illegal in conversion patterns.
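The undo mechanism can be pictured with a toy model like the one below. This is only a sketch under simplifying assumptions: blocks are always appended to the end of a region, nothing else mutates the region in between, and `Block`, `Region` and `UndoableRewriter` are hypothetical types, not the MLIR ones.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy IR: a block is just a list of argument type names.
struct Block {
  std::vector<std::string> argTypes;
};

struct Region {
  std::vector<std::unique_ptr<Block>> blocks;
};

// Hypothetical rewriter that logs block creations so they can be undone.
class UndoableRewriter {
public:
  // The full argument list is supplied at creation time, so the block
  // never needs an in-place argument list update afterwards.
  Block *createBlock(Region &region, std::vector<std::string> argTypes) {
    region.blocks.push_back(
        std::make_unique<Block>(Block{std::move(argTypes)}));
    created.push_back(&region);
    return region.blocks.back().get();
  }

  // Undoes all logged creations in reverse order, e.g. when the
  // conversion fails.
  void undoAll() {
    for (auto it = created.rbegin(); it != created.rend(); ++it)
      (*it)->blocks.pop_back();
    created.clear();
  }

private:
  std::vector<Region *> created; // One entry per created block.
};
```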
Two back-to-back transpose operations are combined into a single transpose, which uses a combination of their permutation vectors.
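A small sketch of how two permutation vectors can be combined, assuming the convention that a transpose with permutation `p` maps source dimension `p[i]` to result dimension `i`. The helper names are hypothetical, and the convention used by the actual op may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Applies a permutation to a shape: result[i] = shape[perm[i]].
std::vector<int> applyPermutation(const std::vector<int> &shape,
                                  const std::vector<unsigned> &perm) {
  assert(shape.size() == perm.size());
  std::vector<int> result(shape.size());
  for (std::size_t i = 0; i < perm.size(); ++i)
    result[i] = shape[perm[i]];
  return result;
}

// Returns the single permutation equivalent to applying `first` and then
// `second`: composed[i] = first[second[i]].
std::vector<unsigned> composePermutations(const std::vector<unsigned> &first,
                                          const std::vector<unsigned> &second) {
  assert(first.size() == second.size());
  std::vector<unsigned> composed(second.size());
  for (std::size_t i = 0; i < second.size(); ++i)
    composed[i] = first[second[i]];
  return composed;
}
```

Under this convention, applying the composed permutation once gives the same shape as applying the two permutations one after the other.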
Differential Revision: https://reviews.llvm.org/D77331
A certain number of EDSCs have a named form (e.g. `linalg.matmul`) and a generic form (e.g. `linalg.generic` with matmul traits).
Despite living in different namespaces, using the same name is confusing for clients.
Rename them as `linalg_matmul` and `linalg_generic_matmul` respectively.
C interface emission is controlled by a flag and has coarse granularity.
With this coarse control, interfaces are emitted for all external functions.
This makes it easy to get undefined symbols.
This revision adds support for controlling per-function emission with an "emit_c_interface" attribute.
Summary:
LLVM IR functions can have arbitrary attributes attached to them, some of which
may affect code transformations. Until we can model all attributes
consistently, provide a pass-through mechanism that forwards attributes from
the LLVMFuncOp in MLIR to LLVM IR functions during translation. This mechanism
relies on LLVM IR being able to recognize string representations of the
attributes and performs some additional checking to avoid hitting assertions
within LLVM code.
Differential Revision: https://reviews.llvm.org/D77072
Add a method that given an affine map returns another with just its unique
results. Use this to drop redundant bounds in max/min for affine.for. Update
affine.for's canonicalization pattern and createCanonicalizedForOp to use
this.
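The effect on max/min bounds can be sketched as follows. Results are modeled here as plain strings and compared textually, which is only a stand-in for comparing real affine expressions; `uniqueResults` is a hypothetical helper name.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the results with duplicates dropped, keeping first occurrences
// in their original order. For a max/min bound, dropping a repeated
// result does not change the value of the bound.
std::vector<std::string>
uniqueResults(const std::vector<std::string> &results) {
  std::set<std::string> seen;
  std::vector<std::string> unique;
  for (const std::string &expr : results)
    if (seen.insert(expr).second) // True only the first time expr is seen.
      unique.push_back(expr);
  return unique;
}
```

For example, an upper bound written as `min(d0 + 1, s0, d0 + 1)` would keep only `d0 + 1` and `s0`.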
Differential Revision: https://reviews.llvm.org/D77237
Modernize/cleanup code in loop transforms utils - a lot of this code was
written prior to the currently available IR support / code style. This
patch also does some variable renames including inst -> op, comment
updates, turns getCleanupLoopLowerBound into a local function.
Differential Revision: https://reviews.llvm.org/D77175
Summary:
This is to allow optimizations like loop invariant code motion to work
on the ParallelOp.
Additional small cleanup on the ForOp implementation of
LoopLikeInterface and the test file of loop-invariant-code-motion.
Differential Revision: https://reviews.llvm.org/D77128
This revision adds support for generating utilities for passes such as options/statistics/etc. that can be inferred from the tablegen definition. This removes additional boilerplate from the pass, and also makes it easier to remove the reliance on the pass registry to provide certain things (e.g. the pass argument).
Differential Revision: https://reviews.llvm.org/D76659
This removes the need to statically register conversion passes, and also puts all of the conversions within one centralized file.
Differential Revision: https://reviews.llvm.org/D76658
This generates a Passes.td for all of the dialects that have transformation passes. This removes the need for global registration for all of the dialect passes.
Differential Revision: https://reviews.llvm.org/D76657
This will greatly simplify a number of things related to passes:
* Enables generation of pass registration
* Enables generation of boilerplate pass utilities
* Enables generation of pass documentation
This revision focuses on adding the basic structure and adds support for generating the registration for passes in the Transforms/ directory. Future revisions will add more support and move more passes over.
Differential Revision: https://reviews.llvm.org/D76656
Summary:
The commit provides a single method to build affine maps with zero or more
results. Users of mlir::AffineMap previously had to dispatch between two methods
depending on the number of results.
At the same time, this commit fixes the method for building an affine map with
zero results, which previously ignored its `symbolCount` argument.
Differential Revision: https://reviews.llvm.org/D77126
Summary:
OpBuilder(Block) is specifically replaced with
OpBuilder::atBlockEnd(Block);
This is to make insertion behavior clear due to there being no one
correct answer for which location in a block the default insertion
point should be.
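The design choice can be illustrated with a toy builder that has no public constructor from a block and exposes named factory functions instead, so every call site states where insertion happens. `Block` and `Builder` below are hypothetical stand-ins, and `atBlockBegin` is included only as an illustrative counterpart to `atBlockEnd`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy block: an ordered list of op names.
struct Block {
  std::vector<std::string> ops;
};

class Builder {
public:
  // Named constructors make the insertion point explicit at the call site.
  static Builder atBlockBegin(Block &block) { return Builder(block, 0); }
  static Builder atBlockEnd(Block &block) {
    return Builder(block, block.ops.size());
  }

  // Inserts an op at the current insertion point and advances past it.
  void create(const std::string &op) {
    block.ops.insert(block.ops.begin() + static_cast<std::ptrdiff_t>(pos), op);
    ++pos;
  }

private:
  // Private: there is no "default" insertion point to pick implicitly.
  Builder(Block &block, std::size_t pos) : block(block), pos(pos) {}

  Block &block;
  std::size_t pos;
};
```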
Differential Revision: https://reviews.llvm.org/D77060
Summary:
The RAW fusion happens only if the producer block dominates the consumer block.
The WAW pattern works under the same precondition, i.e., if the producer
dominates the consumer, they can be fused.
Since they are all tilable, we can think of the pattern this way:
Input:
```
linalg_op1 view
tile_loop
  subview_2
  linalg_op2 subview_2
```
Tile the first Linalg op in the same way as the second Linalg op.
```
tile_loop
  subview_1
  linalg_op1 subview_1
tile_loop
  subview_2
  linalg_op2 subview_2
```
Since the first Linalg op is tilable in the same way and the computations are
independent, it's fair to fuse it with the second Linalg op.
```
tile_loop
  subview_1
  linalg_op1 subview_1
  linalg_op2 subview_2
```
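The same WAW situation can be mimicked on a plain 1-D buffer. In this sketch, a fill plays the role of `linalg_op1` and an in-place accumulation plays the role of `linalg_op2`; both functions and the choice of ops are assumptions made for illustration, not code from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Unfused form: op1 runs on the whole view, then op2 runs tile by tile.
std::vector<int> runUnfused(std::vector<int> buf, std::size_t tile) {
  assert(tile > 0);
  for (int &v : buf)
    v = 1; // linalg_op1 view
  for (std::size_t lo = 0; lo < buf.size(); lo += tile) {
    const std::size_t hi = std::min(lo + tile, buf.size());
    for (std::size_t i = lo; i < hi; ++i)
      buf[i] += static_cast<int>(i); // linalg_op2 subview_2
  }
  return buf;
}

// Fused form: op1 is tiled the same way and moved into op2's tile loop.
std::vector<int> runFused(std::vector<int> buf, std::size_t tile) {
  assert(tile > 0);
  for (std::size_t lo = 0; lo < buf.size(); lo += tile) {
    const std::size_t hi = std::min(lo + tile, buf.size());
    for (std::size_t i = lo; i < hi; ++i)
      buf[i] = 1; // linalg_op1 subview_1
    for (std::size_t i = lo; i < hi; ++i)
      buf[i] += static_cast<int>(i); // linalg_op2 subview_2
  }
  return buf;
}
```

Because each tile of the second op only touches elements written by the matching tile of the first op, both forms produce the same buffer.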
In short, this patch includes:
- Handling both RAW and WAW pattern.
- Adding an interface method to get input and output buffers.
- Exposing a method to get a StringRef of a dependency type.
- Fixing existing WAW tests and adding one more use case: initializing the
buffer before the conv op.
Differential Revision: https://reviews.llvm.org/D76897
Summary:
Performs an N-D pooling operation similarly to the description in the TF
documentation:
https://www.tensorflow.org/api_docs/python/tf/nn/pool
Unlike that description, this operation does not operate on the batch and
channel dimensions. It only takes tensors of rank `N`.
```
output[x[0], ..., x[N-1]] =
  REDUCE_{z[0], ..., z[N-1]}
    input[
      x[0] * strides[0] - pad_before[0] + dilation_rate[0]*z[0],
      ...
      x[N-1]*strides[N-1] - pad_before[N-1] + dilation_rate[N-1]*z[N-1]
    ]
```
The optional arguments are:
- strides: an i64 array specifying the stride (i.e. step) for window
loops.
- dilations: an i64 array specifying the filter upsampling/input
downsampling rate
- padding: an i64 array of pairs (low, high) specifying the number of
elements to pad along a dimension.
If the strides or dilations attributes are missing, the default value is
one for each of the input dimensions. Similarly, padding values default to zero
for both low and high in each of the dimensions, if not specified.
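A 1-D instance of the formula above with max as the reduction, as a sketch only. The function name, the explicit `window` size parameter, the output size computation, and the choice to skip padded positions are all assumptions made for this example; they are not taken from the op definition.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// output[x] = max over z in [0, window) of
//             input[x * stride - padLow + dilation * z],
// where indices that fall into the padding are skipped.
std::vector<int> maxPool1D(const std::vector<int> &input, int window,
                           int stride = 1, int dilation = 1,
                           int padLow = 0, int padHigh = 0) {
  assert(window > 0 && stride > 0 && dilation > 0);
  assert(padLow >= 0 && padHigh >= 0);
  const int n = static_cast<int>(input.size());
  const int effectiveWindow = dilation * (window - 1) + 1;
  const int padded = n + padLow + padHigh;

  std::vector<int> output;
  if (padded < effectiveWindow)
    return output; // The window does not fit even once.

  const int outSize = (padded - effectiveWindow) / stride + 1;
  for (int x = 0; x < outSize; ++x) {
    // Stays at the lowest int if the whole window lies in the padding.
    int acc = std::numeric_limits<int>::lowest();
    for (int z = 0; z < window; ++z) {
      const int idx = x * stride - padLow + dilation * z;
      if (idx >= 0 && idx < n)
        acc = std::max(acc, input[idx]);
    }
    output.push_back(acc);
  }
  return output;
}
```

The defaults mirror the text: stride and dilation of one, and zero padding on both sides.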
Differential Revision: https://reviews.llvm.org/D76414