This impl class currently provides the following:
* auto definition of the 'ImplType = StorageClass'
* get/getChecked wrappers around TypeUniquer
* 'verifyConstructionInvariants' hook
- This hook verifies that the arguments passed into get/getChecked are valid
to construct a type instance with.
With this, all non-generic type uniquing has been moved out of MLIRContext.cpp
PiperOrigin-RevId: 227871108
symbols.
Included with this is some other infra:
- Testcases for other canonicalizations that I will implement next.
- Some helpers in AffineMap/Expr for doing simple walks without defining whole
visitor classes.
- A 'replaceDimsAndSymbols' facility that I'll be using to simplify maps and
exprs, e.g. to fold one constant into a mapping and to drop/renumber unused dims.
- Allow index (and everything else) to work in memref's, as we previously
discussed, to make the testcase easier to write.
- A "getAffineBinaryExpr" helper to produce a binop when you know the kind as
an enum.
This line of work will eventually subsume the ComposeAffineApply pass, but it is no where close to that yet :-)
PiperOrigin-RevId: 227852951
Use tablegen to generate definitions of the standard binary arithmetic
operations. These operations share a lot of boilerplate that is better off
generated by a tool.
Using tablegen for standard binary arithmetic operations requires the following
modifications.
1. Add a bit field `hasConstantFolder` to the base Op tablegen class; generate
the `constantFold` method signature if the bit is set. Differentiate between
single-result and zero/multi-result functions that use different signatures.
The implementation of the method remains in C++, similarly to canonicalization
patterns, since it may be large and non-trivial.
2. Define the `AnyType` record of class `Type` since `BinaryOp` currently
provided in op_base.td is supposed to operate on tensors and other tablegen
users may rely on this behavior.
Note that this drops the inline documentation on the operation classes that was
copy-pasted around anyway. Since we don't generate g3doc from tablegen yet,
keep LangRef.md as it is. Eventually, the user documentation can move to the
tablegen definition file as well.
PiperOrigin-RevId: 227820815
Even though it is unexpected except in pathological cases, a nullptr clone may
be returned. This CL handles the nullptr return gracefuly.
PiperOrigin-RevId: 227764615
The strict requirement (i.e. at least 2 HW vectors in a super-vector) was a
premature optimization to avoid interfering with other vector code potentially
introduced via other means.
This CL avoids this premature optimization and the spurious errors it causes
when super-vector size == HW vector size (which is a possible corner case).
This may be revisited in the future.
PiperOrigin-RevId: 227763966
This corner was found when stress testing with a functional end-to-end CPU
path. In the case where the hardware vector size is 1x...x1 the `keep` vector
is empty and would result a crash.
While there is no reason to expect a 1x...x1 HW vector in practice, this case
can just gracefully degrade to scalar, which is what this CL allows.
PiperOrigin-RevId: 227761097
Need to do some ifdef jumps with TableGen to avoid errors due to including the base multiple times. The way TableGen flags repeated includes is by way of checking the include directive this necessitates that the guards are on the includes as well as around the classes/defines.
PiperOrigin-RevId: 227692030
* Match using isa
- This limits the rewrite pattern to ops defined in op registry but that is probably better end state (esp. for additional verification).
PiperOrigin-RevId: 227598946
Add ops.td for TF dialect and start by adding tf.Add (op used in legalizer pattern). This CL does not add TensorOp trait (that's not part of OpTrait namespace).
Remove OpCode emission that is not currently used and can be added back later if needed.
PiperOrigin-RevId: 227590973
Dialect specific types are registered similarly to operations, i.e. registerType<...> within the dialect. Unlike operations, there is no notion of a "verbose" type, that is *all* types must be registered to a dialect. Casting support(isa/dyn_cast/etc.) is implemented by reserving a range of type kinds in the top level Type class as opposed to string comparison like operations.
To support derived types a few hooks need to be implemented:
In the concrete type class:
- static char typeID;
* A unique identifier for the type used during registration.
In the Dialect:
- typeParseHook and typePrintHook must be implemented to provide parser support.
The syntax for dialect extended types is as follows:
dialect-type: '!' dialect-namespace '<' '"' type-specific-data '"' '>'
The 'type-specific-data' is information used to identify different types within the dialect, e.g:
- !tf<"variant"> // Tensor Flow Variant Type
- !tf<"string"> // Tensor Flow String Type
TensorFlow/TensorFlowControl types are now implemented as dialect specific types as a proof
of concept.
PiperOrigin-RevId: 227580052
This change is mechanical and merges the LowerAffineApplyPass and
LowerIfAndForPass into a single LowerAffinePass. It makes a step towards
defining an "affine dialect" that would contain all polyhedral-related
constructs. The motivation for merging these two passes is based on retiring
MLFunctions and, eventually, transforming If and For statements into regular
operations. After that happens, LowerAffinePass becomes yet another
legalization.
PiperOrigin-RevId: 227566113
With the builder to construct the type on the Type, the appropriate mlir::Type can be constructed where needed. Also add a constant attr class that has the attribute and value as members.
PiperOrigin-RevId: 227564789
Existing implementation was created before ML/CFG unification refactoring and
did not concern itself with further lowering to separate concerns. As a
result, it emitted `affine_apply` instructions to implement `for` loop bounds
and `if` conditions and required a follow-up function pass to lower those
`affine_apply` to arithmetic primitives. In the unified function world,
LowerForAndIf is mostly a lowering pass with low complexity. As we move
towards a dialect for affine operations (including `for` and `if`), it makes
sense to lower `for` and `if` conditions directly to arithmetic primitives
instead of relying on `affine_apply`.
Expose `expandAffineExpr` function in LoweringUtils. Use this function
together with `expandAffineMaps` to emit primitives that implement loop and
branch conditions directly.
Also remove tests that become unnecessary after transforming LowerForAndIf into
a function pass.
PiperOrigin-RevId: 227563608
In LoweringUtils, extract out `expandAffineMap`. This function takes an affine
map and a list of values the map should be applied to and emits a sequence of
arithmetic instructions that implement the affine map. It is independent of
the AffineApplyOp and can be used in places where we need to insert an
evaluation of an affine map without relying on a (temporary) `affine_apply`
instruction. This prepares for a merge between LowerAffineApply and
LowerForAndIf passes.
Move the `expandAffineApply` function to the LowerAffineApply pass since it is
the only place that must be aware of the `affine_apply` instructions.
PiperOrigin-RevId: 227563439
The entire compiler now looks at structural properties of the function (e.g.
does it have one block, does it contain an if/for stmt, etc) so the only thing
holding up this difference is round tripping through the parser/printer syntax.
Removing this shrinks the compile by ~140LOC.
This is step 31/n towards merging instructions and statements. The last step
is updating the docs, which I will do as a separate patch in order to split it
from this mostly mechanical patch.
PiperOrigin-RevId: 227540453
Moving forward dialect namespaces cannot contain '.' characters.
This cl also standardizes that operation names must begin with the dialect namespace followed by a '.'.
PiperOrigin-RevId: 227532193
This commit adds support for the "select" operation that lowers directly into
its LLVM IR counterpart. A simple test is included.
PiperOrigin-RevId: 227527893
runOnCFG/MLFunction override locations. Passes that care can handle this
filtering if they choose. Also, eliminate one needless difference between
CFG/ML functions in the parser.
This is step 30/n towards merging instructions and statements.
PiperOrigin-RevId: 227515912
This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs)
in MLIR components:
1. a `Type` system of shell classes that closely matches the MLIR type system. These
types are subdivided into `Bindable` leaf expressions and non-bindable `Expr`
expressions;
2. an `MLIREmitter` class whose purpose is to:
a. maintain a map of `Bindable` leaf expressions to concrete SSAValue*;
b. provide helper functionality to specify bindings of `Bindable` classes to
SSAValue* while verifying comformable types;
c. traverse the `Expr` and emit the MLIR.
This is used on a concrete example to implement MemRef load/store with clipping in the
LowerVectorTransfer pass. More specifically, the following pseudo-C++ code:
```c++
MLFuncBuilder *b = ...;
Location location = ...;
Bindable zero, one, expr, size;
// EDSL expression
auto access = select(expr < zero, zero, select(expr < size, expr, size - one));
auto ssaValue = MLIREmitter(b)
.bind(zero, ...)
.bind(one, ...)
.bind(expr, ...)
.bind(size, ...)
.emit(location, access);
```
is used to emit all the MLIR for a clipped MemRef access.
This simple EDSL can easily be extended to more powerful patterns and should
serve as the counterpart to pattern matchers (and could potentially be unified
once we get enough experience).
In the future, most of this code should be TableGen'd but for now it has
concrete valuable uses: make MLIR programmable in a declarative fashion.
This CL also adds Stmt, proper supporting free functions and rewrites
VectorTransferLowering fully using EDSCs.
The code for creating the EDSCs emitting a VectorTransferReadOp as loops
with clipped loads is:
```c++
Stmt block = Block({
tmpAlloc = alloc(tmpMemRefType),
vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
ForNest(ivs, lbs, ubs, steps, {
scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs),
store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs),
}),
vectorValue = load(vectorView, zero),
tmpDealloc = dealloc(tmpAlloc.getLHS())});
emitter.emitStmt(block);
```
where `accessInfo.clippedScalarAccessExprs)` is created with:
```c++
select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one));
```
The generated MLIR resembles:
```mlir
%1 = dim %0, 0 : memref<?x?x?x?xf32>
%2 = dim %0, 1 : memref<?x?x?x?xf32>
%3 = dim %0, 2 : memref<?x?x?x?xf32>
%4 = dim %0, 3 : memref<?x?x?x?xf32>
%5 = alloc() : memref<5x4x3xf32>
%6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
for %i4 = 0 to 3 {
for %i5 = 0 to 4 {
for %i6 = 0 to 5 {
%7 = affine_apply #map0(%i0, %i4)
%8 = cmpi "slt", %7, %c0 : index
%9 = affine_apply #map0(%i0, %i4)
%10 = cmpi "slt", %9, %1 : index
%11 = affine_apply #map0(%i0, %i4)
%12 = affine_apply #map1(%1, %c1)
%13 = select %10, %11, %12 : index
%14 = select %8, %c0, %13 : index
%15 = affine_apply #map0(%i3, %i6)
%16 = cmpi "slt", %15, %c0 : index
%17 = affine_apply #map0(%i3, %i6)
%18 = cmpi "slt", %17, %4 : index
%19 = affine_apply #map0(%i3, %i6)
%20 = affine_apply #map1(%4, %c1)
%21 = select %18, %19, %20 : index
%22 = select %16, %c0, %21 : index
%23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32>
store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32>
}
}
}
%24 = load %6[%c0] : memref<1xvector<5x4x3xf32>>
dealloc %5 : memref<5x4x3xf32>
```
In particular notice that only 3 out of the 4-d accesses are clipped: this
corresponds indeed to the number of dimensions in the super-vector.
This CL also addresses the cleanups resulting from the review of the prevous
CL and performs some refactoring to simplify the abstraction.
PiperOrigin-RevId: 227367414
on this to merge together the classes, but there may be other simplification
possible. I'll leave that to riverriddle@ as future work.
This is step 29/n towards merging instructions and statements.
PiperOrigin-RevId: 227328680
simplifying them in minor ways. The only significant cleanup here
is the constant folding pass. All the other changes are simple and easy,
but this is still enough to shrink the compiler by 45LOC.
The one pass left to merge is the CSE pass, which will be move involved, so I'm
splitting it out to its own patch (which I'll tackle right after this).
This is step 28/n towards merging instructions and statements.
PiperOrigin-RevId: 227328115
Remove an unnecessary restriction in forward substitution. Slightly
simplify LLVM IR lowering, which previously would crash if given an ML
function, it should now produce a clean error if given a function with an
if/for instruction in it, just like it does any other unsupported op.
This is step 27/n towards merging instructions and statements.
PiperOrigin-RevId: 227324542
representation, shrinking by 70LOC. The PatternRewriter class can probably
also be simplified as well, but one step at a time.
This is step 26/n towards merging instructions and statements. NFC.
PiperOrigin-RevId: 227324218
- drop these ununsed/incomplete sketches given the new design
@albertcohen is working on, and given that FlatAffineConstraints is now
stable and fast enough for all the analyses/transforms that depend on it.
PiperOrigin-RevId: 227322739
- introduce PostDominanceInfo in the right/complete way and use that for post
dominance check in store-load forwarding
- replace all uses of Analysis/Utils::dominates/properlyDominates with
DominanceInfo::dominates/properlyDominates
- drop all redundant copies of dominance methods in Analysis/Utils/
- in pipeline-data-transfer, replace dominates call with a much less expensive
check; similarly, substitute dominates() in checkMemRefAccessDependence with
a simpler check suitable for that context
- fix a bug in properlyDominates
- improve doc for 'for' instruction 'body'
PiperOrigin-RevId: 227320507
- dominates() for blocks was assuming that there was only a single block at the
top level whenever there was a hierarchy of blocks (as in the case of 'for'/'if'
instructions).
- fix the comments as well
PiperOrigin-RevId: 227319738
function pass, and eliminating the need to copy over code and do
interprocedural updates. While here, also improve it to make fewer empty
blocks, and rename it to "LowerIfAndFor" since that is what it does. This is
a net reduction of ~170 lines of code.
As drive-bys, change the splitBlock method to *not* insert an unconditional
branch, since that behavior is annoying for all clients. Also improve the
AsmPrinter to not crash when a block is referenced that isn't linked into a
function.
PiperOrigin-RevId: 227308856
PrintOpStatsPass is maintaining state (op stats ) across functions and doing
per-module work - it should be a module pass.
PiperOrigin-RevId: 227294151
- the load/store forwarding relies on memref dependence routines as well as
SSA/dominance to identify the memref store instance uniquely supplying a value
to a memref load, and replaces the result of that load with the value being
stored. The memref is also deleted when possible if only stores remain.
- add methods for post dominance for MLFunction blocks.
- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
earlier static)
- add a helper method in FlatAffineConstraints - isRangeOneToOne.
PiperOrigin-RevId: 227252907
better order.
- update isEmpty() to eliminate IDs in a better order. Speed improvement for
complex cases (for eg. high-d reshape's involving mod's/div's).
- minor efficiency update to projectOut (was earlier making an extra albeit
benign call to gaussianEliminateIds) (NFC).
- move getBestIdToEliminate further up in the file (NFC).
- add the failing test case.
- add debug info to checkMemRefAccessDependence.
PiperOrigin-RevId: 227244634
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops. Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.
This is step 25/n towards merging instructions and statements.
PiperOrigin-RevId: 227243966
printing the entry block in a CFG function's argument line. Since I'm touching
all of the testcases anyway, change the argument list from printing as
"%arg : type" to "%arg: type" which is more consistent with bb arguments.
In addition to being more consistent, this is a much nicer look for cfg functions.
PiperOrigin-RevId: 227240069
have a designator. This improves diagnostics and merges handling between CFG
and ML functions more. This also eliminates hard coded parser knowledge of
terminator keywords, allowing dialects to define their own terminators.
PiperOrigin-RevId: 227239398
requires enhancing DominanceInfo to handle the structure of an ML function,
which is required anyway. Along the way, this also fixes a const correctness
problem with Instruction::getBlock().
This is step 24/n towards merging instructions and statements.
PiperOrigin-RevId: 227228900
the function signature, giving them common functionality to ml functions. This
is a strictly additive patch that adds new capability without changing behavior
in a significant way (other than a few diagnostic cleanups). A subsequent
patch will change the printer to use this behavior, which will require updating
a ton of testcases. :)
This exposes the fact that we need to make a grammar change for block
arguments, as is tracked by b/122119779
This is step 23/n towards merging instructions and statements, and one of the
first steps towards eliminating the "cfg vs ml" distinction at a syntax and
semantic level.
PiperOrigin-RevId: 227228342
* Allow multi input node patterns in the rewrite;
* Use number of nodes matched as benefit;
* Rewrite relu(add(...)) matching using the new pattern;
To allow for undefined ops, do string compare - will address soon!
PiperOrigin-RevId: 227225425
by ~80 lines. This causes a slight change to diagnostics, but
is otherwise behavior preserving.
This is step 22/n towards merging instructions and statements, MFC.
PiperOrigin-RevId: 227187857
consistent and moving the using declarations over. Hopefully this is the last
truly massive patch in this refactoring.
This is step 21/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227178245
- extend/complete dependence tester to utilize local var info while adding
access function equality constraints; one more step closer to get slicing
based fusion working in the general case of affine_apply's involving mod's/div's.
- update test case to reflect more accurate dependence information; remove
inaccurate comment on test case mod_deps.
- fix a minor "bug" in equality addition in addMemRefAccessConstraints (doesn't
affect correctness, but the fixed version is more intuitive).
- some more surrounding code clean up
- move simplifyAffineExpr out of anonymous AffineExprFlattener class - the
latter has state, and the former should reside outside.
PiperOrigin-RevId: 227175600
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.
This is step 19/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227163082