A dialect can opt-in to handle versioning through the
`BytecodeDialectInterface`. Few hooks are exposed to the dialect to allow
managing a version encoded into the bytecode file. The version is loaded
lazily and allows to retrieve the version information while parsing the input
IR, and gives an opportunity to each dialect for which a version is present
to perform IR upgrades post-parsing through the `upgradeFromVersion` method.
Custom Attribute and Type encodings can also be upgraded according to the
dialect version using readAttribute and readType methods.
There is no restriction on what kind of information a dialect is allowed to
encode to model its versioning. Currently, versioning is supported only for
bytecode formats.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D143647
`parseAttribute` and `parseType` require null-terminated strings as
input, but this isn't great considering the argument type is
`StringRef`. This changes them to copy to a null-terminated buffer by
default, with a `isKnownNullTerminated` flag added to disable the
copying.
closes#58964
Reviewed By: rriddle, kuhar, lattner
Differential Revision: https://reviews.llvm.org/D145182
Currently these functions report errors directly to stderr, this updates
them to use diagnostics instead. This also makes partially-consumed
strings an error if the `numRead` parameter isn't provided (the
docstrings already claimed this happened, but it didn't.)
While here I also tried to reduce the number of overloads by switching
to using default parameters.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D144804
This patch drops the ZeroBehavior parameter from bit counting
functions like countLeadingZeros. ZeroBehavior specifies the behavior
when the input to count{Leading,Trailing}Zeros is zero and when the
input to count{Leading,Trailing}Ones is all ones.
ZeroBehavior was first introduced on May 24, 2013 in commit
eb91eac9fb866ab1243366d2e238b9961895612d. While that patch did not
state the intention, I would guess ZeroBehavior was for performance
reasons. The x86 machines around that time required a conditional
branch to implement countLeadingZero<uint32_t> that returns the 32 on
zero:
test edi, edi
je .LBB0_2
bsr eax, edi
xor eax, 31
.LBB1_2:
mov eax, 32
That is, we can remove the conditional branch if we don't care about
the behavior on zero.
IIUC, Intel's Haswell architecture, launched on June 4, 2013,
introduced several bit manipulation instructions, including lzcnt and
tzcnt, which eliminated the need for the conditional branch.
I think it's time to retire ZeroBehavior as its utility is very
limited. If you care about compilation speed, you should build LLVM
with an appropriate -march= to take advantage of lzcnt and tzcnt.
Even if not, modern host compilers should be able to optimize away
quite a few conditional branches because the input is often known to
be nonzero from dominating conditional branches.
Differential Revision: https://reviews.llvm.org/D141798
The bytecode reader currently has no mechanism that allows for directly referencing
data from the input buffer safely. This commit adds shared_ptr<SourceMgr> overloads
that provide an explicit and safe way of extending the lifetime of the input. The usage of
these new overloads is adopted in all of our tooling, and is implicitly used in the filename
only parser methods.
Differential Revision: https://reviews.llvm.org/D139366
This addresses the TODO in the code previously and checks that the address of `dataIt` is properly aligned to the requested alignment.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D137855
The current splicing behavior dates back to when all blocks had terminators,
so we would "helpfully" splice before the terminator. This doesn't make sense
anymore, and leads to somewhat unexpected results when parsing multiple
pieces of IR into the same block.
Differential Revision: https://reviews.llvm.org/D135096
This is very useful when you want to parse IR even if
its invalid (e.g. bytecode). It's also useful if you don't
want to pay the cost of verification in certain situations.
Differential Revision: https://reviews.llvm.org/D134847
This adds bytecode support for DenseArrayAttr, DenseIntOrFpElementsAttr,
DenseStringElementsAttr, and SparseElementsAttr.
Differential Revision: https://reviews.llvm.org/D133744
Resources are encoded in two separate sections similarly to
attributes/types, one for the actual data and one for the data
offsets. Unlike other sections, the resource sections are optional
given that in many cases they won't be present. For testing,
bytecode serialization is added for DenseResourceElementsAttr.
Differential Revision: https://reviews.llvm.org/D132729
Dialects can opt-in to providing custom encodings by implementing the
`BytecodeDialectInterface`. This interface provides hooks, namely
`readAttribute`/`readType` and `writeAttribute`/`writeType`, that will be used
by the bytecode reader and writer. These hooks are provided a reader and writer
implementation that can be used to encode various constructs in the underlying
bytecode format. A unique feature of this interface is that dialects may choose
to only encode a subset of their attributes and types in a custom bytecode
format, which can simplify adding new or experimental components that aren't
fully baked.
Differential Revision: https://reviews.llvm.org/D132498
This moves some parsing functionality from BytecodeReader to
AttrTypeReader, and removes some duplication between the attribute/type
code paths.
Differential Revision: https://reviews.llvm.org/D132497
This extracts the string section writer and reader into dedicated
classes, which better separates the logic and will also simplify future
patches that want to interact with the string section.
Differential Revision: https://reviews.llvm.org/D132496
This commit adds a new bytecode serialization format for MLIR.
The actual serialization of MLIR to binary is relatively straightforward,
given the very very general structure of MLIR. The underlying basis for
this format is a variable-length encoding for integers, which gets heavily
used for nearly all aspects of the encoding (given that most of the encoding
is just indexing into lists).
The format currently does not provide support for custom attribute/type
serialization, and thus always uses an assembly format fallback. It also
doesn't provide support for resources. These will be added in followups,
the intention for this patch is to provide something that supports the
basic cases, and can be built on top of.
https://discourse.llvm.org/t/rfc-a-binary-serialization-format-for-mlir/63518
Differential Revision: https://reviews.llvm.org/D131747