Actually there's one functional change here, which is that users can
no longer depend on <random> to include all of C++20 <concepts>. That
inclusion is so new that we believe nobody should be depending on it
yet, even in the presence of Hyrum's Law. We keep the includes of <vector>,
<algorithm>, etc., so as not to break pre-C++20 Hyrum's Law users.
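For illustration, a hypothetical snippet of the kind of code this change
could break (relying on the transitive include):

  // This used to compile because <random> transitively included
  // <concepts>; the include below is now required explicitly.
  #include <concepts>
  #include <random>

  static_assert(std::same_as<int, int>);  // std::same_as lives in <concepts>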
Differential Revision: https://reviews.llvm.org/D114281
This legacy option (available in other Fortran compilers with various
spellings) implies the SAVE attribute for local variables on subprograms
that are not explicitly RECURSIVE. The SAVE attribute essentially implies
static rather than stack storage. This was the default setting in Fortran
until surprisingly recently, so explicit SAVE statements & attributes
could be and often were omitted from older codes. Note that initialized
objects already have an implied SAVE attribute, and objects in COMMON
effectively do too, as data overlays are extinct. And since objects that are
expected to survive from one invocation of a procedure to the next in static
storage should probably be explicitly initialized in the first place, the
use cases for this option are somewhat rare, and all of them could be
handled with explicit SAVE statements or attributes.
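As a rough analogy (in C++ rather than Fortran, and only approximate),
a SAVEd local behaves like a function-local static, retaining its value
across invocations:

  // Approximate analogy of the implied SAVE attribute: the local lives
  // in static storage and survives from one call to the next.
  int counter() {
    static int n = 0;  // roughly an initialized local with implied SAVE
    return ++n;        // the value persists across invocations
  }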
This implicit SAVE attribute must not apply to automatic (in the Fortran sense)
local objects, whose sizes cannot be known at compilation time. To get the
semantics of IsSaved() right, the IsAutomatic() predicate was moved into
Evaluate/tools.cpp to allow for dynamic linking of the compiler. A
redundant predicate duplicating it was noticed and removed, and its uses
were replaced with IsAutomatic().
GNU Fortran's spelling of the option (-fno-automatic) was added to
the clang-based driver and used for basic sanity testing.
Differential Revision: https://reviews.llvm.org/D114209
We cannot unconditionally generate memref.load ops in such cases; we
need to check the source's type first.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114376
MLIR supports recursive types but they could not be handled by the conversion
infrastructure directly as it would result in infinite recursion in
`convertType` for elemental types. Support this case by keeping the "call
stack" of nested type conversions in the TypeConverter class and by passing it
as an optional argument to the individual conversion callback. The callback can
then check if a specific type is present on the stack more than once to detect
and handle the recursive case.
This approach is preferred to the alternative approach of having a separate
callback dedicated to handling only the recursive case, as the latter was
observed to introduce ~3% time overhead on a 50MB IR file even when the file
did not contain recursive types.
This approach is also preferred to keeping a local stack in type converters
that need to handle recursive types as that would compose poorly in case of
out-of-tree or cross-project extensions.
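A self-contained sketch of the idea (this is not MLIR's actual API; the
type and function names are illustrative):

  // Keep a "call stack" of types currently being converted; the
  // conversion logic checks the stack to detect and cut recursion.
  #include <algorithm>
  #include <string>
  #include <vector>

  using Type = std::string;

  Type convertType(const Type &type, std::vector<Type> &callStack) {
    // The type is already being converted further up the stack: this is
    // the recursive case, so return a self-reference instead of
    // recursing forever.
    if (std::find(callStack.begin(), callStack.end(), type) !=
        callStack.end())
      return "!self";
    callStack.push_back(type);
    Type converted = "converted<" + type + ">";  // convert element types here
    callStack.pop_back();
    return converted;
  }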
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D113579
This is a near-universal language extension; external unit 0
is preconnected to the standard error output.
Differential Revision: https://reviews.llvm.org/D114298
In the test suite, we generally don't use printf or other reporting
utilities. It's not that it wouldn't be useful, it's just that some
platforms don't support IO.
Instead, we try to keep test cases small and self-contained so that
we can reasonably easily reproduce failures locally and debug them.
This patch removes printf in some of the last places in the test suite
that used it. The only remaining places are in a deque test and in the
filesystem tests. The filesystem tests are arguably fine to keep using
IO, since we're testing <filesystem>. The deque test will be handled
separately.
Differential Revision: https://reviews.llvm.org/D114282
This basically reverts 1778831a3d, which split them.
Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of
features that usually weren't added to EmitMSInlineAsmStr(), which was
usually a mistake. D71677, D113932, and D114167 are all examples of
things that had to be backported to EmitMSInlineAsmStr().
The names were also not great. EmitMSInlineAsmStr() used to be called for `asm
inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as
well as for GCC-style __asm__ / asm statements with -masm=intel. On the other
hand, EmitGCCInlineAsmStr() used to be called for `asm`, which clang produces
for GCC-style __asm__ / asm statements with -masm=att (the default).
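For reference, a small example of the two source forms (the
Microsoft-style block needs -fasm-blocks and an x86 target):

  // GCC-style statement: lowered as `asm` under -masm=att (the
  // default), or as `asm inteldialect` under -masm=intel.
  void gcc_style() { __asm__ __volatile__("nop"); }

  // Microsoft-style block: always lowered as `asm inteldialect`.
  void ms_style() { __asm { nop } }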
It's also less code (23 insertions, 188 deletions).
No behavior change.
Differential Revision: https://reviews.llvm.org/D114330
This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass
with `asm inteldialect` too.
I don't know if this is something one can hit in practice with inline
asm. The test is from 2007 (4646aa3e33) but in 2009 blockaddr was
introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");`
compiles to
call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1))
nowadays (thanks to jrtc27 for that example!).
(6c4d255bf3 switched clang to blockaddress on an opt-in basis,
e4801f7844 added docs for it, 31b132c0b7 added IR support.)
I half-heartedly tried to build clang 2.8 locally, but it didn't build
out of the box. And 2.8 didn't have a prebuilt clang binary yet.
The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr()
more alike, and maybe we should delete this code from EmitGCCInlineAsmStr()
instead. But since it's just 3 lines and it's reachable from LLVM IR,
let's do the safer thing for now.
Differential Revision: https://reviews.llvm.org/D114329
This patch has been tested in D70631, but it should be reviewed
separately.
Reviewed By: ldionne, #libc
Differential Revision: https://reviews.llvm.org/D114248
Make the SimpleSValBuilder capable of simplifying existing IntSym
expressions based on a newly added constraint on the sub-expression.
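A hypothetical example of the kind of code this helps with:

  // `10 - b` is an Int-Sym expression; once the analyzer assumes the
  // constraint `b == 4` on the sub-expression, it can simplify the
  // earlier expression down to the constant 6.
  int f(int b) {
    int a = 10 - b;
    if (b == 4)
      return a;  // now known to be 6
    return 0;
  }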
Differential Revision: https://reviews.llvm.org/D113754
Most changes are comment rewordings, but I also rephrased some assertions.
Reviewed By: kparzysz
Differential Revision: https://reviews.llvm.org/D114132
Currently, we restore the return address register as the last restoring
instruction in the epilog. The next instruction is usually `ret`, which
is a use of the return address register. Some microarchitectures have a
load-to-use data hazard. To avoid it, we can separate the load
instruction from its use as far as possible. In this patch, we reverse
the order of restoring callee-saved registers to increase the distance
between `load ra` and `ret` in the epilog.
Differential Revision: https://reviews.llvm.org/D113967
This change switches tsan to the new runtime which features:
- 2x smaller shadow memory (2x of app memory)
- faster fully vectorized race detection
- small fixed-size vector clocks (512b)
- fast vectorized vector clock operations
- unlimited number of alive threads/goroutines
Depends on D112602.
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D112603
All runtime callbacks must be non-instrumented with the new tsan runtime
(it's now more picky with respect to recursion into the runtime).
Disable instrumentation in Darwin tests as we do in all other tests now.
Differential Revision: https://reviews.llvm.org/D114348
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:
* Addrec construction directly wrote to ValueExprMap in a few places,
  without updating ExprValueMap. Add a helper to ensure they stay
  consistent (a generic sketch of this invariant follows below). The
  adjustment in forgetSymbolicName() explicitly drops the old value
  from the map, so that we don't rely on it being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
ExprValueMap, but not dropping the corresponding entries from
ValueExprMap.
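A generic sketch of the invariant (plain standard-library maps, not the
actual SCEV code):

  // A value->expr map and its expr->values inverse must be updated
  // together: inserting drops any stale inverse entry instead of
  // relying on it being overwritten, and forgetting an expr also drops
  // the corresponding forward entries.
  #include <unordered_map>
  #include <unordered_set>

  struct PairedMaps {
    std::unordered_map<int, int> valueToExpr;                       // like ValueExprMap
    std::unordered_map<int, std::unordered_set<int>> exprToValues;  // like ExprValueMap

    void insert(int value, int expr) {
      auto it = valueToExpr.find(value);
      if (it != valueToExpr.end())
        exprToValues[it->second].erase(value);  // drop the old inverse entry
      valueToExpr[value] = expr;
      exprToValues[expr].insert(value);
    }

    void forget(int expr) {
      auto it = exprToValues.find(expr);
      if (it == exprToValues.end())
        return;
      for (int value : it->second)
        valueToExpr.erase(value);  // keep the forward map in sync
      exprToValues.erase(it);
    }
  };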
Differential Revision: https://reviews.llvm.org/D113349
Using an lldb_private object in the bindings involves three steps
- wrapping the object in its lldb::SB variant
- using swig to convert/wrap that to a PyObject
- wrapping *that* in a lldb_private::python::PythonObject
Our SBTypeToSWIGWrapper was only handling the middle part. This doesn't
just result in increased boilerplate in the callers, but is also a
functionality problem, as it's very hard to get the lifetime of all
of these objects right. Most of the callers are creating the SB object
(step 1) on the stack, which means that we end up with dangling python
objects after the function terminates. Most of the time this isn't a
problem, because the python code does not need to persist the objects.
However, there are legitimate cases where it does (and even if
the use case is not completely legitimate, crashing is not the best
response to that).
For this reason, some of our code creates the SB object on the heap, but
it has another problem -- it never gets cleaned up.
This patch begins to add a new function (ToSWIGWrapper), which performs all
three steps while properly taking care of ownership. In the
first step, I have converted most of the leaky code (except for
SBStructuredData, which needs a bit more work).
Differential Revision: https://reviews.llvm.org/D114259
This is a preparatory commit to enable mocking of qemu startup. That
will involve running the mock server in a separate process, so there's
no need for multithreading.
Initialization is moved from the start function into the constructor
(which can then take an actual socket instead of a class), and the run
method is made public.
Depends on D114156.
Differential Revision: https://reviews.llvm.org/D114157
After padding, we introduce an ExtractSliceOp to get the final unpadded result. This revision uses getAsOpFoldResult to compute the sizes of the unpadded result, which guarantees the result type has a partially static shape if some of the sizes of the unpadded result are statically known. At the moment, we rely on canonicalization to clean up the types after padding.
Depends On D114085
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114153
Adapt tiling to always generate an extract/insert slice pair for output tensors even if the tensor is not tiled. Having an explicit extract/insert slice pair simplifies followup transformations such as padding and bufferization. In particular, it makes read and written iteration argument slices explicit.
Depends On D114067
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114085
The purpose of the change is to make clear whether the user is
retrieving the original function or the wrapper function, in line with
the invoke commands. This new functionality is useful for users that
have already defined their own packed interface and do not want the
extra layer of indirection, or for users wanting to look at the
resulting primary function rather than the wrapper function.
All locations except the python bindings now have a `lookupPacked`
method that matches the original `lookup` functionality. `lookup`
still exists, but with new semantics.
- `lookup` returns the function with a given name. If `bool f(int,int)`
is compiled, `lookup` will return a reference to `bool(*f)(int,int)`.
- `lookupPacked` returns the packed wrapper of the function with the
given name. If `bool f(int,int)` is compiled, `lookupPacked` will return
`void(*mlir_f)(void**)`.
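A hedged usage sketch based on the signatures above (error handling is
elided, and the packed argument layout -- pointers to the arguments
followed by a pointer to the result slot -- is an assumption, not
verbatim API):

  // Calling the function obtained via `lookup`.
  bool callDirect(bool (*f)(int, int)) { return f(1, 2); }

  // Calling the wrapper obtained via `lookupPacked`.
  bool callPacked(void (*mlir_f)(void **)) {
    int x = 1, y = 2;
    bool result;
    void *args[] = {&x, &y, &result};
    mlir_f(args);
    return result;
  }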
Differential Revision: https://reviews.llvm.org/D114352
Remove the tile and fuse test pass that has been replaced by codegen strategy.
Depends On D114067
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114068
Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis. Fix phrasing and formatting. Add comments on the reducible loop limitation.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D114146
Apparently my methodology was suboptimal: not only did I miss all the +VL
tuples, I also missed some plain tuples. I believe this adds everything
that was missing.
Indeed, these manual cost models are just not okay long-term.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114334
Much like the VPMOVM2[BW] / VPMOV[BW]2M from AVX512BW,
these either sign-extend the mask register into a vector,
or pack the mask from a vector register.
Apparently, we didn't even have MCA tests for these,
added in rG2f364f6f0d3a2420ca78cbd80abb186657180e05,
so I'm just guessing that their perf characteristics
are optimal.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114314
Add a pattern to apply the new tile and fuse on tensors method. Integrate the pattern into the CodegenStrategy and use the CodegenStrategy to implement the tests.
Depends On D114012
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114067
This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact`
and `inbounds`) on instructions that contribute to the address computation of widened loads/stores that are
guarded by a condition. When the code is vectorized and the control flow within the loop is linearized,
these flags may lead to generating a poison value that is effectively used as the base address of the
widened load/store. The fix drops all the integer poison-generating flags from instructions that contribute
to the address computation of a widened load/store whose original instruction was in a basic block that
needed predication and is not predicated after vectorization.
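A hypothetical source pattern where this matters:

  // The guarantees implied by flags like `inbounds` on the computation
  // of `&a[i + k]` only hold under the guard. Once vectorization
  // linearizes the control flow, the address is computed
  // unconditionally, so keeping those flags could yield poison that
  // feeds the base address of the widened, masked store.
  void f(int *a, const int *cond, int n, int k) {
    for (int i = 0; i < n; ++i)
      if (cond[i])
        a[i + k] = 0;  // guarded address computation
  }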
Reviewed By: fhahn, spatel, nlopes
Differential Revision: https://reviews.llvm.org/D111846
Tile and fuse failed if the outermost tile loop is a reduction dimension. Add the necessary check to handle outermost reductions and introduce a test case to verify the change.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D114012
This adds and uses look-up tables for non-loop branch probabilities, which
have the probabilities for the different condition codes encoded directly
into the tables. Compared to having this logic inlined in different
functions, as it used to be, I think this is more compact and thus also
easier to check and cross-reference. This also adds a test for pointer
heuristics that was missing.
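A generic sketch of the table-driven shape (condition codes and
probability values here are illustrative, not the actual numbers):

  #include <array>

  enum CondCode { EQ = 0, NE, LT, GE, NumCondCodes };

  // Probability encoded as taken/total; one entry per condition code,
  // replacing per-code if/else chains spread across functions.
  struct BranchProb { unsigned Taken, Total; };

  constexpr std::array<BranchProb, NumCondCodes> PtrHeuristicTable = {{
      {3, 8},  // EQ: pointers are usually not equal (made-up value)
      {5, 8},  // NE
      {4, 8},  // LT
      {4, 8},  // GE
  }};

  inline BranchProb lookupProb(CondCode CC) { return PtrHeuristicTable[CC]; }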
Differential Revision: https://reviews.llvm.org/D114009
1. IndexTokenSource::getNextToken cannot return nullptr; some code was
still written assuming it can; make getNextToken more resilient against
incorrect input and fix its call-sites.
2. Change various asserts that can fire due to user-provided input into
conditionals in the code.
This teaches AArch64TargetLowering::shouldSinkOperands to sink splat
shuffles to certain neon intrinsics, so that they can make use of the
lane variants of the instructions that are available.
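A hypothetical example of the pattern (assuming sqdmulh is among the
covered intrinsics):

  #include <arm_neon.h>

  // The splat (vdupq_n_s32) feeds intrinsic uses that may end up in
  // other blocks; sinking it next to each use lets codegen pick the
  // by-lane instruction form instead of keeping the splat live across
  // blocks.
  int32x4_t scale(int32x4_t a, int32_t b, int twice) {
    int32x4_t s = vdupq_n_s32(b);   // splat shuffle
    int32x4_t r = vqdmulhq_s32(a, s);
    if (twice)
      r = vqdmulhq_s32(r, s);       // use in a different block
    return r;
  }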
Differential Revision: https://reviews.llvm.org/D112994