Otherwise Visual Studio 2017 will complain about
llvm::StringRef::strlen not being constexpr:
StringRef.h(80): error C3615: constexpr function 'llvm::StringRef::strLen' cannot result in a constant expression
StringRef.h(84): note: failure was caused by call of undefined function or one not declared 'constexpr'
These instructions can set the exception in FPSW. But I
don't think they can change FPCW. So this looks like a typo.
Differential Revision: https://reviews.llvm.org/D73864
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
bo (splat X), (bo Y, OtherOp) --> bo (splat (bo X, Y)), OtherOp
This patch depends on the splat analysis enhancement in D73549.
See the test with comment:
; Negative test - mismatched splat elements
...as the motivation for that first patch.
The motivating case for reassociating splatted ops is shown in PR42174:
https://bugs.llvm.org/show_bug.cgi?id=42174
In that example, a slight change in order-of-associative math results
in a big difference in IR and codegen. This patch gets all of the
unnecessary shuffles out of the way, but doesn't address the potential
scalarization (see D50992 or D73480 for that).
Differential Revision: https://reviews.llvm.org/D73703
These can be lowered to code sequences using CMPFP and CMPFPE which then get
selected to VCMP and VCMPE. The implementation isn't fully correct, as the chain
operand isn't handled correctly, but resolving that looks like it would involve
changes around FPSCR-handling instructions and how the FPSCR is modelled.
The fp-intrinsics test was already testing some of this but as the entire test
was being XFAILed it wasn't noticed. Un-XFAIL the test and instead leave the
cases where we aren't generating the right instruction sequences as FIXME.
Differential Revision: https://reviews.llvm.org/D73194
Previously, if a debug line Prologue was created via
createBasicPrologue, its TotalLength field did not account for any
contents in the table itself. This change fixes this issue.
Reviewed by: probinson
Differential Revision: https://reviews.llvm.org/D73772
Summary:
In big-endian MVE, the simple vector load/store instructions (i.e.
both contiguous and non-widening) don't all store the bytes of a
register to memory in the same order: it matters whether you did a
VSTRB.8, VSTRH.16 or VSTRW.32. Put another way, the in-register
formats of different vector types relate to each other in a different
way from the in-memory formats.
So, if you want to 'bitcast' or 'reinterpret' one vector type as
another, you have to carefully specify which you mean: did you want to
reinterpret the //register// format of one type as that of the other,
or the //memory// format?
The ACLE `vreinterpretq` intrinsics are specified to reinterpret the
register format. But I had implemented them as LLVM IR bitcast, which
is specified for all types as a reinterpretation of the memory format.
So a `vreinterpretq` intrinsic, applied to values already in registers,
would code-generate incorrectly if compiled big-endian: instead of
emitting no code, it would emit a `vrev`.
To fix this, I've introduced a new IR intrinsic to perform a
register-format reinterpretation: `@llvm.arm.mve.vreinterpretq`. It's
implemented by a trivial isel pattern that expects the input in an
MQPR register, and just returns it unchanged.
In the clang codegen, I only emit this new intrinsic where it's
actually needed: I prefer a bitcast wherever it will have the right
effect, because LLVM understands bitcasts better. So we still generate
bitcasts in little-endian mode, and even in big-endian when you're
casting between two vector types with the same lane size.
For testing, I've moved all the codegen tests of vreinterpretq out
into their own file, so that they can have a different set of RUN
lines to check both big- and little-endian.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73786
Summary:
These instructions generate a vector of consecutive elements starting
from a given base value and incrementing by 1, 2, 4 or 8. The `wdup`
versions also wrap the values back to zero when they reach a given
limit value. The instruction updates the scalar base register so that
another use of the same instruction will continue the sequence from
where the previous one left off.
At the IR level, I've represented these instructions as a family of
target-specific intrinsics with two return values (the constructed
vector and the updated base). The user-facing ACLE API provides a set
of intrinsics that throw away the written-back base and another set
that receive it as a pointer so they can update it, plus the usual
predicated versions.
Because the intrinsics return two values (as do the underlying
instructions), the isel has to be done in C++.
This is the first family of MVE intrinsics that use the `imm_1248`
immediate type in the clang Tablegen framework, so naturally, I found
I'd given it the wrong C integer type. Also added some tests of the
check that the immediate has a legal value, because this is the first
time those particular checks have been exercised.
Finally, I also had to fix a bug in MveEmitter which failed an
assertion when I nested two `seq` nodes (the inner one used to extract
the two values from the pair returned by the IR intrinsic, and the
outer one put on by the predication multiclass).
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73357
Summary:
The unpredicated case of this is trivial: the clang codegen just makes
a vector splat of the input, and LLVM isel is already prepared to
handle that. For the predicated version, I've generated a `select`
between the same vector splat and the `inactive` input parameter, and
added new Tablegen isel rules to match that pattern into a predicated
`MVE_VDUP` instruction.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73356
Disable the red zone in the unit test allocator to fix the test errors in sanitizer builds.
The red zone changed the amount of allocated bytes which made the test fail as it
checked the number of allocated bytes of the allocator.
Summary:
It turns out that CUR_DIRECTION is just an internal placeholder, not an actual
valid encoded value.
Reviewers: gchatelet
Subscribers: tschuett, mstojanovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73343
This reverts commit b848b510a8d52dbf50ee53a9a1ce844abb60d9bd as the unit tests
fail on the sanitizer bots:
/b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/unittests/Support/AllocatorTest.cpp:145: Failure
Expected: SlabSize
Which is: 4096
To be equal to: Alloc.getTotalMemory()
Which is: 4097
Summary:
A Copy with a source that is zeros is the same as a Set of zeros.
This fixes the invariant that SrcAlign should always be non-null.
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73791
Summary: There are no counters for individual ports, but this is already
enough to find a lot of issues in the current model (upcoming patch).
Reviewers: dblaikie, gchatelet
Subscribers: hiraditya, tschuett, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72032
Summary:
D68092 introduced a new SIRemoveShortExecBranches optimization pass and
broke some graphics shaders. The problem is that it was removing
branches over KILL pseudo instructions, and the fix is to explicitly
check for that in mustRetainExeczBranch.
Reviewers: critson, arsenm, nhaehnle, cdevadas, hakzsam
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73771
Duplicating instructions can lead to code size increases but using
a threshold of 3 is good for reducing code size.
Differential Revision: https://reviews.llvm.org/D72916
Summary:
In D68549 we noticed that our BumpPtrAllocator we use for LLDB's ConstString implementation is growing its slabs at
a rate that is too slow for our use case. It causes that we spend a lot of time calling `malloc` for all the tiny slabs that our
ConstString BumpPtrAllocators create. We also can't just increase the slab size in the ConstString implementation
(which is what D68549 originally did) as this really increased the amount of (mostly unused) allocated memory
in any process using ConstString.
This patch adds a template argument for the BumpPtrAllocatorImpl that allows specifying a faster rate at which the
BumpPtrAllocator increases the slab size. This allows LLDB to specify a faster rate at which the slabs grow which
should keep both memory consumption and time spent calling malloc low.
Reviewers: george.karpenkov, chandlerc, NoQ
Subscribers: NoQ, llvm-commits, llunak
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71654
There seems to be another instance of non-determinism which causes the
number of iterations to be either 1 or 3 for one benchmark, depending
on the system. This needs to be investigated and resolved. In the
meantime we do not verify the number of iterations for this benchmark.
If all call sites are in `norecurse` functions we can derive `norecurse`
as the ReversePostOrderFunctionAttrsPass does. This should make
ReversePostOrderFunctionAttrsLegacyPass obsolete once the Attributor is
enabled.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D72017
If we know that all call sites have been processed we can derive an
early fixpoint. The use in this patch is likely not to trigger right now
but a follow up patch will make use of it.
Reviewed By: uenoku, baziotis
Differential Revision: https://reviews.llvm.org/D72016
The method was initially added for DWARFVerifier::verifyUnitHeader() but
its results were never actually used.
Differential Revision: https://reviews.llvm.org/D73773
We only need to call this on floating point comparisons. In this
case these are known to be integer compares. One of them even
has a SUB opcode instead of CMP.
With this patch new trivial edges can be added to an SCC in a CGSCC
pass via the updateCGAndAnalysisManagerForCGSCCPass method. It shares
almost all the code with the existing
updateCGAndAnalysisManagerForFunctionPass method but it implements the
first step towards the TODOs.
This was initially part of D70927.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D72025
If we had `noalias` on an argument the inliner created alias scope
metadata already. However, the call site `noalias` annotation was not
considered. Since the Attributor can derive such call site `noalias`
annotation we should treat them the same as argument annotations.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D73528
This is the first of multiple parts to make OpenMP context/trait
handling reusable and generic. This patch was originally part of D71830
but with the unit tests it can be tested independently.
This patch implements an almost complete handling of OpenMP
contexts/traits such that we can reuse most of the logic in Flang
through the OMPContext.{h,cpp} in llvm/Frontend/OpenMP.
All but construct SIMD specifiers, e.g., inbranch, and the device ISA
selector are define in llvm/lib/Frontend/OpenMP/OMPKinds.def. From
these definitions we generate the enum classes TraitSet,
TraitSelector, and TraitProperty as well as conversion and helper
functions in llvm/lib/Frontend/OpenMP/OMPContext.{h,cpp}.
The OpenMP context is now an explicit object (see `struct OMPContext`).
This is in anticipation of construct traits that need to be tracked. The
OpenMP context, as well as the VariantMatchInfo, are basically made up
of a set of active or respectively required traits, e.g., 'host', and an
ordered container of constructs which allows duplication. Matching and
scoring is kept as generic as possible to allow easy extension in the
future.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D71847
The change in 9b84dabc5f36f7e027c62b5c00ea97e21cfcacdd uses a newer
syntax which not all the builders support. Use an explicit `OR` over
the languages instead to repair the builders.