This adds some basic and/or/xor reduction costs for NEON/MVE, handling them
like other reductions where vector operations are used to reduce to legal
sizes, followed by an optional VREV+VAND/VORR/VEOR step and scalarization from
there.
This is causing a lot of warnings with older compilers and
errors for -Werror builders downstream.
I'm removing it until we can decide a good way to do this,
as it does seem important that this class not be discarded
given it controls formatting.
FuncToLLVM uses the data layout string attribute in 3 different ways:
1. LowerToLLVMOptions options(&getContext(), getAnalysis<DataLayoutAnalysis>().getAtOrAbove(m));
2. options.dataLayout = llvm::DataLayout(this->dataLayout);
3. m->setAttr(..., this->dataLayout));
The 3rd way is unrelated to the other 2 and occurs after conversion, making it confusing.
This revision separates this post-hoc module annotation functionality into its own pass.
The convert-func-to-llvm pass loses its `data-layout` option and instead recovers it from
the `llvm.data_layout` attribute attached to the module, when present.
In the future, `LowerToLLVMOptions options(&getContext(), getAnalysis<DataLayoutAnalysis>().getAtOrAbove(m))` and
`options.dataLayout = llvm::DataLayout(dataLayout);` should be unified.
Reviewed By: ftynse, mehdi_amini
Differential Revision: https://reviews.llvm.org/D157604
Consider the following code:
#include <windows.h>
#include <TraceLoggingActivity.h>
#include <TraceLoggingProvider.h>
#include <winmeta.h>
TRACELOGGING_DEFINE_PROVIDER(
g_hMyComponentProvider,
"SimpleTraceLoggingProvider",
// {0205c616-cf97-5c11-9756-56a2cee02ca7}
(0x0205c616,0xcf97,0x5c11,0x97,0x56,0x56,0xa2,0xce,0xe0,0x2c,0xa7));
void test()
{
TraceLoggingFunction(g_hMyComponentProvider);
}
int main()
{
TraceLoggingRegister(g_hMyComponentProvider);
test();
TraceLoggingUnregister(g_hMyComponentProvider);
}
It compiles with MSVC, but clang-cl reports an error:
C:\Program Files (x86)\Windows Kits\10\\include\10.0.22621.0\\shared/TraceLoggingActivity.h(377,30): note: expanded from macro '_tlgThisFunctionName'
#define _tlgThisFunctionName __FUNCTION__
^
.\tl.cpp(14,5): error: cannot initialize an array element of type 'char' with an lvalue of type 'const char[5]'
TraceLoggingFunction(g_hMyComponentProvider);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The second commit is not needed to support above code, however, during isolated tests in ms_predefined_expr.cpp
I found that MSVC accepts code with constexpr, whereas clang-cl does not.
I see that in most places PredefinedExpr is supported in constant evaluation, so I didn't wrap my code with ``if(MicrosoftExt)``.
Reviewed By: cor3ntin
Differential Revision: https://reviews.llvm.org/D158591
This adds some basic smin/smax/umin/umax reduction costs for MVE/NEON, similar
to the existing Add reduction costs. They follow the same style as Add
reductions, but include a higher cost as the costs tend to be dependant on the
element size for vminv/vmaxv. These costs may not be precise, but will be more
inline than the default that extracts each element.
When calling a function which requires no streaming-mode change from an
__arm_locally_streaming function, LLVM would otherwise emit:
// function prologue
smstart
b streaming_compatible_function // tail call
// never an smstop
The same transform op can now be used to apply registered pass pipelines.
This revision also adds a helper function for querying `PassPipelineInfo` objects and moves the corresponding `lookup` function for `PassInfo` objects to the `PassInfo` class.
Differential Revision: https://reviews.llvm.org/D159211
Now we can store it in DynTypedNode, we can target these nodes
(SelectionTree) and resolve them (FindTarget).
This makes Hover, go-to-def etc work in all(?) cases.
Also support it in DumpAST.
Differential Revision: https://reviews.llvm.org/D159299
R12 is callee-saved in functions with pacbti-m enabled, but this is
done in assignCalleeSavedSpillSlots, meaning that in
determineCalleeSaves we have to manually set CanEliminateFrame.
This fixes a bug where in leaf functions with no other callee-saved
registers the aut instruction wouldn't be emitted and stack offsets
of arguments passed on the stack would be incorrect.
Differential Revision: https://reviews.llvm.org/D157865
This patch removes the member TTI from VPReductionRecipe, as the
generation of reduction operations no longer requires TTI.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D158148
The dump() is not actually included recursively in any other nodes' dump,
as this is too verbose (similar to NNS) but useful in its own right.
It's unfortunate to not have the actual tests yet, but the DynTypedNode
tests are matcher-based and adding matchers is a larger task than
DynTypedNode support (but can't be done first).
(I've got a clangd change stacked on this that uses DynTypedNode and
dump(), and both work. I'll send a change for matchers next).
Differential Revision: https://reviews.llvm.org/D159300
Similar to the other reductions, this changes the cost of fmin/fmax reductions
under MVE/NEON to perform vector operations until the types need to be
scalarized. The fp16 vectors can perform a VREV+FMIN/FMAX to skip a step of the
reduction, and otherwise need lanewise extract fro the top lanes.
Update VPBlendRecipe::print() to print the result directly, instead of
relying on the stored Phi pointer. This brings the recipe in line with
how other recipes are printed.
The mix-in of the `MultiTileSizesOp` set the default value of its
`divisor` argument. This repeats information from the tablegen
defintion, is not necessary (since the generic code deals with `None`
and default values), and has the risk of running out of sync without
people noticing. This patch removes the setting of the value and forward
`None` to the generic constructor instead.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159416
This patch simplifies and improves the mix-in of the `TileOp`. In
particular:
* Accept all types of sizes (static, dynamic, scalable) in a single
argument `sizes`.
* Use the existing convenience function to dispatch different types of
sizes instead of repeating the implementation in the mix-in.
* Pass on `None` values as is of optional arguments to the init function
of the super class.
* Reformat with default indentation width (4 spaces vs 2 spaces).
* Add a a test for providing scalable sizes.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159417
This patch removes some manual conversion of mixed Python/attribute
arguments to `I64ArrayAttr`s, which turned out to be unnecessary.
Interestingly, this change does not depend on the additional attribute
builders added in the (currently pending)
https://reviews.llvm.org/D159403 patch.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159419
`bufferize_to_allocation` does not supports ops with regions, unless `bufferize_destination_only` is set. In that case, only the operand is replaced with an allocation and wrapped in a `to_tensor` op. The error checking was too strict.
Differential Revision: https://reviews.llvm.org/D159420
This is the last piece required for the loop versioning patch to work on
code lowered via HLFIR. With this patch, HLFIR performance on spec2017
roms is now similar to the FIR lowering.
Adding support for fir.array_coor means that many more loops will be
versioned, even in the FIR lowering. So far as I have seen, these do not
seem to have an impact on performance for the benchmarks I tried, but I
expect it would speed up some programs, if the loop being versioned
happened to be the hot code.
The main difference between fir.array_coor and fir.coordinate_of is
that fir.coordinate_of uses zero-based indices, whereas fir.array_coor
uses the indices as specified in the Fortran program (starting from 1 by
default, but also supporting non default lower bounds). I opted to
transform fir.array_coor operations into fir.coordinate_of operations
because this allows both to share the same offset calculation logic.
The tricky bit of this patch is getting the correct lower bounds for the
array operand to subtract from the fir.array_coor indices to get a
zero-based indices. So far as I can tell, the FIR lowering will always
provide lower bounds (shift) information in the shape operand to the
fir.array_coor when non-default lower bounds are used. If none is given,
I originally tried falling back to reading lower bounds from the box,
but this led to misscompilation in SPEC2017 cam4. Therefore the pass
instead assumes that if it can't already find an SSA value for the shift
information, the default lower bound (1) should be used.
A suspect the incorrect lower bounds in the box for the FIR lowering was
already a known issue (see https://reviews.llvm.org/D158119).
Differential Revision: https://reviews.llvm.org/D158597
This adds some basic fadd/fmul reduction costs for MVE/NEON. It reduces by
halving the vector size until it it gets scalarized, with some additional costs
for fp16 which may require extracting the top lanes.
Differential Revision: https://reviews.llvm.org/D159367
Otherwise these functions are not inlined when invoked from streaming
functions.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D159188
If no frame-pointer is available and the compiler has scavenged a
spill-slot in the callee-save area, the compiler may be forced to emit an
'addvl' inside the streaming-mode-changing call sequence when it needs to
fill (reload) an FP register being passed to the call.
We can avoid this entirely by disabling stack-slot scavenging when there
are streaming-mode-changing call-sequences in the function.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D159196
Fix the operand description: the update was in the wrong place. As a result the
latency of the update was modelled incorrectly, it wasn't available as early as
it should. This was visible in llvm-mca timeline views.
This fixes the problem for the Neoverse V2, but the problem also affects the
other Neoverse cores.
Patch by: Ricardo Jesus
Differential Revision: https://reviews.llvm.org/D159254
The following commit enabled the analysis of ranges for heap allocations:
22ca38da25
The range turns out to be empty in cases such as the one in test (which
is [1,1)), leading to an assertion failure. This patch fixes for the
same case.
Fixes https://github.com/llvm/llvm-project/issues/63856
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D159160
…-delete
So the purpose of the check is more clear. Update examples code to show
compliant code.
Fixes#65221
---------
Co-authored-by: Carlos Gálvez <carlos.galvez@zenseact.com>
Now that the codegen for the expanded ISD::ROTL sequence has been improved,
it's probably profitable to lower a shuffle that's a rotate to the
vsll+vsrl+vor sequence to avoid a vrgather where possible, even if we don't
have the vror instruction.
This patch relaxes the restriction on ISD::ROTL being legal in
lowerVECTOR_SHUFFLEAsRotate. It also attempts to do the lowering twice: Once
if zvbb is enabled before any of the interleave/deinterleave/vmerge lowerings,
and a second time unconditionally just before it falls back to the vrgather.
This way it doesn't interfere with any of the above patterns that may be more
profitable than the expanded ISD::ROTL sequence.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D159353
This makes `ExprToVal` dumping consistent with `LocToVal` dumping.
Reviewed By: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D159274