We were adding the entire scalarization extraction cost for reductions, i.e. the total cost of extracting every element of a vector type.
For reductions we don't need to do this - we just need to extract the 0th element after the reduction pattern has completed.
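In cost-model terms the fix amounts to something like this sketch
(illustrative variable names, not the patch verbatim):

  // Charge the reduction tree itself plus a single lane-0 extract for
  // the final scalar result - not a full scalarization of the vector.
  unsigned Cost = ArithCost; // cost of the shuffle/op reduction steps
  Cost += getVectorInstrCost(Instruction::ExtractElement, VecTy,
                             /*Index=*/0);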
Fixes PR37731
Rebased and reapplied after being reverted in rL347541 due to PR39774, which was fixed by D54955/rL347759 and D55017/rL347997
Differential Revision: https://reviews.llvm.org/D54585
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348076 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:
1. VirtReg2Value is generated lazily; there were some cases where
a lookup was performed before all relevant virtual registers were
created, leading to an out-of-sync mapping. Those cases were:
- Complex code to lower formal arguments that generated CopyFromReg
nodes from live-in registers (fixed by never querying the mapping
for live-in registers).
- Code that generates CopyToReg for formal arguments that are used
outside the entry basic block (fixed by never querying the
mapping for Register nodes, which don't need the divergence info
anyway).
2. For complex values that are lowered to a sequence of registers,
all registers must be reflected in the VirtReg2Value mapping.
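The flavor of the new assertions, as a sketch (not the exact code from
the patch):

  // Every virtual register backing a lowered value must already be in
  // the mapping before divergence info is queried for it.
  for (unsigned Reg : ValueVRegs)
    assert(VirtReg2Value.count(Reg) &&
           "VirtReg2Value mapping is out of sync");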
I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.
There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.
Reviewers: alex-t, arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D54340
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348049 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This simplifies writing predicates for pattern fragments that are
automatically re-associated or commuted.
For example, a followup patch adds patterns for fragments of the form
(add (shl $x, $y), $z) to the AMDGPU backend. Such patterns are
automatically commuted to (add $z, (shl $x, $y)), which makes it basically
impossible to refer to $x, $y, and $z generically in the PredicateCode.
With this change, the PredicateCode can refer to $x, $y, and $z simply
as `Operands[i]`.
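As a sketch, a PredicateCode body for the fragment above could then
look like this (the specific checks are made up for illustration):

  // Operands[0] = $x, Operands[1] = $y, Operands[2] = $z - stable
  // positions even if the DAG holds the commuted form
  // (add $z, (shl $x, $y)).
  return Operands[1]->getOpcode() == ISD::Constant && // $y is constant
         Operands[2]->getOpcode() != ISD::Constant;   // $z is not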
Testing confirmed that there are no changes to any of the generated files
when building all (non-experimental) targets.
Change-Id: I61c00ace7eed42c1d4edc4c5351174b56b77a79c
Reviewers: arsenm, rampitec, RKSimon, craig.topper, hfinkel, uweigand
Subscribers: wdng, tpr, llvm-commits
Differential Revision: https://reviews.llvm.org/D51994
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347992 91177308-0d34-0410-b5e6-96231b3b80d8
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend
operands when it is able to do so. For some targets this is more expensive
than a sign-extension, which is also a valid choice. Introduce the
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy ==
MVT::i64, as it can be performed using a single instruction.
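A sketch of the RISC-V override, assuming the hook's signature follows
the description above:

  bool RISCVTargetLowering::isSExtCheaperThanZExt(EVT SrcVT,
                                                  EVT DstVT) const {
    // On RV64, i32 -> i64 sign-extension is a single instruction
    // (e.g. sext.w); the equivalent zero-extension is not.
    return SrcVT == MVT::i32 && DstVT == MVT::i64;
  }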
Differential Revision: https://reviews.llvm.org/D52978
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347977 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Replace `aext([asz]ext x)` with `aext/sext/zext x` in order to
reduce the number of instructions generated to clean up some
legalization artifacts.
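The fold itself is simple; as a hand-written sketch (not the combiner's
actual code):

  // aext(sext x) -> sext x, aext(zext x) -> zext x,
  // aext(aext x) -> aext x: the inner extend already defines (or
  // deliberately leaves undefined) the high bits, so the outer
  // G_ANYEXT adds nothing.
  static unsigned combinedExtOpcode(unsigned InnerOpc) {
    switch (InnerOpc) {
    case TargetOpcode::G_SEXT:
    case TargetOpcode::G_ZEXT:
    case TargetOpcode::G_ANYEXT:
      return InnerOpc;
    default:
      return 0; // not an extend artifact; nothing to fold
    }
  }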
Reviewers: aditya_nandakumar, dsanders, aemerson, bogner
Reviewed By: aemerson
Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D54174
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347893 91177308-0d34-0410-b5e6-96231b3b80d8
Currently, instructions that access memory through a base operand that is
not a register cannot be analyzed using `TII::getMemOpBaseRegImmOfs`.
This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.
The goal of this patch is to refactor all this to return a base
operand instead of a base register.
Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.
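The refactored hook then has roughly this shape (signature sketched
from the description; the exact name and arguments may differ):

  // Returns the base as a MachineOperand so callers can handle both
  // register and frame-index bases uniformly.
  bool getMemOpBaseImmOfs(MachineInstr &LdSt, MachineOperand *&BaseOp,
                          int64_t &Offset,
                          const TargetRegisterInfo *TRI) const;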
Differential Revision: https://reviews.llvm.org/D54846
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347746 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts r294500. DwarfCompileUnit::addAddressExpr uses DIEExpr
for PCOffset. In that case the expression is unrelated to thread locals,
so emitting the value of a DIEExpr does not always have to mean
emit-debug-thread-local.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347744 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.
For this to be useful, a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.
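A custom printer would then look something like this sketch (details
assumed from the summary, not verbatim from the patch):

  class MyGCPrinter : public GCMetadataPrinter {
  public:
    // Returning true signals that the stack maps were emitted here, so
    // the default format should be skipped.
    bool emitStackMaps(StackMaps &SM, AsmPrinter &AP) override {
      // ... walk SM's records via the newly exposed accessors ...
      return true;
    }
  };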
This patch was authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, apilipenko, reames
Reviewed By: reames
Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53892
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347584 91177308-0d34-0410-b5e6-96231b3b80d8
This reverts commit r346970.
It was causing PR39774, a crash in slp-vectorizer on a rather simple loop
with just a bunch of 'and's in the body.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347541 91177308-0d34-0410-b5e6-96231b3b80d8
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.
If we have blendv/conditional move, then we should assume
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use
the values in the blend/mov.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347526 91177308-0d34-0410-b5e6-96231b3b80d8
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347523 91177308-0d34-0410-b5e6-96231b3b80d8
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.
The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347502 91177308-0d34-0410-b5e6-96231b3b80d8
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVM's capabilities to use target-specific instruction
selection for interleaved load patterns (e.g. ld4 on the AArch64
architecture).
Differential Revision: https://reviews.llvm.org/D52653
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347208 91177308-0d34-0410-b5e6-96231b3b80d8
We were adding the entire scalarization extraction cost for reductions, i.e. the total cost of extracting every element of a vector type.
For reductions we don't need to do this - we just need to extract the 0th element after the reduction pattern has completed.
Fixes PR37731
Differential Revision: https://reviews.llvm.org/D54585
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346970 91177308-0d34-0410-b5e6-96231b3b80d8
Using the MBB flags, we can tell if X16/X17/NZCV are unused in a block,
and also not live out.
If this holds for all MBBs, then we can avoid checking for liveness on
any candidate. Furthermore, if it holds for an individual candidate's
MBB, then we can avoid checking for liveness on that candidate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346901 91177308-0d34-0410-b5e6-96231b3b80d8
The machine scheduler currently biases register copies to/from
physical registers to be closer to their point of use / def to
minimize their live ranges. This change extends the bias to physical
register assignments from immediate values as well.
This causes a reduction in overall register pressure and a minor
reduction in spills, and indirectly fixes an out-of-registers
assertion (PR39391).
Most test changes are from minor instruction reorderings and register
name selection changes, and the direct consequences of those.
Reviewers: MatzeB, qcolombet, myatsina, pcc
Subscribers: nemanjai, jvesely, nhaehnle, eraman, hiraditya,
javed.absar, arphaman, jfb, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54218
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346894 91177308-0d34-0410-b5e6-96231b3b80d8
Add support for the expansion of funnelshift/rotates to getIntrinsicInstrCost.
This also required us to move the X86 fshl/fshr costs to the same place as the rotates to avoid expansion and get correct scalarization vs vectorization costs.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346854 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds support for the 'event section' specified in the exception
handling proposal. (This was named 'exception section' first, but later
renamed to 'event section' to take possibilities of other kinds of
events into consideration. But currently we only store exception info in
this section.)
The event section is added between the global section and the export
section. This is for ease of validation, per a request from the V8 team.
This patch:
- Creates the event symbol type, which is a weak symbol
- Makes 'throw' instruction take the event symbol '__cpp_exception'
- Adds relocation support for events
- Adds WasmObjectWriter / WasmObjectFile (Reader) support
- Adds obj2yaml / yaml2obj support
- Adds '.eventtype' printing support
Reviewers: dschuff, sbc100, aardappel
Subscribers: jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D54096
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346825 91177308-0d34-0410-b5e6-96231b3b80d8
We already determine a bunch of information about an MBB in
getMachineOutlinerMBBFlags. We can reuse that information to avoid calculating
things that must be false/true.
The first thing we can easily check is if an outlined sequence could ever
contain calls. There's no reason to walk over the outlined range, checking for
calls, if we already know that there are no calls in the block containing the
sequence.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346809 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, the extend_vector_inreg opcodes required their input type to be the same total width as their output type. But this doesn't match up with how the X86 instructions are defined. For X86, the input just needs to be a legal type with at least enough elements to cover the output.
This patch weakens the check on these nodes and allows them to be used as long as they have more input elements than output elements. I haven't changed type legalization behavior so it will still create them with matching input and output sizes.
X86 will custom legalize these nodes by shrinking the input to be a 128 bit vector and once we've done that we treat them as legal operations. We still have one case during type legalization where we must custom handle v64i8 on avx512f targets without avx512bw where v64i8 isn't a legal type. In this case we will custom type legalize to a *extend_vector_inreg with a v16i8 input. After that the input is a legal type so type legalization should ignore the node and doesn't need to know about the relaxed restriction. We are no longer allowed to use the default expansion for these nodes during vector op legalization since the default expansion uses a shuffle which required the widths to match. Custom legalization for all types will prevent us from reaching the default expansion code.
I believe DAG combine works correctly with the relaxed restriction because it doesn't check the number of input elements.
The rest of the patch is changing X86 to use either the vector_inreg nodes or the regular zero_extend/sign_extend nodes. I had to add additional isel patterns to handle any_extend during isel since simplifydemandedbits can create them at any time so we can't legalize to zero_extend before isel. We don't yet create any_extend_vector_inreg in simplifydemandedbits.
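The relaxed node check amounts to something like this (a sketch of the
getNode-time assertions, not verbatim):

  // *_EXTEND_VECTOR_INREG: the input only has to supply more
  // (narrower) elements than the result consumes; equal total width is
  // no longer required.
  assert(VT.isVector() && N1.getValueType().isVector() &&
         "Expected vector types");
  assert(VT.getVectorNumElements() <
             N1.getValueType().getVectorNumElements() &&
         "Input must provide more elements than the result uses");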
Differential Revision: https://reviews.llvm.org/D54346
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346784 91177308-0d34-0410-b5e6-96231b3b80d8
The IEEE-754 Standard makes it clear that fneg(x) and
fsub(-0.0, x) are two different operations. The former is a bitwise
operation, while the latter is an arithmetic operation. This patch
creates a dedicated FNeg IR Instruction to model that behavior.
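Passes and front ends can then create a true fneg instead of an fsub
(a sketch; the creation API is assumed from the patch's description):

  // Bitwise sign-bit flip, distinct from the arithmetic fsub -0.0, X:
  Instruction *Neg = UnaryOperator::CreateFNeg(X, "x.neg", InsertPt);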
Differential Revision: https://reviews.llvm.org/D53877
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346774 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of returning Flags, return true if the MBB is safe to outline from.
This lets us check for unsafe situations, like when, on AArch64, X17 is live
across an MBB without being defined in that MBB. In that case, there's no point
in performing an instruction mapping.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346718 91177308-0d34-0410-b5e6-96231b3b80d8
Remove another bit of unused configuration potential from GCStrategy. It's not entirely clear what the intention here was, but from the docs, it sounds like this may have been subsumed by patchable call support.
Note: This change is deliberately small to make it clear that while implemented, there's nothing using the option. A following NFC will do most of the simplifications.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346701 91177308-0d34-0410-b5e6-96231b3b80d8
Instead of defaulting to a cost = 1, expand to element extract/insert like we do for other shuffles.
This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.
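The expansion is costed roughly like this (a sketch mirroring how other
shuffle kinds are handled; variable names invented):

  // Extracting a NumSubElts-wide subvector starting at Index costs one
  // extract from the source plus one insert into the result per lane.
  unsigned Cost = 0;
  for (unsigned i = 0; i != NumSubElts; ++i) {
    Cost += getVectorInstrCost(Instruction::ExtractElement, VecTy,
                               Index + i);
    Cost += getVectorInstrCost(Instruction::InsertElement, SubVecTy, i);
  }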
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346656 91177308-0d34-0410-b5e6-96231b3b80d8
The custom root mechanism didn't actually do anything. ShadowStackGC, the only one which used it, just removed the gcroots before they reached the normal lowering in SelectionDAG. As a result, the state flag had no value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346632 91177308-0d34-0410-b5e6-96231b3b80d8
The GCStrategy provides three configuration options which are largely redundant.
1) Support for conditionally lowering gcread and gcwrite to loads and stores. This is redundant since any GC which wished to use these abstractions would lower them out of existence before the built-in lowering anyway. As such, there's no need for the lowering to be conditional.
2) Conditional initialization for allocas marked via gcroot. Semantically, roots have to be initialized before first potential use. Arguably, the frontend really should have responsibility for that, but the old API allowed the frontend to ignore this detail. Only one builtin GC used the non-initializing mode. Since no one to my knowledge actually uses the ErlangGC strategy, I decided the slight pessimization was worth the simplicity. If that turns out to be problematic, we can always improve the insertion algorithm to detect more existing initializing stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346621 91177308-0d34-0410-b5e6-96231b3b80d8
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.
I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.
For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.
Differential Revision: https://reviews.llvm.org/D54073
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346595 91177308-0d34-0410-b5e6-96231b3b80d8
The previous version used type erasure through a `void* (*)()` pointer,
which triggered a gcc warning and implied a lot of reinterpret_casts.
This version should make it harder to shoot ourselves in the foot.
Differential revision: https://reviews.llvm.org/D54203
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346522 91177308-0d34-0410-b5e6-96231b3b80d8
Currently in llvm, CalleeSavedInfo can only assign a callee saved register to
a stack frame index to be spilled in the prologue. We would like to enable
spilling gprs to vector registers. This patch adds the capability to spill to
other registers aside from just the stack. It also adds the changes for power9
to spill gprs to volatile vector registers when they are available.
This happens only for leaf functions when using the option
-ppc-enable-pe-vector-spills.
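In terms of the data structure, CalleeSavedInfo can now carry a
destination register instead of a frame index; a sketch (accessor names
are my reading of the patch, so treat them as assumptions):

  CalleeSavedInfo CSI(PPC::X20); // callee-saved GPR to preserve
  CSI.setDstReg(PPC::V20);       // spill to a volatile vector register
  // Frame lowering checks CSI.isSpilledToReg() to emit a reg-to-reg
  // copy instead of a store to a stack slot.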
Differential Revision: https://reviews.llvm.org/D39386
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346512 91177308-0d34-0410-b5e6-96231b3b80d8
NFC-ish. This doesn't change the behaviour of the outliner, but does make sure
that you won't end up with say
OUTLINED_FUNCTION_2:
...
ret
OUTLINED_FUNCTION_248:
...
ret
as the only outlined functions in your module. Those should really be
OUTLINED_FUNCTION_0:
...
ret
OUTLINED_FUNCTION_1:
...
ret
If we produce outlined functions, they probably should have sequential numbers
attached to them. This makes it a bit easier (and more stable) to write outliner tests.
The point of this is to move towards a bit more stability in outlined function
names. By doing this, we at least don't rely on the traversal order of the
suffix tree. Instead, we rely on the order of the candidate list, which is
*far* more consistent. The candidate list is ordered by the end indices of
candidates, so we're more likely to get a stable ordering. This is still
susceptible to changes in the cost model though (like, if we suddenly find new
candidates, for example).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346340 91177308-0d34-0410-b5e6-96231b3b80d8
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysReg. The latter is only 16 bits and
saves us a bit of memory.
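For example (sketch):

  // MCPhysReg is a uint16_t, so containers of physical registers halve
  // their per-element footprint compared to unsigned:
  SmallVector<MCPhysReg, 16> PhysRegs; // was SmallVector<unsigned, 16>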
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346254 91177308-0d34-0410-b5e6-96231b3b80d8
It was causing a crash because we were trying to get the definition
of a target register. Fixed the issue by adding a check, and added
a test case for it.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346251 91177308-0d34-0410-b5e6-96231b3b80d8
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
Do the same for references in ScheduleDAG and RegUsageInfoCollector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346183 91177308-0d34-0410-b5e6-96231b3b80d8
MachineModuleInfo can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346182 91177308-0d34-0410-b5e6-96231b3b80d8
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346180 91177308-0d34-0410-b5e6-96231b3b80d8
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can hold these asserts, allowing them to be shared for all 3 opcodes. This also enables checking in the places that create these nodes without using the wrappers.
The rest of the patch is just changing all callers to use getNode directly.
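Call sites change along these lines (a sketch; the wrapper shown is one
of the three removed, per my reading of the change):

  // Before: SDValue V = DAG.getSignExtendVectorInReg(Op, DL, VT);
  SDValue V = DAG.getNode(ISD::SIGN_EXTEND_VECTOR_INREG, DL, VT, Op);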
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346087 91177308-0d34-0410-b5e6-96231b3b80d8
- Make static some TargetPassConfig methods that just check whether
options have been set.
- Shuffle code in LLVMTargetMachine around so addPassesToGenerateCode
only deals with TargetPassConfig now (but not with MCContext or the
creation of MachineModuleInfo).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345918 91177308-0d34-0410-b5e6-96231b3b80d8