Instead of defaulting to a cost of 1, expand to element extract/insert like we do for other shuffles.
This exposes an issue in LoopVectorize which could call SK_ExtractSubvector with a scalar subvector type.
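A minimal sketch of the extract/insert expansion described above, assuming hypothetical per-element costs (ElementCosts and extractSubvectorCost are illustrative names, not the actual TTI interface):

```cpp
#include <cassert>

// Hypothetical per-element costs; a real target would supply its own numbers.
struct ElementCosts {
  int Extract = 1; // cost of reading one element out of a vector
  int Insert = 1;  // cost of writing one element into a vector
};

// Cost of extracting NumSubElts elements starting at Index from a source
// vector of NumSrcElts elements, modelled as one element extract plus one
// element insert per element of the subvector.
int extractSubvectorCost(int NumSrcElts, int NumSubElts, int Index,
                         const ElementCosts &C) {
  assert(NumSubElts > 0 && Index >= 0 && Index + NumSubElts <= NumSrcElts &&
         "subvector must lie inside the source vector");
  return NumSubElts * (C.Extract + C.Insert);
}
```

With unit costs, extracting the upper half of an 8-element vector is costed at 8 in this model rather than a flat 1.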
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346656 91177308-0d34-0410-b5e6-96231b3b80d8
The custom root mechanism didn't actually do anything. ShadowStackGC, the only one which used it, just removed the gcroots before they reached the normal lowering in SelectionDAG. As a result, the state flag had no value.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346632 91177308-0d34-0410-b5e6-96231b3b80d8
The GCStrategy provides three configuration options which are largely redundant.
1) Support for conditionally lowering gcread and gcwrite to loads and stores. This is redundant since any GC which wished to use these abstractions would lower them out of existence before the built-in lowering anyway. As such, there's no need for the lowering to be conditional.
2) Conditional initialization for allocas marked via gcroot. Semantically, roots have to be initialized before first potential use. Arguably, the frontend really should have responsibility for that, but the old API allowed the frontend to ignore this detail. Only one builtin GC used the non-initializing mode. Since no one to my knowledge actually uses the ErlangGC strategy, I decided the slight pessimization was worth the simplicity. If that turns out to be problematic, we can always improve the insertion algorithm to detect more existing initializing stores.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346621 91177308-0d34-0410-b5e6-96231b3b80d8
This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more
opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs.
Apart from 2-3 strange cases, these are all wins.
I've structured this to be no-functional-change-intended for any target except for x86
because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those
targets have existing regression tests (4, 4, 10 files respectively) that would be
affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show
any regression test diffs. The trade-off is deciding if an extra vector load is better
than a single wide load + extract_subvector.
For x86, this is almost always better (on paper at least) because we often can fold
loads into subsequent ops and not increase the official instruction count. There's also
some unknown -- but potentially large -- benefit from using narrower vector ops if wide
ops are implemented with multiple uops and/or frequency throttling is avoided.
Differential Revision: https://reviews.llvm.org/D54073
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346595 91177308-0d34-0410-b5e6-96231b3b80d8
Previous version used type erasure through a `void* (*)()` pointer,
which triggered a gcc warning and implied a lot of reinterpret_cast.
This version should make it harder to shoot ourselves in the foot.
Differential revision: https://reviews.llvm.org/D54203
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346522 91177308-0d34-0410-b5e6-96231b3b80d8
Currently in llvm, CalleeSavedInfo can only assign a callee saved register to
stack frame index to be spilled in the prologue. We would like to enable
spilling gprs to vector registers. This patch adds the capability to spill to
other registers aside from just the stack. It also adds the changes for power9
to spill gprs to volatile vector registers when they are available.
This happens only for leaf functions when using the option
-ppc-enable-pe-vector-spills.
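A rough sketch of a callee-saved record that can name either a stack slot or a destination register. SavedRegInfo and its members are hypothetical and only mirror the idea of the patch, not the real CalleeSavedInfo class:

```cpp
#include <cassert>

// Simplified, hypothetical stand-in for a callee-saved register record.
class SavedRegInfo {
  unsigned Reg;
  union {
    int FrameIdx;    // meaningful when the register is saved to a stack slot
    unsigned DstReg; // meaningful when the register is saved to another register
  };
  bool SpilledToReg = false;

public:
  explicit SavedRegInfo(unsigned R, int FI = 0) : Reg(R), FrameIdx(FI) {}

  unsigned getReg() const { return Reg; }
  bool isSpilledToReg() const { return SpilledToReg; }

  int getFrameIdx() const {
    assert(!SpilledToReg && "no frame index: saved to a register");
    return FrameIdx;
  }
  unsigned getDstReg() const {
    assert(SpilledToReg && "no destination register: saved to the stack");
    return DstReg;
  }

  void setFrameIdx(int FI) { FrameIdx = FI; SpilledToReg = false; }
  void setDstReg(unsigned R) { DstReg = R; SpilledToReg = true; }
};
```

Prologue and epilogue code would then branch on isSpilledToReg() to emit either a store to the frame slot or a register-to-register move.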
Differential Revision: https://reviews.llvm.org/D39386
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346512 91177308-0d34-0410-b5e6-96231b3b80d8
NFC-ish. This doesn't change the behaviour of the outliner, but does make sure
that you won't end up with say
OUTLINED_FUNCTION_2:
...
ret
OUTLINED_FUNCTION_248:
...
ret
as the only outlined functions in your module. Those should really be
OUTLINED_FUNCTION_0:
...
ret
OUTLINED_FUNCTION_1:
...
ret
If we produce outlined functions, they probably should have sequential numbers
attached to them. This makes it a bit easier to write stable outliner tests.
The point of this is to move towards a bit more stability in outlined function
names. By doing this, we at least don't rely on the traversal order of the
suffix tree. Instead, we rely on the order of the candidate list, which is
*far* more consistent. The candidate list is ordered by the end indices of
candidates, so we're more likely to get a stable ordering. This is still
susceptible to changes in the cost model though (like, if we suddenly find new
candidates, for example).
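The numbering idea can be sketched as follows, assuming a hypothetical candidate list where each entry carries only a benefit value. The counter advances only when a function is actually emitted, so the names stay sequential:

```cpp
#include <string>
#include <vector>

// Walk a (hypothetical) list of candidate benefits in list order and name
// only the candidates that are actually outlined. Skipped candidates do not
// consume a number, so the emitted names are 0, 1, 2, ...
std::vector<std::string>
nameOutlinedFunctions(const std::vector<int> &Benefits) {
  std::vector<std::string> Names;
  unsigned NextName = 0;
  for (int Benefit : Benefits) {
    if (Benefit < 1)
      continue; // not profitable: no function, no number used up
    Names.push_back("OUTLINED_FUNCTION_" + std::to_string(NextName++));
  }
  return Names;
}
```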
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346340 91177308-0d34-0410-b5e6-96231b3b80d8
Change the type in a couple of lists and sets that only store physical
registers from unsigned to MCPhysReg. The latter is only 16 bits and
saves us a bit of memory.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346254 91177308-0d34-0410-b5e6-96231b3b80d8
It was causing a crash because we were trying to get the definition
of a target register. Fixed the issue by adding a check and added
a test case for that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346251 91177308-0d34-0410-b5e6-96231b3b80d8
MachineFunction can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
Do the same for references in ScheduleDAG and RegUsageInfoCollector.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346183 91177308-0d34-0410-b5e6-96231b3b80d8
MachineModuleInfo can only be used in code using lib/CodeGen, hence we
can keep a more specific reference to LLVMTargetMachine rather than just
TargetMachine around.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346182 91177308-0d34-0410-b5e6-96231b3b80d8
The main caller of this already has an MVT and several targets called getSimpleVT inside without checking isSimple. This makes the simpleness explicit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346180 91177308-0d34-0410-b5e6-96231b3b80d8
These methods were just wrappers around getNode with additional asserts (identical and repeated 3 times). But getNode already has a switch that can be used to hold these asserts that allows them to be shared for all 3 opcodes. This also enables checking on the places that create these nodes without using the wrappers.
The rest of the patch is just changing all callers to use getNode directly.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346087 91177308-0d34-0410-b5e6-96231b3b80d8
- Make some TargetPassConfig methods that just check whether options have
been set static.
- Shuffle code in LLVMTargetMachine around so addPassesToGenerateCode
only deals with TargetPassConfig now (but not with MCContext or the
creation of MachineModuleInfo)
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345918 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This function was causing a crash when `MaxElements == 1` because
it was trying to create a single element vector type.
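A minimal sketch of the kind of check involved, using a hypothetical SimpleType descriptor rather than the real type class. When the clamp would leave a single element, the result is a scalar instead of a one-element vector:

```cpp
#include <cassert>

// Hypothetical, simplified type descriptor.
struct SimpleType {
  unsigned NumElements; // 1 for scalars
  unsigned ScalarBits;  // width of each element in bits
  bool IsVector;
};

// Clamp a vector type to at most MaxElements elements. A limit of one
// element yields the scalar element type, never a <1 x T> vector.
SimpleType clampNumElements(SimpleType Ty, unsigned MaxElements) {
  assert(MaxElements >= 1 && "cannot clamp to zero elements");
  if (!Ty.IsVector || Ty.NumElements <= MaxElements)
    return Ty;
  if (MaxElements == 1)
    return SimpleType{1, Ty.ScalarBits, false};
  return SimpleType{MaxElements, Ty.ScalarBits, true};
}
```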
Reviewers: dsanders, aemerson, aditya_nandakumar
Reviewed By: dsanders
Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53734
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345875 91177308-0d34-0410-b5e6-96231b3b80d8
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
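The a[2*i] case can be emulated in scalar code as below. This is only a sketch: VF, Factor, maskedWideLoad and sumEven are hypothetical names, and the masked load is modelled by simply not touching disabled lanes. The gap lanes are masked off, so the last wide load never reads past the final used element and no scalar epilogue is needed:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;     // hypothetical vectorization factor
constexpr std::size_t Factor = 2; // a[2*i] uses member 0 of each pair

using WideVec = std::array<int, VF * Factor>;
using WideMask = std::array<bool, VF * Factor>;

// Emulates a masked wide load: lanes whose mask bit is false are never read.
WideVec maskedWideLoad(const int *Ptr, const WideMask &Mask) {
  WideVec Wide{};
  for (std::size_t L = 0; L < Wide.size(); ++L)
    if (Mask[L])
      Wide[L] = Ptr[L];
  return Wide;
}

// Sums A[2*i] for i in [0, N). A needs only 2*N - 1 elements.
long sumEven(const std::vector<int> &A, std::size_t N) {
  assert(N % VF == 0 && "sketch assumes the trip count is a multiple of VF");
  assert(N == 0 || A.size() >= Factor * (N - 1) + 1);
  WideMask Mask{};
  for (std::size_t L = 0; L < Mask.size(); ++L)
    Mask[L] = (L % Factor == 0); // disable the gap lanes
  long Sum = 0;
  for (std::size_t I = 0; I < N; I += VF) {
    WideVec Wide = maskedWideLoad(A.data() + Factor * I, Mask);
    for (std::size_t K = 0; K < VF; ++K)
      Sum += Wide[K * Factor]; // the "shuffle": pick member 0 of each group
  }
  return Sum;
}
```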
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345705 91177308-0d34-0410-b5e6-96231b3b80d8
Correct costing of SK_ExtractSubvector requires the SubTy argument to indicate the type/size of the extracted subvector.
Unlike the rest of the shuffle kinds this means that the main Ty argument represents the source vector type not the destination!
I've done my best to fix a number of vectorizer uses:
SLP - the reduction epilogue costs should be using a SK_PermuteSingleSrc shuffle as these all occur at the hardware vector width - we're not extracting (illegal) subvector types. This is causing the cost model diffs as SK_ExtractSubvector costs are poorly handled and tend to just return 1 at the moment.
LV - I'm not clear on what the SK_ExtractSubvector should represent for recurrences - I've used a <1 x ?> subvector extraction as that seems to match the VF delta.
Differential Revision: https://reviews.llvm.org/D53573
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345617 91177308-0d34-0410-b5e6-96231b3b80d8
Add an intrinsic that takes 2 integers and performs saturating subtraction on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
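For reference, saturating subtraction on 32-bit values can be sketched as below. The function names are illustrative and the fixed 32-bit width is an assumption; the intrinsic itself is not tied to one width:

```cpp
#include <cstdint>
#include <limits>

// Signed saturating subtraction: compute in a wider type, then clamp to the
// representable range instead of wrapping.
std::int32_t ssubSat(std::int32_t A, std::int32_t B) {
  const std::int64_t Wide = static_cast<std::int64_t>(A) - B;
  if (Wide > std::numeric_limits<std::int32_t>::max())
    return std::numeric_limits<std::int32_t>::max();
  if (Wide < std::numeric_limits<std::int32_t>::min())
    return std::numeric_limits<std::int32_t>::min();
  return static_cast<std::int32_t>(Wide);
}

// Unsigned saturating subtraction: clamp at zero instead of wrapping.
std::uint32_t usubSat(std::uint32_t A, std::uint32_t B) {
  return A > B ? A - B : 0u;
}
```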
Differential Revision: https://reviews.llvm.org/D53783
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345512 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This adds support for LSDA (exception table) generation for wasm EH.
Wasm EH mostly follows the structure of Itanium-style exception tables,
with one exception: a call site table entry in wasm EH corresponds to
not a call site but a landing pad.
In wasm EH, the VM is responsible for stack unwinding. After an
exception occurs and the stack is unwound, the control flow is
transferred to wasm 'catch' instruction by the VM, after which the
personality function is called from the compiler-generated code. (Refer
to WasmEHPrepare pass for more information on this part.)
This patch:
- Changes wasm.landingpad.index intrinsic to take a token argument, to
make this 1:1 match with a catchpad instruction
- Stores landingpad index info and catch type info in MachineFunction
before instruction selection
- Lowers wasm.lsda intrinsic to an MCSymbol pointing to the start of an
exception table
- Adds WasmException class with overridden methods for table generation
- Adds support for LSDA section in Wasm object writer
Reviewers: dschuff, sbc100, rnk
Subscribers: mgorny, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52748
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345345 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well.
Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets.
AVX512F does have the vcvtusi2sdq scalar operation, but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if they generate the vector UINT_TO_FP we should use it.
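One well-known way to convert an unsigned 64-bit integer to a double using only bitwise ops and floating-point add/sub is sketched below. It assumes IEEE-754 binary64 doubles evaluated without excess precision, and it is offered as an illustration of the technique, not as a transcription of the code the patch moves:

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double.
double bitsToDouble(std::uint64_t Bits) {
  double D;
  std::memcpy(&D, &Bits, sizeof D);
  return D;
}

double uintToDouble(std::uint64_t X) {
  const std::uint64_t TwoP52 = 0x4330000000000000ULL;           // 2^52
  const std::uint64_t TwoP84 = 0x4530000000000000ULL;           // 2^84
  const std::uint64_t TwoP84PlusTwoP52 = 0x4530000000100000ULL; // 2^84 + 2^52

  // Low 32 bits placed in the mantissa of 2^52: value is 2^52 + lo.
  const double Lo = bitsToDouble((X & 0xFFFFFFFFULL) | TwoP52);
  // High 32 bits placed in the mantissa of 2^84: value is 2^84 + hi * 2^32.
  const double Hi = bitsToDouble((X >> 32) | TwoP84);

  // Exact: hi * 2^32 - 2^52.
  const double HiAdjusted = Hi - bitsToDouble(TwoP84PlusTwoP52);
  // The only rounding step: (2^52 + lo) + (hi * 2^32 - 2^52) == X.
  return Lo + HiAdjusted;
}
```

Every step is lane-wise, which is what makes this style of expansion usable on whole vectors.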
Differential Revision: https://reviews.llvm.org/D53649
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345256 91177308-0d34-0410-b5e6-96231b3b80d8
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles.
This patch adds SK_Broadcast handling, but exposes the ARM/AArch64 lack of handling of this shuffle kind, which I've added a fix for at the same time.
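As an illustration only, a generic broadcast cost can be modelled as one element extract followed by one insert per destination lane. The formula and the names below are assumptions made for this sketch, not something the commit message spells out:

```cpp
#include <cassert>

// Hypothetical generic broadcast cost: read lane 0 once, then write it into
// every lane of the result.
int broadcastShuffleCost(int NumElts, int ExtractCost, int InsertCost) {
  assert(NumElts > 0 && "vector must have at least one element");
  return ExtractCost + NumElts * InsertCost;
}
```

A target with a native splat instruction would be expected to override such a generic estimate with a much lower number.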
Differential Revision: https://reviews.llvm.org/D53570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345253 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
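The difference can be sketched in plain C++ as below, following the IEEE-754 descriptions of the two families. The function names are illustrative, and the sketch covers only the min side:

```cpp
#include <cmath>
#include <limits>

// minnum-style: a single NaN operand is ignored and the other is returned.
// The result for mixed-sign zeros is left unspecified here.
double minNum(double A, double B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  return A < B ? A : B;
}

// minimum-style: any NaN operand produces NaN, and -0.0 orders below +0.0.
double minimum(double A, double B) {
  if (std::isnan(A) || std::isnan(B))
    return std::numeric_limits<double>::quiet_NaN();
  if (A == 0.0 && B == 0.0)
    return std::signbit(A) ? A : B;
  return A < B ? A : B;
}
```

The signed-zero rule is one example of a difference that has nothing to do with NaN handling.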
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345218 91177308-0d34-0410-b5e6-96231b3b80d8
When implementing memsets today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...
We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. On many targets we could just access the lower bytes of the
64bit register instead.
- Ideally this would be handled by the ConstantHoist pass but it runs
too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
however SelectionDAG constantfolding would constantfold the
"trunc(bigconstant)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
stop DAGCombiner from constant folding in this situation. (Similar to
how ConstantHoisting marks things as Opaque to avoid folding
ADD/SUB/etc.)
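The relationship between the two constants is easy to check in C++: truncating the 64-bit splat gives exactly the 32-bit splat, so one materialized register can serve both stores. The helper names below are illustrative:

```cpp
#include <cstdint>

// Replicate one byte into all eight bytes of a 64-bit value.
constexpr std::uint64_t splatByte(std::uint8_t B) {
  return B * 0x0101010101010101ULL;
}

// Truncation keeps the low four bytes, i.e. the low half of the register.
constexpr std::uint32_t low32(std::uint64_t V) {
  return static_cast<std::uint32_t>(V);
}

static_assert(low32(splatByte(0xAB)) == 0xABABABABu,
              "the truncated splat equals the narrower splat");
```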
Differential Revision: https://reviews.llvm.org/D53181
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345102 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested on D53258, this patch moves the CTPOP expansion code from SelectionDAGLegalize to TargetLowering to allow it to be reused by the VectorLegalizer.
Proper vector support will be added by D53258.
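For readers unfamiliar with this kind of expansion, the classic parallel bit-count for 32-bit values looks like the sketch below. It illustrates the general technique and is not copied from the moved code:

```cpp
#include <cstdint>

// Population count using only shifts, masks, adds and one multiply.
std::uint32_t popCount32(std::uint32_t V) {
  V = V - ((V >> 1) & 0x55555555u);                 // counts per 2-bit field
  V = (V & 0x33333333u) + ((V >> 2) & 0x33333333u); // counts per 4-bit field
  V = (V + (V >> 4)) & 0x0F0F0F0Fu;                 // counts per byte
  return (V * 0x01010101u) >> 24;                   // sum of the four bytes
}
```

Because every step is a plain bitwise or arithmetic op, the same sequence applies lane-wise to vectors.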
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345066 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested on D53258, this patch shares common CTLZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering.
Extension to D53474
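One common way to expand a leading-zero count without a native instruction is to smear the highest set bit downwards and count the bits that remain clear. The sketch below shows that technique for 32-bit values; the helper names are illustrative:

```cpp
#include <cstdint>

// Bit count helper used by the expansion.
std::uint32_t popCount32(std::uint32_t V) {
  V = V - ((V >> 1) & 0x55555555u);
  V = (V & 0x33333333u) + ((V >> 2) & 0x33333333u);
  V = (V + (V >> 4)) & 0x0F0F0F0Fu;
  return (V * 0x01010101u) >> 24;
}

// Count leading zeros: after smearing, every bit at or below the highest set
// bit is 1, so the leading zeros are exactly the bits still clear.
std::uint32_t countLeadingZeros32(std::uint32_t X) {
  X |= X >> 1;
  X |= X >> 2;
  X |= X >> 4;
  X |= X >> 8;
  X |= X >> 16;
  return popCount32(~X);
}
```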
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345060 91177308-0d34-0410-b5e6-96231b3b80d8
As suggested on D53258, this patch demonstrates sharing common CTTZ expansion code between VectorLegalizer and SelectionDAGLegalize by putting it in TargetLowering.
I intend to move CTLZ and (scalar) CTPOP over as well and then update D53258 accordingly.
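A trailing-zero count can be expressed through a bit count with the identity cttz(x) = ctpop(~x & (x - 1)). The sketch below shows it for 32-bit values, with illustrative helper names:

```cpp
#include <cstdint>

// Bit count helper used by the expansion.
std::uint32_t popCount32(std::uint32_t V) {
  V = V - ((V >> 1) & 0x55555555u);
  V = (V & 0x33333333u) + ((V >> 2) & 0x33333333u);
  V = (V + (V >> 4)) & 0x0F0F0F0Fu;
  return (V * 0x01010101u) >> 24;
}

// Count trailing zeros: ~X & (X - 1) has a 1 in exactly the positions below
// the lowest set bit of X (and in all 32 positions when X is zero).
std::uint32_t countTrailingZeros32(std::uint32_t X) {
  return popCount32(~X & (X - 1u));
}
```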
Differential Revision: https://reviews.llvm.org/D53474
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345039 91177308-0d34-0410-b5e6-96231b3b80d8
Add an intrinsic that takes 2 integers and performs unsigned saturating
addition on them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
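For reference, unsigned saturating addition on 32-bit values can be sketched as follows. The function name and the fixed width are assumptions made for illustration:

```cpp
#include <cstdint>
#include <limits>

// Unsigned saturating addition: if the wrapped sum is smaller than an
// operand, the addition overflowed and the result clamps to the maximum.
std::uint32_t uaddSat(std::uint32_t A, std::uint32_t B) {
  const std::uint32_t Sum = A + B; // wraps modulo 2^32 on overflow
  return Sum < A ? std::numeric_limits<std::uint32_t>::max() : Sum;
}
```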
Differential Revision: https://reviews.llvm.org/D53340
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344971 91177308-0d34-0410-b5e6-96231b3b80d8
Introduce new versions that follow the IEEE semantics
to help with legalization that may need quieted inputs.
There are some regressions from inserting unnecessary
canonicalizes when these are matched from fast math
fcmp + select which should be fixed in a future commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344914 91177308-0d34-0410-b5e6-96231b3b80d8
Change of approach, it looks like it's a much better idea to deal with
the vregs that have LLTs and reg classes both properly, than trying to
avoid creating those across all GlobalISel passes and all targets.
The change mostly touches MachineRegisterInfo::constrainRegClass,
which is apparently only used by MachineCSE. The changes are NFC for
any pipeline but one that contains MachineCSE mid-GlobalISel.
NOTE on isCallerPreservedOrConstPhysReg change in MachineCSE:
There is no test covering it as the only way to insert a new pass
(MachineCSE) from a command line I know of is llc's -run-pass option,
which only works with MIR, but MIRParser freezes reserved registers upon
MachineFunction creation, making it impossible to reproduce the state
that exposes the issue.
Reviewed By: aditya_nandakumar
Differential Revision: https://reviews.llvm.org/D53144
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344822 91177308-0d34-0410-b5e6-96231b3b80d8
When using a MachineInstr to get a SlotIndex, the MI must not be a debug
instruction, because mi2iMap does not contain debug instructions.
After enabling DBG_LABEL in the generated code, the first instruction in
the bundle may be a debug instruction. In this patch, I use the first
non-debug instruction in the bundle to query SlotIndex in mi2iMap.
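The lookup change amounts to skipping leading debug instructions before consulting the map. A small sketch with a hypothetical Instr record standing in for MachineInstr:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction record.
struct Instr {
  int Id;
  bool IsDebug;
};

// Returns the position of the first non-debug instruction in the bundle, or
// Bundle.size() when the bundle holds only debug instructions.
std::size_t firstNonDebug(const std::vector<Instr> &Bundle) {
  auto It = std::find_if(Bundle.begin(), Bundle.end(),
                         [](const Instr &I) { return !I.IsDebug; });
  return static_cast<std::size_t>(It - Bundle.begin());
}
```

The instruction found this way is the one that would be used as the key for the slot index query.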
Bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39094
Differential revision: https://reviews.llvm.org/D52927
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344770 91177308-0d34-0410-b5e6-96231b3b80d8