MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in
a single instruction, performing the compare and starting the VPT block in one.
This teaches the MVEVPTBlockPass to fold them, searching back through the
basicblock for a valid VCMP and creating the VPT from its operands.
There are some changes to the VPT instructions to accommodate this, altering
the order of the operands to match the VCMP better, and changing P0 register
defs to be VPR defs, as is used in other places.
Differential Revision: https://reviews.llvm.org/D66577
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371982 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Adds the following inline asm constraints for SVE:
- Upl: One of the low eight SVE predicate registers, P0 to P7 inclusive
- Upa: SVE predicate register with full range, P0 to P15
Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, cameron.mcinally, greened, rengolin
Reviewed By: rovka
Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66524
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371967 91177308-0d34-0410-b5e6-96231b3b80d8
Masked loads and store fit naturally with MVE, the instructions being easily
predicated. This adds lowering for the simple cases of masked loads and stores.
It does not yet deal with widening/narrowing or pre/post inc, and so is
currently behind an option.
The llvm masked load intrinsic will accept a "passthru" value, dictating the
values used for the zero masked lanes. In MVE the instructions write 0 to the
zero predicated lanes, so we need to match a passthru that isn't 0 (or undef)
with a select instruction to pull in the correct data after the load.
Differential Revision: https://reviews.llvm.org/D67186
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371932 91177308-0d34-0410-b5e6-96231b3b80d8
For some reason we sometimes insert new instructions one instruction before
the first non-PHI when legalizing. This can result in having non-PHI
instructions before PHIs, which mean that PHI elimination doesn't catch them.
Differential Revision: https://reviews.llvm.org/D67570
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371901 91177308-0d34-0410-b5e6-96231b3b80d8
Because memory intrinsics are handled differently than other calls, we need to
check them for tail call eligiblity in the legalizer. This allows us to still
inline them when it's beneficial to do so, but also tail call when possible.
This adds simple tail calling support for when the intrinsic is followed by a
return.
It ports the attribute checks from `TargetLowering::isInTailCallPosition` into
a similarly-named function in LegalizerHelper.cpp. The target-specific
`isUsedByReturnOnly` hook is not ported here.
Update tailcall-mem-intrinsics.ll to show that GlobalISel can now tail call
memory intrinsics.
Update legalize-memcpy-et-al.mir to have a case where we don't tail call.
Differential Revision: https://reviews.llvm.org/D67566
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371893 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for tail calling callees with varargs, equivalent to how it
is done in AArch64ISelLowering.
This only works for sibling calls, and does not add the necessary support for
musttail with varargs. (See r345641 for equivalent ISelLowering support.) This
should be implemented when we stop falling back on musttail.
Update call-translator-tail-call.ll to show that we can now tail call varargs.
Differential Revision: https://reviews.llvm.org/D67518
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371868 91177308-0d34-0410-b5e6-96231b3b80d8
All tests with -run-pass !=none should not in MIR/, See MIR/README.
```
Tests for codegen passes should NOT be here but in
test/CodeGen/sometarget. As
a rule of thumb this directory should only contain tests using
'llc -run-pass none'.
```
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371857 91177308-0d34-0410-b5e6-96231b3b80d8
Follow up of rL371321 that added FMA FP16 patterns. This adds more tests
for @llvm.fma.f16. This probably shows we miss one fmsub optimisation
opportunity, which I will look into.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371833 91177308-0d34-0410-b5e6-96231b3b80d8
Unlike SelectionDAG, treat this as a normally legalizable operation.
In SelectionDAG this is supposed to only ever formed if it's legal,
but I've found that to be restricting. For AMDGPU this is contextually
legal depending on whether denormal flushing is allowed in the use
function.
Technically we currently treat the denormal mode as a subtarget
feature, so custom lowering could be avoided. However I consider this
to be a defect, and this should be contextually dependent on the
controllable rounding mode of the parent function.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371800 91177308-0d34-0410-b5e6-96231b3b80d8
This testcase is invalid, and caught by the verifier. For the verifier
to catch it, the live interval computation needs to complete. Remove
the assert so the verifier catches this, which is less confusing.
In this testcase there is an undefined use of a subregister, and lanes
which aren't used or defined. An equivalent testcase with the
super-register shrunk to have no untouched lanes already hit this
verifier error.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371792 91177308-0d34-0410-b5e6-96231b3b80d8
This was relying on the SGPR usable for the carry out clobber to also
be used for the input. There was no carry out on gfx9. With no carry
out clobber to worry about, so the literal can just be directly used
with a VOP2 add.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371791 91177308-0d34-0410-b5e6-96231b3b80d8
With the landing of the previous patch (in particular D66318) there are a lot fewer diffs now. I added an experimental O0 line, and updated all the tests to group experimental and non-experimental O0/O3 together.
Skimming the remaining diffs, there's only a few which are obviously incorrect. There's a large number which are questionable, so more todo.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371790 91177308-0d34-0410-b5e6-96231b3b80d8
Swiftself uses a callee-saved register. We can tail call when the register used
in the caller and callee is the same.
This behaviour is equivalent to that in `TargetLowering::parametersInCSRMatch`.
Update call-translator-tail-call.ll to verify that we can do this. When we
support inline assembly, we can write a check similar to the one in the
general swiftself.ll. For now, we need to verify that we get the correct COPY
instruction after call lowering.
Differential Revision: https://reviews.llvm.org/D67511
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371788 91177308-0d34-0410-b5e6-96231b3b80d8
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context.
Differential Revision: https://reviews.llvm.org/D66318
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371786 91177308-0d34-0410-b5e6-96231b3b80d8
This adds support for lowering sibling calls with outgoing arguments.
e.g
```
define void @foo(i32 %a)
```
Support is ported from AArch64ISelLowering's `isEligibleForTailCallOptimization`.
The only thing that is missing is a full port of
`TargetLowering::parametersInCSRMatch`. So, if we're using swiftself,
we'll never tail call.
- Rename `analyzeCallResult` to `analyzeArgInfo`, since the function is now used
for both outgoing and incoming arguments
- Teach `OutgoingArgHandler` about tail calls. Tail calls use frame indices for
stack arguments.
- Teach `lowerFormalArguments` to set the bytes in the caller's stack argument
area. This is used later to check if the tail call's parameters will fit on
the caller's stack.
- Add `areCalleeOutgoingArgsTailCallable` to perform the eligibility check on
the callee's outgoing arguments.
For testing:
- Update call-translator-tail-call to verify that we can now tail call with
outgoing arguments, use G_FRAME_INDEX for stack arguments, and respect the
size of the caller's stack
- Remove GISel-specific check lines from speculation-hardening.ll, since GISel
now tail calls like the other selectors
- Add a GISel test line to tailcall-string-rvo.ll since we can tail call in that
test now
- Add a GISel test line to tailcall_misched_graph.ll since we tail call there
now. Add specific check lines for GISel, since the debug output from the
machine-scheduler differs with GlobalISel. The dependency still holds, but
the output comes out in a different order.
Differential Revision: https://reviews.llvm.org/D67471
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371780 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Since the SPE4RC register class contains an identical set of registers
and an identical spill size to the GPRC class its slightly confusing
the tablegen emitter. It's preventing the GPRC_and_GPRC_NOR0 synthesized
register class from inheriting VTs and AltOrders from GPRC or GPRC_NOR0.
This is because SPE4C is found first in the super register class list
when inheriting these properties and it doesn't set the VTs or
AltOrders the same way as GPRC or GPRC_NOR0.
This patch replaces all uses of GPE4RC with GPRC and allows GPRC and
GPRC_NOR0 to contain f32.
The test changes here are because the AltOrders are being inherited
to GPRC_NOR0 now.
Found while trying to determine if getCommonSubClass needs to take
a VT argument. It was originally added to support fp128 on x86-64,
I've changed some things about that so that it might be needed
anymore. But a PowerPC test crashed without it and I think its
due to this subclass issue.
Reviewers: jhibbits, nemanjai, kbarton, hfinkel
Subscribers: wuzish, nemanjai, mehdi_amini, hiraditya, kbarton, MaskRay, dexonsmith, jsji, shchenz, steven.zhang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67513
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371779 91177308-0d34-0410-b5e6-96231b3b80d8
The X86 decision assumes the compare will produce a result in an XMM
register, but that can't happen for an fp128 compare since those
go to a libcall the returns an i32. Pass the VT so X86 can check
the type.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371775 91177308-0d34-0410-b5e6-96231b3b80d8
IRTranslator creates G_DYN_STACKALLOC instruction during expansion of
alloca when argument that tells number of elements to allocate on stack
is a virtual register. Use default lowering for MIPS32.
Differential Revision: https://reviews.llvm.org/D67440
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371728 91177308-0d34-0410-b5e6-96231b3b80d8
G_IMPLICIT_DEF is used for both integer and floating point implicit-def.
Handle G_IMPLICIT_DEF as ambiguous opcode in MipsRegisterBankInfo.
Select G_IMPLICIT_DEF for MIPS32.
Differential Revision: https://reviews.llvm.org/D67439
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371727 91177308-0d34-0410-b5e6-96231b3b80d8
This is the main CodeGen patch to support the arm64_32 watchOS ABI in LLVM.
FastISel is mostly disabled for now since it would generate incorrect code for
ILP32.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371722 91177308-0d34-0410-b5e6-96231b3b80d8
Current implementation of estimating divisions loses precision since it
estimates reciprocal first and does multiplication. This patch is to re-order
arithmetic operations in the last iteration in DAGCombiner to improve the
accuracy.
Reviewed By: Sanjay Patel, Jinsong Ji
Differential Revision: https://reviews.llvm.org/D66050
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371713 91177308-0d34-0410-b5e6-96231b3b80d8
AVX512 instructions can cause a frequency drop on these CPUs. This
can negate the performance gains from using wider vectors. Enabling
prefer-vector-width=256 will prevent generation of zmm registers
unless explicit 512 bit operations are used in the original source
code.
I believe gcc and icc both do something similar to this by default.
Differential Revision: https://reviews.llvm.org/D67259
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371694 91177308-0d34-0410-b5e6-96231b3b80d8
First we were asserting that the ValNo of a VA was the wrong value. It doesn't actually
make a difference for us in CallLowering but fix that anyway to silence the assert.
The bigger issue was that after fixing the assert we were generating invalid MIR
because the merging/unmerging of values split across multiple registers wasn't
also implemented for memory locs. This happens when we run out of registers and
have to pass the split types like i128 -> i64 x 2 on the stack. This is do-able, but
for now just fall back.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371693 91177308-0d34-0410-b5e6-96231b3b80d8
Before, we only checked the callee for swifterror. However, we should also be
checking the caller to see if it has a swifterror parameter.
Since we don't currently handle outgoing arguments, this didn't show up in the
swifterror.ll testcase.
Also, remove the swifterror checks from call-translator-tail-call.ll, since
they are covered by the existing swifterror testing. Better to have it all in
one place.
Differential Revision: https://reviews.llvm.org/D67465
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@371692 91177308-0d34-0410-b5e6-96231b3b80d8