Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.
CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.
With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.
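For reference, a rough sketch of the updated hook (simplified; the exact
parameter types in the tree may differ):

  class CallLowering {
  public:
    /// Lower the incoming formal arguments of \p F. VRegs[i] holds all the
    /// virtual registers produced for the i-th argument, so an aggregate
    /// split up by the IRTranslator arrives as several small registers
    /// instead of one pre-merged wide one.
    virtual bool lowerFormalArguments(MachineIRBuilder &MIRBuilder,
                                      const Function &F,
                                      ArrayRef<ArrayRef<unsigned>> VRegs) const {
      return false; // unsupported -> fall back to SelectionDAG
    }
  };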
ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.
AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into an s64 instead of a p0. Added a test case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test case to expect p0.
AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.
Mips doesn't support aggregates yet, so it's also NFC.
x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.
Differential Revision: https://reviews.llvm.org/D63549
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364510 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The +DumpCode attribute is a horrible hack in AMDGPU to embed the
disassembly of the generated code into the ELF file. It is used by LLPC
to implement an extension that allows the application to read back the
disassembly of the code.
It tries to print an entry label at the start of every function, but
that didn't work for the first function in the module because
DumpCodeInstEmitter wasn't initialised until EmitFunctionBodyStart,
which is too late.
Change-Id: I790d73ddf4f51fd02ab32529380c7cb7c607c4ee
Reviewers: arsenm, tpr, kzhuravl
Reviewed By: arsenm
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63712
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364508 91177308-0d34-0410-b5e6-96231b3b80d8
The LivePhysRegs calculated in order to find a scratch register in the
epilogue code wrongly uses the 'LiveIns' set. Instead, it should use the
'LiveOuts' set. For the liveness, the operands of the terminator
(return) instruction, which is the insertion point for the
scratch-exec-copy instruction, are also considered.
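A minimal sketch of the corrected computation (the helper and its
candidate list are hypothetical; addLiveOuts/stepBackward/available are
the relevant LivePhysRegs calls):

  static MCPhysReg
  findEpilogueScratchReg(const MachineBasicBlock &MBB,
                         MachineBasicBlock::const_iterator InsertPt,
                         const TargetRegisterInfo &TRI,
                         const MachineRegisterInfo &MRI,
                         ArrayRef<MCPhysReg> Candidates) {
    LivePhysRegs LiveRegs(TRI);
    // Seed with the block's live-outs (not its live-ins) and walk
    // backwards, so the return's operands are live at the insertion point.
    LiveRegs.addLiveOuts(MBB);
    for (auto I = MBB.rbegin(), E = MBB.rend(); I != E; ++I) {
      LiveRegs.stepBackward(*I);
      if (&*I == &*InsertPt)
        break;
    }
    for (MCPhysReg Reg : Candidates)
      if (LiveRegs.available(MRI, Reg))
        return Reg;
    return 0; // no free scratch register
  }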
Patch by Christudasan Devadasan
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364470 91177308-0d34-0410-b5e6-96231b3b80d8
Original patch https://reviews.llvm.org/D63659 from
Steven Perron <stevenperron@google.com>
The pass AMDGPUUnifyDivergentExitNodes does not update the phi nodes in
the successors of blocks that it splits. This is fixed by calling
BasicBlock::splitBasicBlock to split the block instead of doing it
manually. This does extra work because a new unconditional branch is
created in BB which is immediately replaced, but I think the simplicity
is worth it. It also helps make the code more future proof in case other
things need to be updated.
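Roughly, the fixed splitting looks like this (Cond and
UnifiedReturnBlock stand in for values the pass already has):

  // splitBasicBlock moves everything from I onwards into a new block,
  // terminates BB with an unconditional branch to it, and crucially
  // updates PHI nodes in the successors to refer to the new block.
  BasicBlock *TransitionBB = BB->splitBasicBlock(I, "TransitionBlock");
  // The just-created branch is then replaced with the conditional
  // branch the pass actually wants.
  BB->getTerminator()->eraseFromParent();
  BranchInst::Create(TransitionBB, UnifiedReturnBlock, Cond, BB);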
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364342 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.
Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.
Some notes:
- The llvm.amdgcn.groupstaticsize intrinsic can no longer be lowered
  to a constant at compile time, which means some tests can no longer
  be applied.
The current "solution" is a terrible hack, but the intrinsic isn't
used by Mesa, so we can keep it for now.
- We no longer know the full LDS size per kernel at compile time, which
means that we can no longer generate a relevant error message at
compile time. It would be possible to add a check for the size of
individual variables, but ultimately the linker will have to perform
the final check.
Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275
Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin
Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61494
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364297 91177308-0d34-0410-b5e6-96231b3b80d8
Scalar extends to s64 can use S_BFE_{I64|U64}, but vector extends need
to extend to the 32-bit half first and then to 64.
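Illustratively, with a MachineIRBuilder B the 64-bit path amounts to
something like this (the opcode choices are the obvious ones, not
necessarily the exact code in AMDGPURegisterBankInfo):

  LLT S32 = LLT::scalar(32);
  // Sign-extend into the low 32-bit half first...
  auto Lo = B.buildSExtOrTrunc(S32, Src);
  // ...then synthesize the high half by broadcasting the sign bit...
  auto Hi = B.buildAShr(S32, Lo, B.buildConstant(S32, 31));
  // ...and assemble the s64 result from the two halves.
  B.buildMerge(Dst, {Lo.getReg(0), Hi.getReg(0)});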
I'm not sure where the line should be between what RegBankSelect
handles and what instruction selection does, but for now I'm erring on
the side of RegBankSelect for future post-RBS combines.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364212 91177308-0d34-0410-b5e6-96231b3b80d8
This needs different handling depending on whether or not the source is
known to be a valid condition. Handle turning it into shifts or a
select during regbankselect.
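For illustration, the two strategies look roughly like this with a
MachineIRBuilder B (register names are placeholders):

  LLT S32 = LLT::scalar(32);
  // Source known to be a valid condition: select between the results.
  auto FromBool = B.buildSelect(S32, Cond, B.buildConstant(S32, -1),
                                B.buildConstant(S32, 0));
  // Arbitrary register that merely holds a bit: normalize with shifts.
  auto Shl = B.buildShl(S32, Src, B.buildConstant(S32, 31));
  auto FromBit = B.buildAShr(S32, Shl, B.buildConstant(S32, 31));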
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364186 91177308-0d34-0410-b5e6-96231b3b80d8
This matters for byval uses outside of the entry block, which appear
as copies.
Previously, the only folding done was during selection, which could
not see the underlying frame index. For any uses outside the entry
block, the frame index was materialized in the entry block relative to
the global scratch wave offset.
This may produce worse code in cases where the offset ends up not
fitting in the MUBUF offset field. A better heuristic would be helpful
for extreme frames.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364185 91177308-0d34-0410-b5e6-96231b3b80d8
We sometimes get poor code size because constants of types < 32b are
legalized as 32-bit G_CONSTANTs with a truncate to fit. This works, but
means that the localizer can no longer sink them (although it's
possible to extend it to do so).
On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.
There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.
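A rough sketch of that combine, under the obvious assumptions (this is
not the exact in-tree implementation):

  // Fold G_ANYEXT(G_CONSTANT) into a wider G_CONSTANT when that is legal.
  bool tryCombineAnyExtOfConstant(MachineInstr &MI, MachineRegisterInfo &MRI,
                                  MachineIRBuilder &B, const LegalizerInfo &LI) {
    assert(MI.getOpcode() == TargetOpcode::G_ANYEXT);
    unsigned Dst = MI.getOperand(0).getReg();
    MachineInstr *Def = MRI.getVRegDef(MI.getOperand(1).getReg());
    if (!Def || Def->getOpcode() != TargetOpcode::G_CONSTANT)
      return false;
    LLT DstTy = MRI.getType(Dst);
    if (!LI.isLegal({TargetOpcode::G_CONSTANT, {DstTy}}))
      return false;
    // Re-emit the constant directly at the wider type: the low bits match,
    // and the high bits of a G_ANYEXT are undefined anyway.
    B.setInstr(MI);
    const APInt &Val = Def->getOperand(1).getCImm()->getValue();
    B.buildConstant(Dst, Val.sext(DstTy.getSizeInBits()));
    MI.eraseFromParent();
    return true;
  }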
Differential Revision: https://reviews.llvm.org/D63587
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364075 91177308-0d34-0410-b5e6-96231b3b80d8
Every called function could possibly need this to calculate the
absolute address of stack objects, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363990 91177308-0d34-0410-b5e6-96231b3b80d8
The attribute can specify elimination for leaf or non-leaf, so it
should always be considered. I copied this bug from AArch64, which
probably should also be fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363949 91177308-0d34-0410-b5e6-96231b3b80d8
This should only matter in vectors with an undef component, since a
full undef vector would have been folded out.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363941 91177308-0d34-0410-b5e6-96231b3b80d8
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest instance to fix, but doesn't
solve any specific problems I'm looking at.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363904 91177308-0d34-0410-b5e6-96231b3b80d8
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.
For some reason copies aren't making it through the reg_sequence,
although they should.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363876 91177308-0d34-0410-b5e6-96231b3b80d8
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363870 91177308-0d34-0410-b5e6-96231b3b80d8
This allows targets to make more decisions about reserved registers
after isel. For example, it should now be certain whether or not there
are calls or stack objects in the frame, which could have been
introduced by legalization.
Patch by Matthias Braun
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363757 91177308-0d34-0410-b5e6-96231b3b80d8
This was ignoring the flag on fneg, and using the source instruction's
flags. Also fixes tests missing from r358702.
Note the expansion itself isn't correct without nnan, but that should
be fixed separately.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363637 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
The purpose of the padding is to guard against stale code being
fetched into the instruction cache by the lowest level prefetching.
We're generating relocatable ELF here, and so the padding should
arguably be added by the linker. This is in fact what Mesa does.
This also fixes multi-part shaders for Mesa.
Change-Id: I6bfede58f20e9f337762ccf39ef9e0e263e69e82
Reviewers: arsenm, rampitec, t-tye
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63427
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363602 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This is related to the changes to the groupstaticsize intrinsic in
D61494, which would otherwise make the related tests in these files
fail or be much less useful.
Note that for some reason, SOPK generation is less effective in the
amdhsa OS, which is why I chose PAL. I haven't investigated this
further.
Change-Id: I6bb99569338f7a433c28b4c9eb1e3e036b00d166
Reviewers: arsenm
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63392
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363600 91177308-0d34-0410-b5e6-96231b3b80d8
The pass works in two modes:
Mode 1: Just set attributes starting from kernels. This can work at
the very beginning of the opt and llc pipelines, but cannot clone
functions because it must be a function pass.
Mode 2: Actually clone functions for new attributes. This can only work
after all function passes in the opt pipeline because it has to be a
module pass.
Differential Revision: https://reviews.llvm.org/D63208
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363586 91177308-0d34-0410-b5e6-96231b3b80d8