This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.
This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357098 91177308-0d34-0410-b5e6-96231b3b80d8
Another test is needed for the case where the scavenge fail, but
there's another issue with that which needs an additional fix.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357093 91177308-0d34-0410-b5e6-96231b3b80d8
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.
Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357091 91177308-0d34-0410-b5e6-96231b3b80d8
This is not a control flow instruction, so should not be marked as
isBarrier. This fixes a verifier error if followed by unreachable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357081 91177308-0d34-0410-b5e6-96231b3b80d8
Subregister indexes are not used for physical register operands, so
isFullCopy is implied by the physical register check.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356956 91177308-0d34-0410-b5e6-96231b3b80d8
These were defaulting to true, but they are just wrappers around bit
operations. This avoids regressions in the exec mask optimization
passes in a future commit.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356952 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
Adding contained caching to AliasAnalysis. BasicAA is currently the only one using it.
AA changes:
- This patch is pulling the caches from BasicAAResults to AAResults, meaning the getModRefInfo call benefits from the IsCapturedCache as well when in "batch mode".
- All AAResultBase implementations add the QueryInfo member to all APIs. AAResults APIs maintain wrapper APIs such that all alias()/getModRefInfo call sites are unchanged.
- AA now provides a BatchAAResults type as a wrapper to AAResults. It keeps the AAResults instance and a QueryInfo instantiated to batch mode. It delegates all work to the AAResults instance with the batched QueryInfo. More API wrappers may be needed in BatchAAResults; only the minimum needed is currently added.
MemorySSA changes:
- All walkers are now templated on the AA used (AliasAnalysis=AAResults or BatchAAResults).
- At build time, we optimize uses; now we create a local walker (lives only as long as OptimizeUses does) using BatchAAResults.
- All Walkers have an internal AA and only use that now, never the AA in MemorySSA. The Walkers receive the AA they will use when built.
- The walker we use for queries after the build is instantiated on AliasAnalysis and is built after building MemorySSA and setting AA.
- All static methods doing walking are now templated on AliasAnalysisType if they are used both during build and after. If used only during build, the method now only takes a BatchAAResults. If used only after build, the method now takes an AliasAnalysis.
Subscribers: sanjoy, arsenm, jvesely, nhaehnle, jlebar, george.burgess.iv, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59315
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356783 91177308-0d34-0410-b5e6-96231b3b80d8
Some image ops return three or five dwords. Previously, we modeled that
with a 4 or 8 dword register class. The register allocator could
cleverly spot that some subregs were dead and allocate something else
there, but that caused the de-optimization that waitcnt insertion would
think that the result was used immediately.
This commit allows such an image op to have a result with a three or
five dword result, avoiding the above de-optimization.
Differential Revision: https://reviews.llvm.org/D58905
Change-Id: I3651211bbd7ed22721ee7b9fefd7bcc60a809d8b
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356757 91177308-0d34-0410-b5e6-96231b3b80d8
Now we have vec3 MVTs, this commit implements dwordx3 variants of the
buffer intrinsics.
On gfx6, a dwordx3 buffer load intrinsic is implemented as a dwordx4
instruction, and a dwordx3 buffer store intrinsic is not supported.
We need to support the dwordx3 load intrinsic because it is generated by
subtarget-unaware code in InstCombine.
Differential Revision: https://reviews.llvm.org/D58904
Change-Id: I016729d8557b98a52f529638ae97c340a5922a4e
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356755 91177308-0d34-0410-b5e6-96231b3b80d8
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.
SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58902
Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356659 91177308-0d34-0410-b5e6-96231b3b80d8
My previous fix rL356591 "[AMDGPU] Added MsgPack format PAL metadata"
accidentally caused a spurious PAL metadata .note record to be emitted
for any AMDGPU output. That caused failures in the lld test
amdgpu-relocs.s. Fixed.
Differential Revision: https://reviews.llvm.org/D59613
Change-Id: Ie04a2aaae890dcd490f22c89edf9913a77ce070e
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356621 91177308-0d34-0410-b5e6-96231b3b80d8
Summary: - The linking is broken when this library is built as shared one.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59610
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356617 91177308-0d34-0410-b5e6-96231b3b80d8
The constantness shouldn't change the register bank choice. We also
don't need to restrict this to only indexing VGPRs, since it's
possible to index SGPRs (but SelectionDAG made using this
difficult). Allow directly indexing SGPRs when appropriate.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356611 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
- Should use `targetconstant` instead of `constant` operand for clamp
bit, which is expected as an immediate operand. Under certain
conditions, such as a common `i1 false` constant is used in other
place and selected before the instruction with clamp bit, register
operand may be added instead of immediate one. Use `targetcosntant` to
enforce that.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59608
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356608 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
PAL metadata now supports both the old linear reg=val pairs format and
the new MsgPack format.
The MsgPack format uses YAML as its textual representation. On output to
YAML, a mnemonic name is provided for some hardware registers.
Differential Revision: https://reviews.llvm.org/D57028
Change-Id: I2bbaabaaca4b3574f7e03b80fbef7c7a69d06a94
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356591 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This commit introduces a new AMDGPUPALMetadata class that:
* is inside the AMDGPU target;
* keeps an in-memory representation of PAL metadata;
* provides a method to read the frontend-supplied metadata from LLVM IR;
* provides methods for the asm printer to set metadata items;
* provides methods to write the metadata as a binary blob to put in a
.note record or as an asm directive;
* provides a method to read the metadata as a binary blob from a .note
record.
Because llvm-readobj cannot call directly into a target, I had to remove
llvm-readobj's ability to dump PAL metadata, pending a resolution to
https://reviews.llvm.org/D52821
Differential Revision: https://reviews.llvm.org/D57027
Change-Id: I756dc830894fcb6850324cdcfa87c0120eb2cf64
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356582 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
If an MIMG instruction has managed to get through to adjustWritemask in isel but
has no uses (and doesn't enable TFC) then prevent an assertion by not attempting
to adjust the writemask.
The instruction will be removed anyway.
Change-Id: I9a5dba6bafe1f35ac99c1b73df390936e2ac27a7
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58964
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356540 91177308-0d34-0410-b5e6-96231b3b80d8
This will allow targets more flexibility to replace the
register allocator core passes. In a future commit,
AMDGPU will run the core register assignment passes
twice, and will also want to disallow using the
standard -regalloc option.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356506 91177308-0d34-0410-b5e6-96231b3b80d8
I found this really weird WWM-related case whereby through the WWM
transformations our isel lowering was trying to promote 2 min's into a
min3 for the i8 type, which our hardware doesn't support.
The new min3_i8.ll test case would previously spew the error:
PromoteIntegerResult #0: t69: i8 = SMIN3 t70, Constant:i8<0>, t68
Before the simple fix to our isel lowering to not do it for i8 MVT's.
Differential Revision: https://reviews.llvm.org/D59543
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356464 91177308-0d34-0410-b5e6-96231b3b80d8
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.
This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.
Differential Revision: https://reviews.llvm.org/D59267
Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356399 91177308-0d34-0410-b5e6-96231b3b80d8
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.
This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.
To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.
Differential Revision: https://reviews.llvm.org/D59191
Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356398 91177308-0d34-0410-b5e6-96231b3b80d8
This fixes a couple of unflushed raw_string_ostream bugs in recent
commits that only show up on a bot building on windows with expensive
checks.
Differential Revision: https://reviews.llvm.org/D59396
Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356394 91177308-0d34-0410-b5e6-96231b3b80d8
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D58957
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356373 91177308-0d34-0410-b5e6-96231b3b80d8
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.
Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).
Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356347 91177308-0d34-0410-b5e6-96231b3b80d8
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Fixed assumptions of power-of-2 vector type in kernel arg handling,
and added v5 kernel arg tests and v3/v5 shader arg tests.
* Added v5 tests for cost analysis.
* Added vec3/vec5 arg test cases.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58928
Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356342 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
At the exit of the loop, the compiler uses a register to remember and accumulate
the number of threads that have already exited. When all active threads exit the
loop, this register is used to restore the exec mask, and the execution continues
for the post loop code.
When there is a "continue" in the loop, the compiler made a mistake to reset the
register to 0 when the "continue" backedge is taken. This will result in some
threads not executing the post loop code as they are supposed to.
This patch fixed the issue.
Reviewers:
nhaehnle, arsenm
Differential Revision:
https://reviews.llvm.org/D59312
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356298 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
- During the fixing of SGPR copying from VGPR, ensure users of SCC is
properly propagated, i.e.
* only propagate through live def of SCC,
* skip the SCC-def inst itself, and
* stop the propagation on the other SCC-def inst after checking its
SCC-use first.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59362
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356258 91177308-0d34-0410-b5e6-96231b3b80d8
This has been a very painful missing feature that has made producing
reduced testcases difficult. In particular the various registers
determined for stack access during function lowering were necessary to
avoid undefined register errors in a large percentage of
cases. Implement a subset of the important fields that need to be
preserved for AMDGPU.
Most of the changes are to support targets parsing register fields and
properly reporting errors. The biggest sort-of bug remaining is for
fields that can be initialized from the IR section will be overwritten
by a default initialized machineFunctionInfo section. Another
remaining bug is the machineFunctionInfo section is still printed even
if empty.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356215 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This
commit switches AMDGPU HSA metadata processing to use MsgPackDocument
instead of MsgPackTypes.
Differential Revision: https://reviews.llvm.org/D57024
Change-Id: I0751668013abe8c87db01db1170831a76079b3a6
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356081 91177308-0d34-0410-b5e6-96231b3b80d8
This indicates an intrinsic parameter is required to be a constant,
and should not be replaced with a non-constant value.
Add the attribute to all AMDGPU and generic intrinsics that comments
indicate it should apply to. I scanned other target intrinsics, but I
don't see any obvious comments indicating which arguments are intended
to be only immediates.
This breaks one questionable testcase for the autoupgrade. I'm unclear
on whether the autoupgrade is supposed to really handle declarations
which were never valid. The verifier fails because the attributes now
refer to a parameter past the end of the argument list.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355981 91177308-0d34-0410-b5e6-96231b3b80d8