Commit Graph

292 Commits

Author SHA1 Message Date
Ron Lieberman
086b7dc10d [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos
The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed
into VOP3 instruction pairs for S_ADD_U64_PSEUDO:
  V_ADD_I32_e64
  V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO
  V_SUB_I32_e64
  V_SUBB_U32_e64
preclude the use of SDWA to encode a constant.
SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.

We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
  %19:vgpr_32 = V_AND_B32_e32 255,
      killed %16:vgpr_32, implicit $exec
  %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
      %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
 %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
      %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec

which then allows the SDWA encoding and becomes
  %47:vgpr_32 = V_ADD_I32_sdwa
      0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
      implicit-def $vcc, implicit $exec
  %48:vgpr_32 = V_ADDC_U32_e32
      0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec


Differential Revision: https://reviews.llvm.org/D54882


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348132 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-03 13:04:54 +00:00
Graham Sellers
84cbc95b54 [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.

Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also adding test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.

Differential: https://reviews.llvm.org/D55071


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348075 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-01 12:27:53 +00:00
Nicolai Haehnle
e3924b1c15 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348050 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 22:55:38 +00:00
Valery Pykhtin
d339265d52 [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347993 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 14:21:56 +00:00
David Stuttard
cb049f818d Revert r347871 "Fix: Add support for TFE/LWE in image intrinsic"
Also revert fix r347876

One of the buildbots was reporting a failure in some relevant tests that I can't
repro or explain at present, so reverting until I can isolate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347911 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 20:14:17 +00:00
Graham Sellers
19d2eebe6c [AMDGPU] Add and update scalar instructions
This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit.

A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly).

Differential: https://reviews.llvm.org/D54714

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347877 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 16:05:38 +00:00
David Stuttard
7fd87699a5 Add support for TFE/LWE in image intrinsics
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accomodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized that is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347871 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 15:21:13 +00:00
Francis Visoiu Mistrih
83895b33cb [CodeGen][NFC] Make TII::getMemOpBaseImmOfs return a base operand
Currently, instructions doing memory accesses through a base operand that is
not a register can not be analyzed using `TII::getMemOpBaseRegImmOfs`.

This means that functions such as `TII::shouldClusterMemOps` will bail
out on instructions using an FI as a base instead of a register.

The goal of this patch is to refactor all this to return a base
operand instead of a base register.

Then in a separate patch, I will add FI support to the mem op clustering
in the MachineScheduler.

Differential Revision: https://reviews.llvm.org/D54846

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347746 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-28 12:00:20 +00:00
Matt Arsenault
9926acb6bf AMDGPU: Record SGPR spills when restoring too
It's possible in some cases to have a restore present
without a corresponding spill. Due to an apparent bug
in D54366 <https://reviews.llvm.org/D54366>, only the
restore for a register was emitted. It's probably
always a bug for this to happen, but due to how SGPR
spilling is implemented, this makes the issues appear
worse than it is.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347595 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-26 21:28:40 +00:00
Matt Arsenault
a7df14b63e AMDGPU: Fix analyzeBranch failing with pseudoterminators
If a block had one of the _term instructions used for gluing
exec modifying instructions to the end of the block,
analyzeBranch would fail, preventing the verifier from catching
a broken successor list.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347027 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-16 05:03:02 +00:00
Stanislav Mekhanoshin
8153acf695 [AMDGPU] Always pass TRI into findRegister[Use/Def]OperandIdx
This only covers AMDGPU BE, hopefully all occurrences.

Differential Revision: https://reviews.llvm.org/D54235

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346528 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-09 17:58:59 +00:00
Nicolai Haehnle
69f971eb18 Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"
This reverts commit r344696 for now (except for some test additions).

See https://bugs.freedesktop.org/show_bug.cgi?id=108611.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346364 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 21:53:43 +00:00
Scott Linder
3c1f69b206 [AMDGPU] Remove FeatureVGPRSpilling
This feature is only relevant to shaders, and is no longer used. When disabled,
lowering of reserved registers for shaders causes a compiler crash.

Remove the feature and add a test for compilation of shaders at OptNone.

Differential Revision: https://reviews.llvm.org/D53829


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345763 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-31 18:54:06 +00:00
Matt Arsenault
4ae1eac0ec AMDGPU: Use scavengeRegisterBackwards
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@345559 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-30 01:33:14 +00:00
Nicolai Haehnle
cc436fd266 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344696 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-17 15:37:30 +00:00
Scott Linder
9099160b87 [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Recommits r341413 with changes to update the MachineDominatorTree when present.

Differential Revision: https://reviews.llvm.org/D51742


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343992 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-08 18:47:01 +00:00
Alexander Timofeev
f33751e706 [AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed.
Differential revision: https://reviews.llvm.org/D51975

    Reviewers: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342115 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 06:34:56 +00:00
Alexander Timofeev
8f483e6fc2 [AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed
Differential revision: https://reviews.llvm.org/D51734
Reviewers: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341928 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 11:56:50 +00:00
Alexander Timofeev
6df7a517ca [AMDGPU] Preliminary patch for divergence driven instruction selection. Inline immediate move to V_MADAK_F32.
Differential revision: https://reviews.llvm.org/D51586

    Reviewer: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341843 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 16:42:49 +00:00
Alexander Timofeev
55e56ce884 [AMDGPU] Preliminary patch for divergence driven instruction selection. Fold immediate SMRD offset.
Differential revision: https://reviews.llvm.org/D51610

Reviewer: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341636 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-07 09:05:34 +00:00
Scott Linder
63ca6b52d5 Revert r341413
Causes a regression in expensive checks.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341589 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-06 21:38:56 +00:00
Scott Linder
6da9a35885 [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Differential Revision: https://reviews.llvm.org/D50982


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341413 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-04 21:50:47 +00:00
Matt Arsenault
7e212e4168 AMDGPU: Remove remnants of old address space mapping
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341165 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 05:49:54 +00:00
Nicolai Haehnle
1209b727bc AMDGPU: Fix getInstSizeInBytes
Summary:
Add some optional code to validate getInstSizeInBytes for emitted
instructions. This flushed out some issues which are fixed by this
patch:

- Streamline getInstSizeInBytes
- Properly define the VI readlane/writelane instruction as VOP3
- Fix the inline constant determination. Specifically, this change
  fixes an issue where a 32-bit value of 0xffffffff was recorded
  as unsigned. This is equal to -1 when restricting to a 32-bit
  comparison, and an inline constant can be used.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D50629

Change-Id: Id87c3b7975839da0de8156a124b0ce98c5fb47f2

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340903 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-29 07:46:09 +00:00
Matt Arsenault
ac00e8c95b AMDGPU: Shrink insts to fold immediates
This needs to be done in the SSA fold operands
pass to be effective, so there is a bit of overlap
with SIShrinkInstructions but I don't think this
is practically avoidable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340859 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-28 18:34:24 +00:00
Matt Arsenault
7b8d06c2e3 AMDGPU: Move canShrink into TII
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340855 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-28 18:22:34 +00:00
Tim Renouf
ee9a2bdfd5 [AMDGPU] Add support for multi-dword s.buffer.load intrinsic
Summary:
Patch by Marek Olsak and David Stuttard, both of AMD.

This adds a new amdgcn intrinsic supporting s.buffer.load, in particular
multiple dword variants. These are convenient to use from some front-end
implementations.

Also modified the existing llvm.SI.load.const intrinsic to common up the
underlying implementation.

This modification also requires that we can lower to non-uniform loads correctly
by splitting larger dword variants into sizes supported by the non-uniform
versions of the load.

V2: Addressed minor review comments.
V3: i1 glc is now i32 cachepolicy for consistency with buffer and
    tbuffer intrinsics, plus fixed formatting issue.
V4: Added glc test.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51098

Change-Id: I83a6e00681158bb243591a94a51c7baa445f169b

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340684 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-25 14:53:17 +00:00
Marcello Maggioni
b0ab6f767f [PSV] Update API to be able to use TargetCustom without UB.
getTargetCustom() requires values for "Kind" in the constructor
that are not in the PSVKind enum. Passing a value that is not inside
an enum as an argument to a constructor of the type of the enum is
UB. Changing to the underlying type of the enum would solve the UB

Differential Revision: https://reviews.llvm.org/D50909

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340200 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-20 19:23:45 +00:00
Chandler Carruth
2a752bfdae [MI] Change the array of MachineMemOperand pointers to be
a generically extensible collection of extra info attached to
a `MachineInstr`.

The primary change here is cleaning up the APIs used for setting and
manipulating the `MachineMemOperand` pointer arrays so chat we can
change how they are allocated.

Then we introduce an extra info object that using the trailing object
pattern to attach some number of MMOs but also other extra info. The
design of this is specifically so that this extra info has a fixed
necessary cost (the header tracking what extra info is included) and
everything else can be tail allocated. This pattern works especially
well with a `BumpPtrAllocator` which we use here.

I've also added the basic scaffolding for putting interesting pointers
into this, namely pre- and post-instruction symbols. These aren't used
anywhere yet, they're just there to ensure I've actually gotten the data
structure types correct. I'll flesh out support for these in
a subsequent patch (MIR dumping, parsing, the works).

Finally, I've included an optimization where we store any single pointer
inline in the `MachineInstr` to avoid the allocation overhead. This is
expected to be the overwhelmingly most common case and so should avoid
any memory usage growth due to slightly less clever / dense allocation
when dealing with >1 MMO. This did require several ergonomic
improvements to the `PointerSumType` to reasonably support the various
usage models.

This also has a side effect of freeing up 8 bits within the
`MachineInstr` which could be repurposed for something else.

The suggested direction here came largely from Hal Finkel. I hope it was
worth it. ;] It does hopefully clear a path for subsequent extensions
w/o nearly as much leg work. Lots of thanks to Reid and Justin for
careful reviews and ideas about how to do all of this.

Differential Revision: https://reviews.llvm.org/D50701

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@339940 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-16 21:30:05 +00:00
Nicolai Haehnle
bfada8913e AMDGPU: Force skip over s_sendmsg and exp instructions
Summary:
These instructions interact with hardware blocks outside the shader core,
and they can have "scalar" side effects even when EXEC = 0. We don't
want these scalar side effects to occur when all lanes want to skip
these instructions, so always add the execz skip branch instruction
for basic blocks that contain them.

Also ensure that we skip scalar stores / atomics, though we don't
code-gen those yet.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48431

Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338235 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-30 09:23:59 +00:00
Matt Arsenault
e9c22aa83f AMDGPU: Fix code size for return_to_epilog pseudo
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338113 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-27 09:15:03 +00:00
Tom Stellard
1d6fd076a3 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@336851 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-11 20:59:01 +00:00
Tom Stellard
cba2181e77 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335942 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-28 23:47:12 +00:00
Stanislav Mekhanoshin
e5240df77a [AMDGPU] Construct memory clauses before RA
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.

This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.

Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.

Differential Revision: https://reviews.llvm.org/D47511

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@333691 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-31 20:13:51 +00:00
Tom Stellard
f02d6fd47c AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332930 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-22 02:03:23 +00:00
Stanislav Mekhanoshin
7de8fd2b74 [AMDGPU] Added checks for dpp_ctrl value
- Report error for invalid dpp_ctrl values.
- Changed the way it is reported, now the error will be emitted into
  asm and will work with release build as well.
- Added dpp_ctrl value verifier for codegen.
- Added symbolic constants for dpp_ctrl.

Differential Revision: https://reviews.llvm.org/D46565

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331775 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-08 16:53:02 +00:00
Adrian Prantl
26b584c691 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:54:18 +00:00
Matt Arsenault
ac9b3ef76a AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331215 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-30 19:08:16 +00:00
Stanislav Mekhanoshin
dbe20f7ece [AMDGPU] Truncate packed inline constant
If a packed inline constant is sign extended it must be truncated
after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bit. After the value shifted right by 16 to use it in a low part
with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline
constant any longer.

Fixed the error and added verification code. Without the fix and with
the verification bug is causing pk_max_f16_literal.ll to fail.

Differential Revision: https://reviews.llvm.org/D45987

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330752 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-24 18:17:55 +00:00
Matt Arsenault
8c0fb2c473 AMDGPU: Move a flawed assert when spilling SGPRs
It's possible to validly spill the frame offset register
in a call sequence to a VGPR. There are definitely issues
with SGPR spilling to memory, so move the assert later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330612 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-23 16:13:30 +00:00
Matt Arsenault
7c20cc0669 AMDGPU: Assign enum name to stack ID
Also assert that it is correct for SGPRs. There is currently a bug
where stack slot coloring replaces SGPR spill FIs with one with
the default ID, which results in a more confusing assert later
about a dead object.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330607 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-23 15:51:26 +00:00
Nicolai Haehnle
9820647547 AMDGPU: Legalize the operand of SI_INIT_M0
Summary:
This fixes a case where the argument to a sendmsg intrinsic
ends up in a VGPR, for whatever reason.

The underlying performance issue is that a multiplication that
can be an s_mul_i32 is instead needlessly generated as
v_mul_u32_u24, but this is not addressed by this patch.

Change-Id: I61fd4034314d5acdf6074632c30b65364dfa7328

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45826

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330393 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-20 07:14:25 +00:00
Stanislav Mekhanoshin
8aac337938 [AMDGPU] Use packed literals with zero either lower or hi part
Differential Revision: https://reviews.llvm.org/D45790

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330365 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-19 21:16:50 +00:00
David Blaikie
9d9a46a465 Fix layering of MachineValueType.h by moving it from CodeGen to Support
This is used by llvm tblgen as well as by LLVM Targets, so the only
common place is Support for now. (maybe we need another target for these
sorts of things - but for now I'm at least making them correct & we can
make them better if/when people have strong feelings)

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328395 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-23 23:58:25 +00:00
Matt Arsenault
fe57640983 AMDGPU: Don't leave dead illegal VGPR->SGPR copies
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.

Replace with IMPLICIT_DEF to avoid iterator issues.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@327842 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-19 14:07:15 +00:00
Tim Renouf
e24854a562 [AMDGPU] added writelane intrinsic
Summary:
For use by LLPC SPV_AMD_shader_ballot extension.

The v_writelane instruction was already implemented for use by SGPR
spilling, but I had to add an extra dummy operand tied to the
destination, to represent that all lanes except the selected one keep
the old value of the destination register.

.ll test changes were due to schedule changes caused by that new
operand.

Differential Revision: https://reviews.llvm.org/D42838

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326353 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-28 19:10:32 +00:00
Marek Olsak
51588a0550 AMDGPU: Fix S_BUFFER_LOAD_DWORD_SGPR moveToVALU
Author: Bas Nieuwenhuizen

https://reviews.llvm.org/D42881

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@324353 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-06 15:17:55 +00:00
Marek Olsak
e16a4c10b6 AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@323909 91177308-0d34-0410-b5e6-96231b3b80d8
2018-01-31 20:18:11 +00:00
Matthias Braun
d318139827 MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@320884 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-15 22:22:58 +00:00
Sam Kolton
cf56d5d898 [AMDGPU] SDWA: add support for PRESERVE into SDWA peephole.
Summary:

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: kzhuravl, wdng, nhaehnle, mgorny, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D37817

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319662 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-04 16:22:32 +00:00