360 Commits

Author SHA1 Message Date
Nicolai Haehnle
90f5eff5ac AMDGPU: Write LDS objects out as global symbols in code generation
Summary:
The symbols use the processor-specific SHN_AMDGPU_LDS section index
introduced with a previous change. The linker is then expected to resolve
relocations, which are also emitted.

Initially disabled for HSA and PAL environments until they have caught up
in terms of linker and runtime loader.

Some notes:

- The llvm.amdgcn.groupstaticsize intrinsics can no longer be lowered
  to a constant at compile time, which means some tests can no longer
  be applied.

  The current "solution" is a terrible hack, but the intrinsic isn't
  used by Mesa, so we can keep it for now.

- We no longer know the full LDS size per kernel at compile time, which
  means that we can no longer generate a relevant error message at
  compile time. It would be possible to add a check for the size of
  individual variables, but ultimately the linker will have to perform
  the final check.

Change-Id: If66dbf33fccfbf3609aefefa2558ac0850d42275

Reviewers: arsenm, rampitec, t-tye, b-sumner, jsjodin

Subscribers: qcolombet, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61494

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364297 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-25 11:52:30 +00:00
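
A minimal sketch of the size-check point made in 90f5eff5ac: a per-variable check can pass for every LDS object while the per-kernel total still overflows, which is why the final check has to happen at link time. The function names and the 64 KiB limit used in the usage note are hypothetical and chosen only for illustration; alignment padding is ignored.

```cpp
#include <cstdint>
#include <vector>

// Per-variable check: the most a compiler can still diagnose on its own.
// The limit is passed in; no real hardware value is assumed here.
bool ldsVariableFits(std::uint64_t size, std::uint64_t limit) {
  return size <= limit;
}

// Whole-kernel check: needs the sizes of all LDS objects a kernel uses,
// which in this model are only known once the linker has seen every symbol.
bool ldsTotalFits(const std::vector<std::uint64_t> &sizes,
                  std::uint64_t limit) {
  std::uint64_t total = 0;
  for (std::uint64_t size : sizes) {
    if (size > limit - total) // total <= limit holds, so this cannot wrap
      return false;
    total += size;
  }
  return true;
}
```

With an assumed limit of 64 KiB, two 40 KiB objects each pass `ldsVariableFits` but fail `ldsTotalFits` together.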
Matt Arsenault
d544680fb3 AMDGPU: Don't clobber VCC in MUBUF addr64 emulation
Introducing VCC defs during SIFixSGPRCopies is generally
problematic. Avoid it by starting with the VOP3 form with the general
condition register. This is the easiest to fix instance, but doesn't
solve any specific problems I'm looking at.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363904 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-20 00:51:28 +00:00
Matt Arsenault
d5a79b9727 AMDGPU: Consolidate some getGeneration checks
This is incomplete, and ideally these would all be removed, but it's
better to localize them to the subtarget first with comments about
what they're for.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363902 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-19 23:54:58 +00:00
Matt Arsenault
bff29f6333 AMDGPU: Fix folding immediate into readfirstlane through reg_sequence
The def instruction for the vreg may not match, because it may be
folding through a reg_sequence. The assert was overly conservative and
not necessary. It's not actually important if DefMI really defined the
register, because the fold that will be done cares about the def of
the value that will be folded.

For some reason copies aren't making it through the reg_sequence,
although they should.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363876 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-19 20:44:15 +00:00
Matt Arsenault
e1eedb6602 Reapply "AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics"
This reapplies r363678, using the correct chain for the CopyToReg for
v0. glueCopyToM0 counterintuitively changes the operands of the
original node.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363870 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-19 19:55:27 +00:00
Simon Pilgrim
1ad9529ddc Revert rL363678 : AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.
........
Breaks EXPENSIVE_CHECKS buildbots - http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/78/

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363797 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-19 13:00:54 +00:00
Matt Arsenault
6a59b73682 AMDGPU: Add ds_gws_init / ds_gws_barrier intrinsics
There may or may not be additional work to handle this correctly on
SI/CI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363678 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-18 13:19:57 +00:00
Matt Arsenault
9f1f314a53 AMDGPU: Change API for checking for exec modification
Invert the name and return value to better reflect the imprecise
nature.

Force passing in the DefMI, since it's known in the 2 users and could
possibly fail for an arbitrary vreg.

Allow specifying a specific user instruction. Scan through use
instructions, instead of use operands. Add scan thresholds instead of
searching infinitely.

Stop using a set to track seen uses. I didn't understand this usage,
or why it would not check the last use. I don't think the use list has
any particular order.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363675 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-18 12:48:36 +00:00
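
The shape of the query described in 9f1f314a53 can be sketched on a toy instruction list. This is not the LLVM code: `Inst`, `mayModifyExecBeforeUse`, and the default threshold of 20 are assumptions made for illustration. The point is the inverted, conservative contract: `true` means "exec may be modified or we gave up", and only `false` is a precise answer.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction; not an LLVM type.
struct Inst {
  bool writesExec; // true if this instruction modifies the exec mask
};

// Returns true when exec *may* be modified strictly between block[defIdx]
// and block[useIdx]. Hitting the scan threshold also returns true, so the
// search is bounded instead of walking arbitrarily far.
bool mayModifyExecBeforeUse(const std::vector<Inst> &block,
                            std::size_t defIdx, std::size_t useIdx,
                            std::size_t maxScan = 20) {
  if (defIdx >= useIdx || useIdx >= block.size())
    return true; // nothing can be proven, stay conservative
  std::size_t scanned = 0;
  for (std::size_t i = defIdx + 1; i < useIdx; ++i) {
    if (++scanned > maxScan)
      return true; // threshold reached: assume the worst
    if (block[i].writesExec)
      return true;
  }
  return false;
}
```

Callers in this model treat `true` as "do not apply the optimization", which keeps a too-small threshold safe, only less effective.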
Sander de Smalen
f4bff34d4d Describe stack-id as an enum
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.

This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.

Reviewers: arsenm, thegameg, qcolombet

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D60137


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363533 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 09:13:29 +00:00
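
As a rough illustration of what f4bff34d4d describes, the sketch below maps a stack-id enum to and from a textual name. Only the name `default` for value 0 comes from the commit message; the second enumerator, its spelling `sgpr-spill`, and both function names are assumptions for the example and are not taken from the patch.

```cpp
#include <string>

// Illustrative enum, not the LLVM definition.
enum class StackID : unsigned { Default = 0, SGPRSpill = 1 };

// Printing side: turn the enum into the name written to a MIR-like file.
const char *stackIDName(StackID id) {
  switch (id) {
  case StackID::Default:
    return "default";
  case StackID::SGPRSpill:
    return "sgpr-spill";
  }
  return "<invalid>";
}

// Parsing side: accept only known names; returns false on anything else.
bool parseStackID(const std::string &text, StackID &out) {
  if (text == "default") {
    out = StackID::Default;
    return true;
  }
  if (text == "sgpr-spill") {
    out = StackID::SGPRSpill;
    return true;
  }
  return false;
}
```

In this sketch a bare integer such as `0` is rejected by the parser, which is one possible design and not necessarily what the real MIR parser does.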
Nicolai Haehnle
6435d005d6 AMDGPU: Prepare for explicit absolute relocations in code generation
Summary:
We will use absolute relocations for LDS symbols.

Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61492

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363517 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-16 17:43:37 +00:00
Nicolai Haehnle
58b383765e AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.

The generated assembly is now more explicit about the kind of relocation
that is to be used.

Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61491

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363516 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-16 17:32:01 +00:00
Stanislav Mekhanoshin
c6fce1250e [AMDGPU] gfx10 conditional registers handling
This is the C++ source part of wave32 support, excluding the overridden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363513 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-16 17:13:09 +00:00
Matt Arsenault
f079a4a7c8 AMDGPU: Fix missing const
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363383 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-14 13:26:23 +00:00
Stanislav Mekhanoshin
57160edc26 [AMDGPU] gfx1010 dpp16 and dpp8
Differential Revision: https://reviews.llvm.org/D63203

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363186 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-12 18:02:41 +00:00
Stanislav Mekhanoshin
7a1388e6ef [AMDGPU] gfx1010 permlane instructions
Differential Revision: https://reviews.llvm.org/D63202

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363185 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-12 17:52:51 +00:00
Matt Arsenault
6944e4abc8 AMDGPU: Force skips around traps
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362852 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-07 23:02:52 +00:00
Matt Arsenault
45a0798df6 AMDGPU: Insert skip branches over return blocks
SIInsertSkips really doesn't understand the control flow, and makes
very stupid assumptions about the block layout. This was able to get
away with not skipping return blocks, since usually after
structurization there is only one placed at the end of the
function. Tail duplication can break this assumption.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362754 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-06 22:51:51 +00:00
Alexander Timofeev
4dcfa85e5d [AMDGPU] Partial revert of ba447bae7448435c9986eece0811da1423972fdd
"Divergence driven ISel. Assign register class for cross block values
according to the divergence."
That change exposed a design flaw leading to several issues that
need to be solved first.

This change reverts the AMDGPU-specific changes and keeps the common part
unaffected.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362749 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-06 21:13:02 +00:00
Matt Arsenault
63ff74f3d0 AMDGPU: Invert frame index offset interpretation
Since the beginning, the offset of a frame index has been consistently
interpreted backwards. It was treating it as an offset from the
scratch wave offset register as a frame register. The correct
interpretation is the offset from the SP on entry to the function,
before the prolog. Frame index elimination then should select either
SP or another register as an FP.

Treat the scratch wave offset on kernel entry as the pre-incremented
SP. Rely more heavily on the standard hasFP and frame pointer
elimination logic, and clean up the private reservation code. This
saves a copy in most callee functions.

The kernel prolog emission code is still kind of a mess relying on
checking the uses of physical registers, which I would prefer to
eliminate.

Currently selection directly emits MUBUF instructions, which require
using a reference to some register. Use the register chosen for SP,
and then ignore this later. This should probably be cleaned up to use
pseudos that don't refer to any specific base register until frame
index elimination.

Add a workaround for shaders using large numbers of SGPRs. I'm not
sure these cases were ever working correctly, since as far as I can
tell the logic for figuring out which SGPR is the scratch wave offset
doesn't match up with the shader input initialization in the shader
programming guide.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362661 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-05 22:20:47 +00:00
Matt Arsenault
3fdd033bfd AMDGPU: Fix using 2 different enums for same operand flags
These enums are really for the same namespace of flags set on
arbitrary MachineOperands, so merge them to avoid value collisions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362640 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-05 20:32:25 +00:00
Dmitry Preobrazhensky
541ca56bcf [AMDGPU][MC] Added support of SCC, VCCZ and EXECZ operands
See bug 39292: https://bugs.llvm.org/show_bug.cgi?id=39292

Reviewers: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D62660

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362400 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-03 13:51:24 +00:00
Alexander Timofeev
d224ecc383 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven, it is necessary to assign
the correct register classes to the cross-block values beforehand. On divergent targets, the
same value type requires different register classes depending on the value's divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

This commit was previously reverted because of a build failure
caused by a malformed patch. The build failure is now fixed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361741 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-26 20:33:26 +00:00
Peter Collingbourne
d7a83f9517 Revert r361644, "[AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence."
Broke sanitizer bots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/21694/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32478/steps/check-llvm%20asan/logs/stdio

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361688 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-25 01:52:38 +00:00
Alexander Timofeev
6a29119c95 [AMDGPU] Divergence driven ISel. Assign register class for cross block values according to the divergence.
Details: To make instruction selection really divergence driven, it is necessary to assign
the correct register classes to the cross-block values beforehand. On divergent targets, the
same value type requires different register classes depending on the value's divergence.

Reviewers: rampitec, nhaehnle

Differential Revision: https://reviews.llvm.org/D59990

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361644 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-24 15:32:18 +00:00
Matt Arsenault
0b58446fa3 MC: Allow getMaxInstLength to depend on the subtarget
Keep it optional in case this is ever needed in some global
context. Currently it's only used for getting an upper bound on inline
asm code size.

For AMDGPU, gfx10 increases the maximum instruction size to
20 bytes. This avoids penalizing older subtargets when estimating code
size, and making some annoying branch relaxation test adjustments.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361405 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-22 16:28:41 +00:00
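
A hedged sketch of the idea in 0b58446fa3: the maximum instruction length becomes a function of an optional subtarget, and callers without one get the global upper bound. The `Subtarget` struct and both function names are hypothetical stand-ins, not the MC API. The 20-byte figure is from the commit message; the 8-byte figure for older subtargets is an assumption for the example.

```cpp
// Toy subtarget description; not an LLVM type.
struct Subtarget {
  bool isGFX10;
};

// Returns an upper bound on the encoded size of one instruction, in bytes.
// With no subtarget the answer must be valid for every subtarget.
unsigned maxInstLength(const Subtarget *ST = nullptr) {
  const unsigned MaxGFX10 = 20; // stated in the commit message
  const unsigned MaxOlder = 8;  // assumed value for pre-gfx10 subtargets
  if (!ST)
    return MaxGFX10;
  return ST->isGFX10 ? MaxGFX10 : MaxOlder;
}

// Upper bound on the size of an inline asm blob with numInsts instructions.
unsigned estimateInlineAsmSize(unsigned numInsts,
                               const Subtarget *ST = nullptr) {
  return numInsts * maxInstLength(ST);
}
```

Passing the subtarget keeps the estimate for older targets from growing just because a newer target has longer encodings.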
Matt Arsenault
662b7589c0 AMDGPU: Assume calls read exec
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361333 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-21 23:23:16 +00:00
Matt Arsenault
790ef25542 AMDGPU: Force skip branches over calls
Unfortunately the way SIInsertSkips works is backwards, and is
required for correctness. r338235 added handling of some special cases
where skipping is mandatory to avoid side effects if no lanes are
active. It conservatively handled asm correctly, but the same logic
needs to apply to calls.

Usually the call sequence code is larger than the skip threshold,
although the way the count is computed is really broken, so I'm not
sure if anything was likely to really hit this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361202 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-20 22:04:42 +00:00
Stanislav Mekhanoshin
c6c19d0e3f [AMDGPU] Fixed handling of immediate i1 literals
This bug was exposed by rL360395.

Differential Revision: https://reviews.llvm.org/D61812

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360689 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-14 16:18:00 +00:00
Nicolai Haehnle
c65d9a8f49 AMDGPU: Verify that SOP2/SOPC instructions have at most one immediate operand
Summary:
No test case because I don't know of a way to trigger this, but I
accidentally caused this to fail while working on a different change.

Change-Id: I8015aa447fe27163cc4e4902205a203bd44bf7e3

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61490

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360123 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-07 09:19:09 +00:00
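
The check added in c65d9a8f49 amounts to counting immediate source operands and rejecting more than one. The following is only a toy model of that rule; `OperandKind` and `hasAtMostOneImmediate` are hypothetical names, and the real verifier works on MachineInstr operands.

```cpp
#include <vector>

// Toy operand classification; not an LLVM type.
enum class OperandKind { Register, Immediate };

// Returns true if the source operands contain at most one immediate.
bool hasAtMostOneImmediate(const std::vector<OperandKind> &sources) {
  unsigned immediates = 0;
  for (OperandKind kind : sources)
    if (kind == OperandKind::Immediate)
      ++immediates;
  return immediates <= 1;
}
```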
Stanislav Mekhanoshin
99c7e3c032 [AMDGPU] gfx1010 verifier changes
Differential Revision: https://reviews.llvm.org/D61521

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360095 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-06 22:49:45 +00:00
Stanislav Mekhanoshin
116060bd7d [AMDGPU] gfx1010: prefer V_MUL_LO_U32 over V_MUL_LO_I32
GFX10 deprecates v_mul_lo_i32 instruction, so choose u32 form for
all targets.

Differential Revision: https://reviews.llvm.org/D61525

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360094 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-06 22:27:05 +00:00
Stanislav Mekhanoshin
7f5f318431 [AMDGPU] gfx1010: use fmac instructions
Differential Revision: https://reviews.llvm.org/D61527

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359959 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-04 04:20:37 +00:00
Stanislav Mekhanoshin
ffc5401cfb [AMDGPU] gfx1010 allows VOP3 to have a literal
Differential Revision: https://reviews.llvm.org/D61413

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359756 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-02 04:01:39 +00:00
Stanislav Mekhanoshin
2090ec980a [AMDGPU] gfx1010 constant bus limit
Constant bus limit has increased to 2 with GFX10.

Differential Revision: https://reviews.llvm.org/D61404

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359754 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-02 03:47:23 +00:00
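
A minimal model of the limit change in 2090ec980a, assuming the pre-GFX10 limit is 1 (the commit message only states that it increased to 2). The function names are hypothetical, and how operands are counted as constant bus reads is left to the caller.

```cpp
// Number of constant bus reads one instruction may perform in this model.
unsigned constantBusLimit(bool isGFX10) {
  return isGFX10 ? 2u : 1u; // 2 is from the commit message, 1 is assumed
}

// Returns true if an instruction with the given number of constant bus
// reads is within the limit for the target generation.
bool constantBusUseIsLegal(unsigned numConstantBusReads, bool isGFX10) {
  return numConstantBusReads <= constantBusLimit(isGFX10);
}
```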
Stanislav Mekhanoshin
542de76c15 [AMDGPU] gfx1010 MIMG implementation
Differential Revision: https://reviews.llvm.org/D61339

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359698 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-01 16:32:58 +00:00
Stanislav Mekhanoshin
0b378026ac [AMDGPU] gfx1010 VMEM and SMEM implementation
Differential Revision: https://reviews.llvm.org/D61330

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359621 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-30 22:08:23 +00:00
Mark Searles
dfc7fb5622 Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef30046.

We discovered some internal test failures, so reverting for now.

Differential Revision: https://reviews.llvm.org/D61213

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359363 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-27 00:51:18 +00:00
Stanislav Mekhanoshin
09f8a0f6a0 [AMDGPU] gfx1010 VOP3 and VOP3P implementation
Differential Revision: https://reviews.llvm.org/D61202

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359328 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-26 17:56:03 +00:00
Stanislav Mekhanoshin
834873d34d [AMDGPU] gfx1010 VOP2 changes
Differential Revision: https://reviews.llvm.org/D61156

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359316 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-26 16:37:51 +00:00
Stanislav Mekhanoshin
f43d543c45 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359113 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-24 17:03:15 +00:00
Bjorn Pettersson
bba2202bb1 [CodeGen] Add "const" to MachineInstr::mayAlias
Summary:
The basic idea here is to make it possible to use
MachineInstr::mayAlias also when the MachineInstr
is const (or the "Other" MachineInstr is const).

The addition of const in MachineInstr::mayAlias
then rippled down to the need for adding const
in several other places, such as
TargetInstrInfo::getMemOperandWithOffset.

Reviewers: hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, MatzeB, arsenm, jvesely, nhaehnle, hiraditya, javed.absar, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60856

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@358744 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-19 09:08:38 +00:00
Matt Arsenault
7a6ef30046 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357634 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-03 20:53:20 +00:00
Neil Henning
fcc236c268 [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.
This change incorporates an effort by Connor Abbott to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.

Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).

This fix instead runs through the WWM sections and pre-allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).

Differential Revision: https://reviews.llvm.org/D59295

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357400 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-01 15:19:52 +00:00
Matt Arsenault
a88dcdbff2 AMDGPU: Make exec mask optimizations more resistant to block splits
Also improve the check for SALU instructions to also ignore
implicit_def and other fake instructions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357170 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-28 14:01:39 +00:00
Matt Arsenault
68048d45d3 AMDGPU: Don't hardcode num defs for MUBUF instructions
This shouldn't change anything since the no-ret atomics are selected
later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357084 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-27 16:12:29 +00:00
Matt Arsenault
f46cbbae71 AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics
The offset operand index is different for atomics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357073 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-27 15:41:00 +00:00
Tim Renouf
7684aab92a [AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start
using them for image ops that return 5 dwords.

Differential Revision: https://reviews.llvm.org/D58903

Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356735 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-22 10:11:21 +00:00
Tim Renouf
a047778b62 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356659 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-21 12:01:21 +00:00
Michael Liao
671c6db195 [AMDGPU] Enable code selection using s_mul_hi_u32/s_mul_hi_i32.
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59501

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356405 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-18 20:40:09 +00:00
Tim Renouf
a90929573c [AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.

This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.

Differential Revision: https://reviews.llvm.org/D59267

Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356399 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-18 19:35:44 +00:00