131 Commits

Author SHA1 Message Date
Piotr Sobczak
8db4d72fa5 [AMDGPU] Extend buffer intrinsics with swizzling
Summary:
Extend cachepolicy operand in the new VMEM buffer intrinsics
to supply information whether the buffer data is swizzled.
Also, propagate this information to MIR.

Intrinsics updated:
int_amdgcn_raw_buffer_load
int_amdgcn_raw_buffer_load_format
int_amdgcn_raw_buffer_store
int_amdgcn_raw_buffer_store_format
int_amdgcn_raw_tbuffer_load
int_amdgcn_raw_tbuffer_store
int_amdgcn_struct_buffer_load
int_amdgcn_struct_buffer_load_format
int_amdgcn_struct_buffer_store
int_amdgcn_struct_buffer_store_format
int_amdgcn_struct_tbuffer_load
int_amdgcn_struct_tbuffer_store

Furthermore, disable merging of VMEM buffer instructions
in SI Load/Store optimizer, if the "swizzled" bit on the instruction
is on.

The default value of the bit is 0, meaning that data in buffer
is linear and buffer instructions can be merged.

There is no difference in the generated code with this commit.
However, in the future it will be expected that front-ends
use buffer intrinsics with correct "swizzled" bit set.

Reviewers: arsenm, nhaehnle, tpr

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68200

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@373491 91177308-0d34-0410-b5e6-96231b3b80d8
2019-10-02 17:22:36 +00:00
Jay Foad
35b4599076 Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0"
Summary:
D61491 caused us to use relocs when they're not strictly necessary, to
refer to symbols in the text section. This is a pessimization and it's a
problem for some loaders that don't support relocs yet.

Reviewers: nhaehnle, arsenm, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65813

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@370667 91177308-0d34-0410-b5e6-96231b3b80d8
2019-09-02 14:40:57 +00:00
Matt Arsenault
295d2ad713 AMDGPU: Disambiguate v3f16 format in load/store tables
Currently the searchable tables report the number of dwords. These
round to the same number for 3 and 4 component d16
instructions. Change this to report the number of elements so this
isn't ambiguous.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@369202 91177308-0d34-0410-b5e6-96231b3b80d8
2019-08-18 00:20:43 +00:00
Stanislav Mekhanoshin
ac72201f67 [AMDGPU] Fixed occupancy calculation for gfx10
Differential Revision: https://reviews.llvm.org/D65010

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366616 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-19 21:29:51 +00:00
Dmitry Preobrazhensky
e201b95d1a [AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message
Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D64729

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366071 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-15 15:12:16 +00:00
Stanislav Mekhanoshin
bc8cb016c6 [AMDGPU] gfx908 mAI instructions, MC part
Differential Revision: https://reviews.llvm.org/D64446

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365563 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-09 21:43:09 +00:00
Dmitry Preobrazhensky
71895e64c4 [AMDGPU][MC] Corrected parsing of FLAT offset modifier
Summary of changes:

- simplified handling of FLAT offset: offset_s13 and offset_u12 have been replaced with flat_offset;
- provided information about error position for pre-gfx9 targets;
- improved errors handling.

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D64244

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365321 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-08 14:27:37 +00:00
Dmitry Preobrazhensky
539c69ef52 [AMDGPU][MC] Fix 2 for sanitizer failure in 364645
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364656 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 16:28:46 +00:00
Dmitry Preobrazhensky
3c357a49a3 [AMDGPU][MC] Enabled constant expressions as operands of sendmsg
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D62735

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@364645 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-28 14:14:02 +00:00
Stanislav Mekhanoshin
c33d0e6650 [AMDGPU] gfx1010 wave32 metadata
Differential Revision: https://reviews.llvm.org/D63207

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363577 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-17 16:48:56 +00:00
Stanislav Mekhanoshin
5c3ee716e4 [AMDGPU] gfx1010 base changes for wave32
Differential Revision: https://reviews.llvm.org/D63293

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363299 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-13 19:18:29 +00:00
Dmitry Preobrazhensky
e4a2667a65 [AMDGPU][MC] Enabled constant expressions as operands of s_getreg/s_setreg
See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820

Reviewers: artem.tamazov, arsenm

Differential Revision: https://reviews.llvm.org/D61125

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363255 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-13 12:46:37 +00:00
Piotr Sobczak
944600b17a [AMDGPU] Optimize image_[load|store]_mip
Summary:
Replace image_load_mip/image_store_mip
with image_load/image_store if lod is 0.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63073

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@362957 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-10 15:58:51 +00:00
Stanislav Mekhanoshin
6ae9485eb1 [AMDGPU] gfx1010 utility functions
Differential Revision: https://reviews.llvm.org/D61094

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359224 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-25 18:53:41 +00:00
Stanislav Mekhanoshin
f43d543c45 [AMDGPU] Add gfx1010 target definitions
Differential Revision: https://reviews.llvm.org/D61041

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@359113 91177308-0d34-0410-b5e6-96231b3b80d8
2019-04-24 17:03:15 +00:00
Matt Arsenault
234b3a117e AMDGPU: Remove dx10-clamp from subtarget features
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.

Also introduce a new amdgpu-ieee attribute to match.

The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.

Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@357302 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-29 19:14:54 +00:00
Tim Renouf
7684aab92a [AMDGPU] Added v5i32 and v5f32 register classes
They are not used by anything yet, but a subsequent commit will start
using them for image ops that return 5 dwords.

Differential Revision: https://reviews.llvm.org/D58903

Change-Id: I63e1904081e39a6d66e4eb96d51df25ad399d271

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356735 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-22 10:11:21 +00:00
Tim Renouf
a047778b62 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356659 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-21 12:01:21 +00:00
Dmitry Preobrazhensky
203bf432d6 [AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32
See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662

Reviewers: artem.tamazov, arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D58713

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@355312 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-04 12:48:32 +00:00
Matt Arsenault
163f2c349f AMDGPU: Remove GCN features and predicates
These are no longer necessary since the R600 tablegen files are split
out now.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@353548 91177308-0d34-0410-b5e6-96231b3b80d8
2019-02-08 19:18:01 +00:00
Chandler Carruth
6b547686c5 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@351636 91177308-0d34-0410-b5e6-96231b3b80d8
2019-01-19 08:50:56 +00:00
Neil Henning
4778d2ba93 [AMDGPU] Extend the SI Load/Store optimizer to combine more things.
I've extended the load/store optimizer to be able to produce dwordx3
loads and stores, This change allows many more load/stores to be combined,
and results in much more optimal code for our hardware.

Differential Revision: https://reviews.llvm.org/D54042

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348937 91177308-0d34-0410-b5e6-96231b3b80d8
2018-12-12 16:15:21 +00:00
Nicolai Haehnle
e3924b1c15 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@348050 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-30 22:55:38 +00:00
Nicolai Haehnle
f2ec2633c5 AMDGPU/InsertWaitcnts: Untangle some semi-global state
Summary:
Reduce the statefulness of the algorithm in two ways:

1. More clearly split generateWaitcntInstBefore into two phases: the
   first one which determines the required wait, if any, without changing
   the ScoreBrackets, and the second one which actually inserts the wait
   and updates the brackets.

2. Communicate pre-existing s_waitcnt instructions using an argument to
   generateWaitcntInstBefore instead of through the ScoreBrackets.

To simplify these changes, a Waitcnt structure is introduced which carries
the counts of an s_waitcnt instruction in decoded form.

There are some functional changes:

1. The FIXME for the VCCZ bug workaround was implemented: we only wait for
   SMEM instructions as required instead of waiting on all counters.

2. We now properly track pre-existing waitcnt's in all cases, which leads
   to less conservative waitcnts being emitted in some cases.

     s_load_dword ...
     s_waitcnt lgkmcnt(0)    <-- pre-existing wait count
     ds_read_b32 v0, ...
     ds_read_b32 v1, ...
     s_waitcnt lgkmcnt(0)    <-- this is too conservative
     use(v0)
     more code
     use(v1)

   This increases code size a bit, but the reduced latency should still be a
   win in basically all cases. The worst code size regressions in my shader-db
   are:

 WORST REGRESSIONS - Code Size
 Before After     Delta Percentage
   1724  1736        12    0.70 %   shaders/private/f1-2015/1334.shader_test [0]
   2276  2284         8    0.35 %   shaders/private/f1-2015/1306.shader_test [0]
   4632  4640         8    0.17 %   shaders/private/ue4_elemental/62.shader_test [0]
   2376  2384         8    0.34 %   shaders/private/f1-2015/1308.shader_test [0]
   3284  3292         8    0.24 %   shaders/private/talos_principle/1955.shader_test [0]

Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54226

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347848 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-29 11:06:06 +00:00
Ron Lieberman
322a8075ef [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Add a pass to fixup various vector ISel issues.
Currently we handle converting GLOBAL_{LOAD|STORE}_*
and GLOBAL_Atomic_* instructions into their _SADDR variants.
This involves feeding the sreg into the saddr field of the new instruction.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347008 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-16 01:13:34 +00:00
Konstantin Zhuravlyov
92d120add2 AMDHSA: More code object v3 fixes:
- Make sure IsaInfo::hasCodeObjectV3 returns true only
    for AMDHSA
  - Update assembler metadata tests to use v2 by default


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@347001 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-15 23:14:23 +00:00
Nicolai Haehnle
69f971eb18 Revert "AMDGPU: Divergence-driven selection of scalar buffer load intrinsics"
This reverts commit r344696 for now (except for some test additions).

See https://bugs.freedesktop.org/show_bug.cgi?id=108611.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346364 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-07 21:53:43 +00:00
Konstantin Zhuravlyov
7829a6dfd5 AMDGPU: Add sram-ecc feature
Differential Revision: https://reviews.llvm.org/D53222


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@346177 91177308-0d34-0410-b5e6-96231b3b80d8
2018-11-05 22:44:19 +00:00
Nicolai Haehnle
cc436fd266 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@344696 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-17 15:37:30 +00:00
Konstantin Zhuravlyov
4d82ce5c27 AMDGPU: Re-apply r341982 after fixing the layering issue
Move isa version determination into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342069 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 18:50:47 +00:00
Ilya Biryukov
867f48781f Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."
This reverts commit r341982.

The change introduced a layering violation. Reverting to unbreak
our integrate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342023 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 07:05:30 +00:00
Konstantin Zhuravlyov
b479681381 AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination
into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).

Differential Revision: https://reviews.llvm.org/D51890



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341982 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 18:56:51 +00:00
Matt Arsenault
7e212e4168 AMDGPU: Remove remnants of old address space mapping
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341165 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 05:49:54 +00:00
Tim Renouf
a53b9eb46e [AMDGPU] New buffer intrinsics
Summary:
This commit adds new intrinsics
  llvm.amdgcn.raw.buffer.load
  llvm.amdgcn.raw.buffer.load.format
  llvm.amdgcn.raw.buffer.load.format.d16
  llvm.amdgcn.struct.buffer.load
  llvm.amdgcn.struct.buffer.load.format
  llvm.amdgcn.struct.buffer.load.format.d16
  llvm.amdgcn.raw.buffer.store
  llvm.amdgcn.raw.buffer.store.format
  llvm.amdgcn.raw.buffer.store.format.d16
  llvm.amdgcn.struct.buffer.store
  llvm.amdgcn.struct.buffer.store.format
  llvm.amdgcn.struct.buffer.store.format.d16
  llvm.amdgcn.raw.buffer.atomic.*
  llvm.amdgcn.struct.buffer.atomic.*

with the following changes from the llvm.amdgcn.buffer.*
intrinsics:

* there are separate raw and struct versions: raw does not have an
  index arg and sets idxen=0 in the instruction, and struct always sets
  idxen=1 in the instruction even if the index is 0, to allow for the
  fact that gfx9 does bounds checking differently depending on whether
  idxen is set;

* there is a combined cachepolicy arg (glc+slc)

* there are now only two offset args: one for the offset that is
  included in bounds checking and swizzling, to be split between the
  instruction's voffset and immoffset fields, and one for the offset
  that is excluded from bounds checking and swizzling, to go into the
  instruction's soffset field.

The AMDISD::BUFFER_* SD nodes always have an index operand, all three
offset operands, combined cachepolicy operand, and an extra idxen
operand.

The obsolescent llvm.amdgcn.buffer.* intrinsics continue to work.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D50306

Change-Id: If897ea7dc34fcbf4d5496e98cc99a934f62fc205

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@340269 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-21 11:07:10 +00:00
Ryan Taylor
a9d3893acc [AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero
Summary:
Add _L to _LZ image intrinsic table mapping to table gen.
In ISelLowering check if image intrinsic has lod and if it's equal
to zero, if so remove lod and change opcode to equivalent mapped _LZ.

Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71

Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49483

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338523 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-01 12:12:01 +00:00
Tom Stellard
cba2181e77 AMDGPU: Separate R600 and GCN TableGen files
Summary:
We now have two sets of generated TableGen files, one for R600 and one
for GCN, so each sub-target now has its own tables of instructions,
registers, ISel patterns, etc.  This should help reduce compile time
since each sub-target now only has to consider information that
is specific to itself.  This will also help prevent the R600
sub-target from slowing down new features for GCN, like disassembler
support, GlobalISel, etc.

Reviewers: arsenm, nhaehnle, jvesely

Reviewed By: arsenm

Subscribers: MatzeB, kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46365

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335942 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-28 23:47:12 +00:00
Scott Linder
43cbf8d92e [AMDGPU] Update assembler for HSA Code Object v3
Update AMDGPU assembler syntax behind the code-object-v3 feature:

* Replace/rename most AMDGPU assembler directives/symbols and document them.
* Provide more diagnostics (e.g. values out of range, missing values, repeated
  values).
* Provide path for backwards compatibility, even with underlying descriptor
  changes.

Differential Revision: https://reviews.llvm.org/D47736



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335281 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 19:38:56 +00:00
Nicolai Haehnle
d42a61b0d4 AMDGPU: Select MIMG instructions manually in SITargetLowering
Summary:
Having TableGen patterns for image intrinsics is hitting limitations:
for D16 we already have to manually pre-lower the packing of data
values, and we will have to do the same for A16 eventually.

Since there is already some custom C++ code anyway, it is arguably easier
to just do everything in C++, now that we can use the beefed-up generic
tables backend of TableGen to provide all the required metadata and map
intrinsics to corresponding opcodes. With this approach, all image
intrinsic lowering happens in SITargetLowering::lowerImage. That code is
dense due to all the cases that it handles, but it should still be easier
to follow than what we had before, by virtue of it all being done in a
single location, and by virtue of not relying on the TableGen pattern
magic that very few people really understand.

This means that we will have MachineSDNodes with MIMG instructions
during DAG combining, but that seems alright: previously we had
intrinsic nodes instead, but those are similarly opaque to the generic
CodeGen infrastructure, and the final pattern matching just did a 1:1
translation to machine instructions anyway. If anything, the fact that
we now merge the address words into a vector before DAG combine should
be an advantage.

Change-Id: I417f26bd88f54ce9781c1668acc01f3f99774de6

Reviewers: arsenm, rampitec, rtaylor, tstellar

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48017

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335228 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 13:36:57 +00:00
Nicolai Haehnle
f6113ec066 AMDGPU: Refactor MIMG instruction TableGen using generic tables
Summary:
This allows us to access rich information about MIMG opcodes from C++ code.
Simplifying the mapping between equivalent opcodes of different data size
becomes quite natural.

This also flattens the MIMG-related class and multiclass hierarchy a little,
and collapses together some of the scaffolding for sample and gather4 opcodes.

Change-Id: I1a2549fdc1e881ff100e5393d2d87e73729a0ccd

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48016

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335227 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 13:36:44 +00:00
Nicolai Haehnle
db5003ee85 AMDGPU: Use generic tables instead of SearchableTable
Summary:

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D48014

Change-Id: Ibb43f90d955275571aff17d0c3ecfb5e5b299641

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@335226 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-21 13:36:33 +00:00
Konstantin Zhuravlyov
299cf5ff6a AMDHSA: Code object v3 updates
- Do not emit following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Cleanup and bring in sync kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
  alignments



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334519 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-12 18:02:46 +00:00
Konstantin Zhuravlyov
7dc086936f AMDGPU : Recalculate SGPRs when trap handler is supported
Differential Revision: https://reviews.llvm.org/D29911


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@332523 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-16 20:47:48 +00:00
Adrian Prantl
26b584c691 Remove \brief commands from doxygen comments.
We've been running doxygen with the autobrief option for a couple of
years now. This makes the \brief markers into our comments
redundant. Since they are a visual distraction and we don't want to
encourage more \brief markers in new code either, this patch removes
them all.

Patch produced by

  for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done

Differential Revision: https://reviews.llvm.org/D46290

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331272 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-01 15:54:18 +00:00
Matt Arsenault
ac9b3ef76a AMDGPU: Add Vega12 and Vega20
Changes by
  Matt Arsenault
  Konstantin Zhuravlyov

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331215 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-30 19:08:16 +00:00
Stanislav Mekhanoshin
eca9ecf8f9 [AMDGPU] Enabled v2.16 literals for VOP3P
Literal encoding needs op_sel_hi to select low 16 bit in this case.

Differential Revision: https://reviews.llvm.org/D45745

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@330230 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-17 23:09:05 +00:00
Konstantin Zhuravlyov
419eb8a73e AMDGPU: Remove max_scratch_backing_memory_byte_size from kernel header
1. Remove max_scratch_backing_memory_byte_size from kernel header
2. Make it a reserved field
3. Ignore it while parsing assembly for backwards compatibility
4. Bump up minor version of kernel header

Differential Revision: https://reviews.llvm.org/D45452


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@329620 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-09 20:47:22 +00:00
Nicolai Haehnle
2360e2aafa AMDGPU: Make isIntrinsicSourceOfDivergence table-driven
Summary:
This is in preparation for the new dimension-aware image intrinsics,
which I'd rather not have to list here by hand.

Change-Id: Iaa16e3a635a11283918ce0d9e1e618591b0bf6fa

Reviewers: arsenm, rampitec, b-sumner

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D44938

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@328939 91177308-0d34-0410-b5e6-96231b3b80d8
2018-04-01 17:09:14 +00:00
Stanislav Mekhanoshin
0dc2b94c4e [AMDGPU] Add default ISA version targets
In case if -mattr used to modify feature set bits in llvm-mc call
getIsaVersion can fail to identify specific ISA due to test mismatch.
Adding default fallback tests which will always correctly report at
least major version.

Differential Revision: https://reviews.llvm.org/D44163

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326825 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-06 18:33:55 +00:00
Alexander Timofeev
77d8d0a7e7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@326703 91177308-0d34-0410-b5e6-96231b3b80d8
2018-03-05 15:12:21 +00:00
Konstantin Zhuravlyov
4bf0aa0c17 AMDGPU: Bring processors and features in sync with the spec
- Remove gfx800
- Make iceland gfx802
- Add xnack to gfx902

Differential Revision: https://reviews.llvm.org/D43355


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325393 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-16 21:26:25 +00:00