2867 Commits

Author SHA1 Message Date
Alexander Timofeev
00e50653ed [AMDGPU] Divergence driven instruction selection. Shift operations.
Summary: This change enables VOP3 shifts to be explicitly selected
         dependent on the divergence.

Differential Revision: https://reviews.llvm.org/D52559

Reviewers: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343455 91177308-0d34-0410-b5e6-96231b3b80d8
2018-10-01 11:06:35 +00:00
Vitaly Buka
05252670a6 [cxx2a] Fix warning triggered by r343285
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343369 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-29 02:17:12 +00:00
Konstantin Zhuravlyov
1f7247b6c9 AMDGPU: Split HasExt into HasExtDPP/SDWA/SDWA9
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343264 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-27 20:49:00 +00:00
Konstantin Zhuravlyov
62258deb2a AMDGPU: Split VOP2Inst into VOP2Inst_e32/e64/sdwa
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343259 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-27 19:46:41 +00:00
Konstantin Zhuravlyov
43980ea1fb AMDGPU/NFC: Simplify VOP_MAC_F16/F32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343254 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-27 19:24:05 +00:00
Stanislav Mekhanoshin
45657b2c3b [AMDGPU] Fold copy (copy vgpr)
This reduces the number of VGPRs used in some cases.

Differential Revision: https://reviews.llvm.org/D52577

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343249 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-27 18:55:20 +00:00
Fangrui Song
3b35e17b21 llvm::sort(C.begin(), C.end(), ...) -> llvm::sort(C, ...)
Summary: The convenience wrapper in STLExtras has been available since rL342102.

Reviewers: dblaikie, javed.absar, JDevlieghere, andreadb

Subscribers: MatzeB, sanjoy, arsenm, dschuff, mehdi_amini, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, javed.absar, gbedwell, jrtc27, mgrang, atanasyan, steven_wu, george.burgess.iv, dexonsmith, kristina, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D52573

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343163 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-27 02:13:45 +00:00
Tom Stellard
2f5ec56062 AMDGPU/SI: Change predicate to isCIOnly for 32-bit imm s_buffer_load* patterns
Summary:
This is essentially NFC, because the complex pattern used for these patterns
will fail on non-CI, but this makes the pattern consistent with other CI
smrd patterns.  It is also a performance improvement, because the pattern
will now fail earlier on non-CI.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52469

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343125 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-26 16:53:36 +00:00
Stanislav Mekhanoshin
4d704664d6 [AMDGPU] Fix ds combine with subregs
Differential Revision: https://reviews.llvm.org/D52522

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343047 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-25 23:33:18 +00:00
Changpeng Fang
837d4755cd AMDGPU: Add Selection patterns to support add of one bit.
Summary:
  We generate s_xor to lower add of i1s in general cases, and s_not to
lower add with a one-bit imm of -1 (true).
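
The equivalence behind these patterns can be checked exhaustively. The sketch below is a plain arithmetic model, not LLVM code: the helper names are hypothetical, and it assumes that an i1 add wraps modulo 2 and that the immediate -1 has the bit pattern 1 when truncated to one bit.

```python
# Model of 1-bit (i1) arithmetic: values are 0 or 1 and addition wraps modulo 2.
# Illustrative helpers only; they are not LLVM or AMDGPU APIs.

def add_i1(a: int, b: int) -> int:
    """Add two 1-bit values with wraparound."""
    return (a + b) & 1

def xor_i1(a: int, b: int) -> int:
    """Exclusive-or of two 1-bit values."""
    return (a ^ b) & 1

def not_i1(a: int) -> int:
    """Bitwise not of a 1-bit value."""
    return (~a) & 1

# Truncated to one bit, the immediate -1 is the bit pattern 1 (true).
MINUS_ONE_I1 = -1 & 1

for a in (0, 1):
    for b in (0, 1):
        # General case: add of two i1 values matches xor.
        assert add_i1(a, b) == xor_i1(a, b)
    # Special case: adding -1 (true) matches not.
    assert add_i1(a, MINUS_ONE_I1) == not_i1(a)
```

Because the carry out of a one-bit add is discarded, the sum bit is all that remains, and the sum bit of a full adder without carry-in is the xor of its inputs.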

Reviewers:
  rampitec

Differential Revision:
  https://reviews.llvm.org/D52518

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@343030 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-25 21:21:18 +00:00
Sameer Sahasrabuddhe
116128c1c0 [AMDGPU] restore r342722 which was reverted with r342743
[AMDGPU] lower-switch in preISel as a workaround for legacy DA

Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342956 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-25 09:39:21 +00:00
Matt Arsenault
9e2438a3e1 AMDGPU: Fix private handling for allowsMisalignedMemoryAccesses
If the alignment is at least 4, this should report true.

Something still seems off with how < 4-byte types are
handled here though.

Fixing this seems to change how some combines get
to where they get, but somehow isn't changing the net
result.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342879 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-24 13:18:15 +00:00
Sameer Sahasrabuddhe
0ccb4cd734 revert changes from r342722
"[AMDGPU] lower-switch in preISel as a workaround for legacy DA"

This broke regression tests. The first breakage was noticed here:
http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/23549


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342743 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-21 16:31:51 +00:00
Sameer Sahasrabuddhe
5b5e532790 [AMDGPU] lower-switch in preISel as a workaround for legacy DA
Summary:
The default target of the switch instruction may sometimes be an
"unreachable" block, when it is guaranteed that one of the cases is
always taken. The dominator tree concludes that such a switch
instruction does not have an immediate post dominator. This confuses
divergence analysis, which is unable to propagate sync dependence to
the targets of the switch instruction.

As a workaround, the AMDGPU target now invokes lower-switch as a
preISel pass. LowerSwitch is designed to handle the unreachable
default target correctly, allowing the divergence analysis to locate
the correct immediate dominator of the now-lowered switch.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, simoll

Differential Revision: https://reviews.llvm.org/D52221

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342722 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-21 11:26:55 +00:00
Alexander Timofeev
a12681683a [AMDGPU] Divergence driven instruction selection. Part 1.
Summary: This change is the first part of the AMDGPU target description
    change. Its aim is to split the vector and scalar flows effectively at
    the selection stage. Selection uses predicate functions based on the
    framework implemented earlier - https://reviews.llvm.org/D35267

    Differential revision: https://reviews.llvm.org/D52019

    Reviewers: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342719 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-21 10:31:22 +00:00
Carl Ritson
76c0cf6582 [AMDGPU] Add instruction selection for i1 to f16 conversion
Summary:
This is required for GPUs with 16 bit instructions where f16 is a
legal register type and hence int_to_fp i1 to f16 is not lowered
by legalizing.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52018

Change-Id: Ie4c0fd6ced7cf10ad612023c6879724d9ded5851

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342558 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 16:32:12 +00:00
Matthias Braun
b064c24e4a ScheduleDAG: Cleanup dumping code; NFC
- Instead of having both `SUnit::dump(ScheduleDAG*)` and
  `ScheduleDAG::dumpNode(ScheduleDAG*)`, just keep the latter around.
- Add `ScheduleDAG::dump()` and avoid code duplication in several
  places. Implement it for different ScheduleDAG variants.
- Add `ScheduleDAG::dumpNodeName()` in favor of the `SUnit::print()`
  functions. They were only ever used for debug dumping and putting the
  function into ScheduleDAG is consistent with the `dumpNode()` change.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342520 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-19 00:23:35 +00:00
Farhana Aleen
40cb0cb6c7 [AMDGPU] Match udot8 pattern
Summary: D.u32 = S0.u4[0] * S1.u4[0] +
         S0.u4[1] * S1.u4[1] +
         S0.u4[2] * S1.u4[2] +
         S0.u4[3] * S1.u4[3] +
         S0.u4[4] * S1.u4[4] +
         S0.u4[5] * S1.u4[5] +
         S0.u4[6] * S1.u4[6] +
         S0.u4[7] * S1.u4[7] +
         S2.u32
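
A reference model of the formula above may help when reading the pattern. This is a sketch only: the function name `udot8` is taken from the commit title, lane 0 is assumed to be the least significant nibble, and masking the result to 32 bits (wraparound rather than clamping) is an assumption, not something the commit message states.

```python
def udot8(s0: int, s1: int, s2: int) -> int:
    """Dot product of eight unsigned 4-bit lanes plus a 32-bit accumulator.

    s0 and s1 are 32-bit values, each holding eight u4 lanes.
    Assumption: lane i occupies bits [4*i, 4*i + 3].
    """
    acc = s2
    for i in range(8):
        a = (s0 >> (4 * i)) & 0xF  # S0.u4[i]
        b = (s1 >> (4 * i)) & 0xF  # S1.u4[i]
        acc += a * b
    # Assumption: the result wraps to 32 bits instead of saturating.
    return acc & 0xFFFFFFFF
```

For example, `udot8(0x11111111, 0x11111111, 5)` multiplies eight lanes of 1 by 1 and adds the accumulator, giving 13.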

Author: FarhanaAleen

Reviewed By: arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D51947

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342497 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-18 16:59:48 +00:00
Matt Arsenault
8ec7f7a8ef AMDGPU: Don't form fmed3 if it will require materialization
If there is a single use constant, it can be folded into the
min/max, but not into med3.
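
For context, a med3 operation returns the median of its three operands, so a min/max pair that clamps a value to a range can be expressed as a single med3. The sketch below illustrates that equivalence in plain Python. The names `med3` and `clamp` are illustrative, NaN handling is ignored, and the equivalence assumes lo <= hi.

```python
def med3(a: float, b: float, c: float) -> float:
    """Median of three values (NaN handling deliberately ignored)."""
    return max(min(a, b), min(max(a, b), c))

def clamp(x: float, lo: float, hi: float) -> float:
    """The min/max form: clamp x to the range [lo, hi]."""
    return min(max(x, lo), hi)

# With lo <= hi, both forms agree for values below, inside, and above the range.
for x in (-2.0, 0.0, 0.25, 1.0, 5.0):
    assert clamp(x, 0.0, 1.0) == med3(x, 0.0, 1.0)
```

The two forms compute the same value, so the choice between them comes down to how the constant operands are encoded, which is the cost this commit accounts for.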

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342443 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-18 02:34:54 +00:00
Matt Arsenault
e63ae03460 AMDGPU: Expand vector canonicalizes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342439 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-18 01:51:33 +00:00
Stanislav Mekhanoshin
a5c8509bc3 [AMDGPU] Initialize instruction itinerary from GCNSubtarget
I need to use it in the GCN codegen.

Differential Revision: https://reviews.llvm.org/D52123

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342400 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-17 16:04:32 +00:00
Konstantin Zhuravlyov
6b2a177e15 AMDGPU: Clear the bits before they are being set in program resource registers
Change by Tony Tye
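
The general technique named in the title is clear-then-set on a bitfield. The sketch below shows why the clear step matters. The helper `set_field` and the field layout are hypothetical and do not correspond to any actual program resource register.

```python
def set_field(reg: int, value: int, shift: int, width: int) -> int:
    """Write `value` into the bitfield of `reg` at [shift, shift + width)."""
    mask = ((1 << width) - 1) << shift
    # Clear the old field contents first, then OR in the new value.
    return (reg & ~mask) | ((value << shift) & mask)

old = 0xFF                                # field at bits 4..7 currently holds 0xF
without_clear = old | (0x2 << 4)          # stale bits survive
with_clear = set_field(old, 0x2, 4, 4)    # field now holds exactly 0x2

assert without_clear == 0xFF
assert with_clear == 0x2F
```

Without the clear, OR-ing a new value into a field that already holds bits produces the union of old and new bits instead of the new value.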


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342270 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-14 20:00:36 +00:00
David Stuttard
3dd28f03d5 [AMDGPU] Ensure trig range reduction only used for subtargets that require it
Summary:
GFX9 and above support sin/cos instructions with a greater range and thus don't
require a fract instruction prior to invocation.

Added a subtarget feature to reflect this and added code to take advantage of
the expanded range on GFX9+.

Also updated the tests to check for correct behaviour.
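
The role of the fract step can be sketched numerically. Everything in this model is an assumption made for illustration: `hw_sin` stands in for a hardware sine that takes its input in units of full turns, and the flag `needs_range_reduction` stands in for the new subtarget feature. It is not the actual lowering code.

```python
import math

def hw_sin(turns: float) -> float:
    """Assumed model of a hardware sine whose input is measured in full turns."""
    return math.sin(2.0 * math.pi * turns)

def fract(x: float) -> float:
    """Fractional part, x - floor(x), always in [0, 1)."""
    return x - math.floor(x)

def lower_sin(x: float, needs_range_reduction: bool) -> float:
    """Lower sin(x) for x in radians onto the modelled hardware sine."""
    turns = x / (2.0 * math.pi)
    if needs_range_reduction:
        # Subtargets with a limited input range reduce to [0, 1) first.
        turns = fract(turns)
    return hw_sin(turns)
```

Because sine is periodic, dropping the integer number of turns does not change the result, so both paths agree mathematically; the reduction only matters when the hardware cannot accept the larger input.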

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51933

Change-Id: I1c1f1d3726a5ae32116646ca5cfa1ab4ef69e5b0

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342222 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-14 10:27:19 +00:00
Tim Renouf
5d02c0ff96 [AMDGPU] Removed unused method
Summary:
I accidentally left this behind in D50306, and it causes a build warning
when I build with gcc7.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52022

Change-Id: I30f7a47047e9d9d841f652da66d2fea19e74842c

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342189 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 21:56:25 +00:00
Matt Arsenault
809cc972c8 AMDGPU: Fix not preserving alignment in call setups
If an argument was passed on the stack, this
was using the default alignment.

I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342133 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 12:14:31 +00:00
Alexander Timofeev
6785a334c3 [AMDGPU] Load divergence predicate refactoring
Differential revision: https://reviews.llvm.org/D51931

    Reviewers: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342120 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 09:06:56 +00:00
Alexander Timofeev
f33751e706 [AMDGPU] Preliminary patch for divergence driven instruction selection. Load offset inlining pattern changed.
Differential revision: https://reviews.llvm.org/D51975

    Reviewers: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342115 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-13 06:34:56 +00:00
Konstantin Zhuravlyov
bc55752c49 AMDGPU: Print all kernel descriptor directives (including the ones with default values)
Change by Tony Tye

Differential Revision: https://reviews.llvm.org/D51954


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342077 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 20:25:39 +00:00
Konstantin Zhuravlyov
4d82ce5c27 AMDGPU: Re-apply r341982 after fixing the layering issue
Move isa version determination into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue where we
output the wrong isa version in the object code when features
of a particular CPU are altered (e.g. gfx902 w/o xnack
used to result in gfx900).



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342069 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 18:50:47 +00:00
Ilya Biryukov
867f48781f Revert "AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination into TargetParser."
This reverts commit r341982.

The change introduced a layering violation. Reverting to unbreak
our integrate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@342023 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-12 07:05:30 +00:00
Konstantin Zhuravlyov
b479681381 AMDGPU: Move isa version and EF_AMDGPU_MACH_* determination
into TargetParser.

Also switch away from target features to CPU string when
determining isa version. This fixes an issue where we
output the wrong isa version in the object code when features
of a particular CPU are altered (e.g. gfx902 w/o xnack
used to result in gfx900).

Differential Revision: https://reviews.llvm.org/D51890



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341982 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 18:56:51 +00:00
Alexander Timofeev
8f483e6fc2 [AMDGPU] Preliminary patch for divergence driven instruction selection. Immediate selection predicate changed
Differential revision: https://reviews.llvm.org/D51734
Reviewers: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341928 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 11:56:50 +00:00
Matt Arsenault
4d739a8afd AMDGPU: Remove leftovers from configurable address spaces
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341895 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-11 04:00:49 +00:00
Alexander Timofeev
6df7a517ca [AMDGPU] Preliminary patch for divergence driven instruction selection. Inline immediate move to V_MADAK_F32.
Differential revision: https://reviews.llvm.org/D51586

    Reviewer: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341843 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 16:42:49 +00:00
Matt Arsenault
32ca0aafd1 AMDGPU: Remove function pointer type hack
Now the pointer size should always be correct and
we don't need to improperly inspect the pointee type.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341806 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 12:16:11 +00:00
Matt Arsenault
fa0980a921 AMDGPU: Stop reporting is-noop addrspacecast for constant 32-bit
This will require something to cast. Previously, this would eliminate
the cast, which would result in copies of $noreg.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341803 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 11:59:27 +00:00
Matt Arsenault
2390ffbd78 DAG: Handle odd vector sizes in calling conv splitting
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.

Fixes not splitting v3i16/v3f16 into two registers for
AMDGPU.

This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.
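
The unequal split described above can be illustrated with simple arithmetic. This is a toy model, not the SelectionDAG algorithm: the helper `split_vector` is hypothetical, and it assumes elements are packed greedily into 32-bit registers.

```python
def split_vector(num_elts: int, elt_bits: int, reg_bits: int = 32) -> list:
    """Return the number of elements placed in each register piece."""
    if elt_bits <= 0 or elt_bits > reg_bits:
        raise ValueError("element must fit in one register")
    per_reg = reg_bits // elt_bits
    pieces = []
    remaining = num_elts
    while remaining > 0:
        take = min(per_reg, remaining)  # the last piece may be smaller
        pieces.append(take)
        remaining -= take
    return pieces
```

Under these assumptions, `split_vector(3, 16)` yields `[2, 1]`: a three-element 16-bit vector occupies two registers, with pieces of unequal size.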

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341801 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 11:49:23 +00:00
Carl Ritson
a5defd3b7d [AMDGPU] Prevent sequences of non-instructions disrupting GCNHazardRecognizer wait state counting
Summary:
This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer, causing required NOPs not to be inserted.

Reviewers: nhaehnle, arsenm

Reviewed By: arsenm

Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D51726

Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341798 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 10:14:48 +00:00
Matt Arsenault
814247cd96 AMDGPU: Use GOT PSV since it has an address space now
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341768 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 02:23:39 +00:00
Matt Arsenault
185f21fbd3 AMDGPU: Don't abort on unknown addrspace argument
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341767 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 02:23:30 +00:00
Alexander Timofeev
55e56ce884 [AMDGPU] Preliminary patch for divergence driven instruction selection. Fold immediate SMRD offset.
Differential revision: https://reviews.llvm.org/D51610

Reviewer: rampitec

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341636 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-07 09:05:34 +00:00
Scott Linder
63ca6b52d5 Revert r341413
Causes a regression in expensive checks.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341589 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-06 21:38:56 +00:00
Matt Arsenault
bd42453404 AMDGPU: Remove old hack for function addresses
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341567 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-06 17:23:24 +00:00
Scott Linder
6da9a35885 [AMDGPU] Legalize VGPR Rsrc operands for MUBUF instructions
Emit a waterfall loop in the general case for a potentially-divergent Rsrc
operand. When practical, avoid this by using Addr64 instructions.

Differential Revision: https://reviews.llvm.org/D50982


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341413 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-04 21:50:47 +00:00
Matt Arsenault
49cbe528e3 AMDGPU: Fix DAG divergence not reporting flat loads
Match behavior in DAG of r340343

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341393 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-04 18:58:19 +00:00
Simon Pilgrim
3b146be88b Remove unnecessary semicolon to silence -Wpedantic warning. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341303 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-03 10:17:25 +00:00
Tom Stellard
051c613085 AMDGPU/GlobalISel: Define instruction mapping for G_SELECT
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D49737

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341271 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-01 02:41:19 +00:00
Stanislav Mekhanoshin
44c99af37e [AMDGPU] Split v32i32 loads
Differential Revision: https://reviews.llvm.org/D51555

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341266 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 22:43:36 +00:00
Matt Arsenault
6241a83e35 AMDGPU: Restrict extract_vector_elt combine to loads
The intention is to enable the extract_vector_elt load combine,
and doing this for other operations interferes with more
useful optimizations on vectors.

Handle any type of load since in principle we should do the
same combine for the various load intrinsics.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341219 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 15:39:52 +00:00
Matt Arsenault
c8c005cb52 AMDGPU: Stop forcing internalize at -O0
This doesn't really matter if clang is always emitting
the visibility as hidden by default.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341168 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 06:02:36 +00:00