1687 Commits

Author SHA1 Message Date
Matt Arsenault
40cf5b3d29 AMDGPU: Fix crash when disassembling VOP3 mac
The unused dummy src2_modifiers is missing, so it crashes
when trying to print it.

I tried to fully remove src2_modifiers, but there are some
irritations in the places where it is converted to mad since
it starts to require modifying use lists while iterating over
them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299861 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-10 17:58:06 +00:00
Matt Arsenault
1ad9b2d946 AMDGPU: Actually write nops for writeNopData
Before this was just writing 0s, which ends up looking like a
v_cndmask_b32 v0, s0, v0, vcc. Write out an encoded s_nop instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299816 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-08 21:28:38 +00:00
Stanislav Mekhanoshin
abcd91992d [AMDGPU] Unroll more to eliminate phis and conditions
Increase threshold to unroll a loop which contains an "if" statement
whose condition defined by a PHI belonging to the loop. This may help
to eliminate if region and potentially even PHI itself, saving on
both divergence and registers used for the PHI.

Add a small bonus for each of such "if" statements.

Differential Revision: https://reviews.llvm.org/D31693

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299779 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 16:26:28 +00:00
Dmitry Preobrazhensky
7bf2a5770d [AMDGPU][MC] Fix for Bug 28211 + LIT tests
- corrected DS_GWS_* opcodes (see VI_Shader_Programming#16.pdf for detailed description)
  - address operand is not used
  - several opcodes have data operand
  - all opcodes have offset modifier
- DS_AND_SRC2_B32: corrected typo in mnemo
- DS_WRAP_RTN_F32 replaced with DS_WRAP_RTN_B32
- added CI/VI opcodes:
  - DS_CONDXCHG32_RTN_B64
  - DS_GWS_SEMA_RELEASE_ALL
- added VI opcodes:
  - DS_CONSUME
  - DS_APPEND
  - DS_ORDERED_COUNT

Differential Revision: https://reviews.llvm.org/D31707

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299767 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 13:07:13 +00:00
Sam Kolton
218a5a7e27 [AMDGPU] Move SiShrinkInstruction and SDWAPeephole to SSAOptimization passes
Summary:
Difference beetween PreRegAlloc() and MachineSSAOptimization() are that the former is run despite of -O0 optimization level. In my undestanding SiShrinkInstructions and SDWAPeephole shouldn't run when optimizations are disabled.
With this change order of passes will not change.

Reviewers: arsenm, vpykhtin, rampitec

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31705

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299757 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-07 10:53:12 +00:00
Konstantin Zhuravlyov
4d40c97796 AMDGPU/GFX9: Fix shared and private aperture queries
Differential Revision: https://reviews.llvm.org/D31786


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299727 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 23:02:33 +00:00
Matt Arsenault
c82755f01b AMDGPU: Diagnose illegal SGPR to VGPR copies
This is possible in ways that are not compiler bugs,
so stop asserting on them.

This emits an extra error when emitting objects when it
can't encode the new pseudo, but I'm not sure that matters.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299712 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 21:09:53 +00:00
Matt Arsenault
34d5677726 AMDGPU: Replace fp16SrcZerosHighBits with a whitelist
FCOPYSIGN is lowered to bit operations which don't clear the high
bits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299708 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 20:58:30 +00:00
Yaxun Liu
e4f931e960 [AMDGPU] Temporarily change constant address space from 4 to 2
Our final address space mapping is to let constant address space to be 4 to match nvptx.
However for now we will make it 2 to avoid unnecessary work in FE/BE/devlib
about intrinsics returning constant pointers.

Differential Revision: https://reviews.llvm.org/D31770


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299690 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 19:17:32 +00:00
Matt Arsenault
0719006e7e AMDGPU: Stop using CCAssignToRegWithShadow
This does not do what it is attempting to use it for
and requires working around in LowerFormalArguments.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299667 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 17:37:27 +00:00
Stanislav Mekhanoshin
3b10e5fb8d [AMDGPU] Eliminate barrier if workgroup size is not greater than wavefront size
If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guarantied
to come to the same point at the same time.

Differential Revision: https://reviews.llvm.org/D31731

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299659 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 16:48:30 +00:00
Sam Kolton
3481b0868e [AMDGPU] Resubmit SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299654 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-06 15:03:28 +00:00
Ivan Krasin
f836c7dbde Revert r299536. [AMDGPU] SDWA peephole: enable by default.
Reason: breaks multiple bots:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/3988
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/1173

Original Review URL: https://reviews.llvm.org/D31671



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299583 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 19:58:12 +00:00
Dmitry Preobrazhensky
4b7a5e5aa1 [AMDGPU][MC] Fix for Bug 28158 + LIT tests
Added support of the following instructions:
- s_cbranch_cdbgsys
- s_cbranch_cdbgsys_and_user
- s_cbranch_cdbgsys_or_user
- s_cbranch_cdbguser
- s_setkill

Reviewers: vpykhtin

Differential Revision: https://reviews.llvm.org/D31469

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299567 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 17:26:45 +00:00
Dmitry Preobrazhensky
8067b87e5f [AMDGPU][MC] Fix for Bug 28167 + LIT tests
Corrected src0 for v_writelane_b32:
- Enabled inline constants and literals for SI/CI (VOP2)
- Enabled inline constants for VI (VOP3)

Reviewers: vpykhtin, arsenm

https://reviews.llvm.org/D31463

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299555 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 16:08:21 +00:00
Sam Kolton
4523389543 [AMDGPU] SDWA peephole: enable by default
Reviewers: vpykhtin, rampitec, arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31671

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299536 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 12:00:45 +00:00
Alex Bradbury
ff1254b6f8 Add MCContext argument to MCAsmBackend::applyFixup for error reporting
A number of backends (AArch64, MIPS, ARM) have been using
MCContext::reportError to report issues such as out-of-range fixup values in
their TgtAsmBackend. This is great, but because MCContext couldn't easily be
threaded through to the adjustFixupValue helper function from its usual
callsite (applyFixup), these backends ended up adding an MCContext* argument
and adding another call to applyFixup to processFixupValue. Adding an
MCContext parameter to applyFixup makes this unnecessary, and even better -
applyFixup can take a reference to MCContext rather than a potentially null
pointer.

Differential Revision: https://reviews.llvm.org/D30264


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299529 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-05 10:16:14 +00:00
Matt Arsenault
452506a655 AMDGPU: Remove legacy export intrinsic
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299444 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-04 16:34:39 +00:00
Matt Arsenault
f486610d1f AMDGPU: Remove legacy image intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299443 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-04 16:34:35 +00:00
Matt Arsenault
513e714dfd AMDGPU: Remove llvm.SI.vs.load.input
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299391 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-03 21:45:13 +00:00
Matt Arsenault
cd7c9c3178 AMDGPU: Remove legacy bfe intrinsics
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299372 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-03 18:08:08 +00:00
Davide Italiano
0b798207e7 [AMDGPU] Garbage collect now unused dead code. NFCI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299310 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-01 19:30:17 +00:00
Stanislav Mekhanoshin
b09e7cd9e9 [AMDGPU] Remove assumption that vector and scalar types do not alias
Differential Revision: https://reviews.llvm.org/D31547

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299250 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 20:16:54 +00:00
Matt Arsenault
c4de629ce2 AMDGPU: Remove unnecessary ands when f16 is legal
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zero the high 16-bits of the result.

Alternatively could try making v2f16 legal and canonicalizing
on build_vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299246 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 19:53:03 +00:00
Jan Vesely
1abd9ecbfc AMDGPU/R600: Fix amdgpu alias analysis pass.
R600 uses higher AS number to access kernel parameters

Fixes: r298846
Differential Revision: https://reviews.llvm.org/D31520

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299245 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 19:26:23 +00:00
Simon Pilgrim
9fc191fd45 [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.

Followup to D25691.

Differential Revision: https://reviews.llvm.org/D31311

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299219 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 13:54:09 +00:00
Sam Kolton
ff45a18189 [AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns
Previously compiler often extracted common immediates into specific register, e.g.:
```
%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3
```
Because of this SDWA peephole failed to find SDWA convertible pattern. E.g. in previous example this could be converted into 2 SDWA src operands:
```
SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
```
With this change peephole check if operand is either immediate or register that is copy of immediate.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299202 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 11:42:43 +00:00
Simon Pilgrim
07898901df [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299201 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-31 11:24:16 +00:00
Matt Arsenault
839e869207 AMDGPU: Rename isKernel
What we really want to do is distinguish functions that may
be called by other functions, and graphics shaders are not
called kernels.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299140 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-30 23:58:04 +00:00
Matt Arsenault
3c1dcddf86 AMDGPU: Add all atomicrmw fields to atomic.inc/dec
Add scope, order, isVolatile

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299122 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-30 22:21:40 +00:00
Stanislav Mekhanoshin
38e381b5a3 [AMDGPU] Add GlobalOpt parameter to Always Inliner pass
If set to false it does not remove global aliases. With this parameter
set to false it should be safe to run the pass before link.

Differential Revision: https://reviews.llvm.org/D31489

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299108 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-30 20:16:02 +00:00
Simon Pilgrim
70a5705cf0 [AMDGPU] Tidy up computeKnownBitsForTargetNode/ComputeNumSignBitsForTargetNode arguments. NFCI.
Based on comment in D31249.


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298991 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-29 12:09:25 +00:00
Stanislav Mekhanoshin
e42e44c7e3 [AMDGPU] Boost unroll threshold for loops reading local memory
This is less important than increase threshold for private memory,
but still brings performance improvements in a wide range of tests.
Unrolling more for local memory serves three purposes: it allows
to combine ds operations if offset becomes static, saves registers
used for offsets in case of static offsets, and allows better lds
latency hiding.

Differential Revision: https://reviews.llvm.org/D31412

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298948 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-28 22:13:51 +00:00
Stanislav Mekhanoshin
7a9c51436f [AMDGPU] Fix recorded region boundaries in max-occupancy scheduler
This is incorrect to record region boundaries before scheduling,
it may change after scheduling. As a result second pass may see less
instructions to schedule than it should.

Differential Revision: https://reviews.llvm.org/D31434

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298945 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-28 21:48:54 +00:00
Stanislav Mekhanoshin
88f7810574 [AMDGPU] Split -amdgpu-early-inline-all option
Previously it was covered by the internalization. It turns out we cannot
run internalizer in FE, it break separate compilation tests. Thus early
inliner gets its own option.

Differential Revision: https://reviews.llvm.org/D31429

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298935 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-28 18:23:24 +00:00
Valery Pykhtin
072955ea76 [AMDGPU] Update SI scheduler colorHighLatenciesGroups
Depends on rL298896: MachineScheduler/ScheduleDAG: Add support for GetSubGraph

Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30152

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298902 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-28 07:19:48 +00:00
Valery Pykhtin
bc04e5d6d2 [AMDGPU] SISched: Detect dependency types between blocks
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30153

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298872 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 18:22:39 +00:00
Valery Pykhtin
96f1d455e7 [AMDGPU] SISched: Update colorEndsAccordingToDependencies
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30150

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298861 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 17:26:40 +00:00
Valery Pykhtin
2a867a816e [AMDGPU] Fix SI scheduler LiveOut Refcount issue
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30145

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298857 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 17:06:36 +00:00
Dmitry Preobrazhensky
cb5431a931 [AMDGPU][MC] Fix for Bug 28207 + LIT tests
Enabled clamp and omod for v_cvt_* opcodes which have src0 of an integer type

Reviewers: vpykhtin, arsenm

Differential Revision: https://reviews.llvm.org/D31327

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298852 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 15:57:17 +00:00
Yaxun Liu
ab3be33d40 [AMDGPU] Get address space mapping by target triple environment
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298846 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 14:04:01 +00:00
Yaxun Liu
d81243066b [AMDGPU] Switch data layout by triple environment amdgiz
Switch data layout by target triple environment amdgiz and amdgizcl indicating using of an address space mapping in which generic address space is 0.

amdgiz is for non-OpenCL environment where generic address space is 0.

amdgizcl is for OpenCL environment where generic address space is 0.

Differential Revision: https://reviews.llvm.org/D31211


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298758 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-25 02:05:44 +00:00
Matt Arsenault
a959750939 AMDGPU: Fix annotating loops with nested loop conditions
If the branch condition for a loop was a phi which itself
was fed from a phi from a loop, it isn't safe to try
to delete the phi until after the loop is handled.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298737 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 20:57:10 +00:00
Matt Arsenault
d4f6485173 AMDGPU: Implement f16 fround
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298730 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 20:04:18 +00:00
Matt Arsenault
876bc45420 AMDGPU: Unify divergent function exits.
StructurizeCFG can't handle cases with multiple
returns creating regions with multiple exits.
Create a copy of UnifyFunctionExitNodes that only
unifies exit nodes that skips exit nodes
with uniform branch sources.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298729 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 19:52:05 +00:00
Stanislav Mekhanoshin
a332f465b2 [AMDGPU] Fold V_CNDMASK with identical source operands
Such instructions sometimes appear after lowering and folding.

Differential Revision: https://reviews.llvm.org/D31318

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298723 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 18:55:20 +00:00
Konstantin Zhuravlyov
1007ee7060 [AMDGPU] Rename Kind to ValueKind in metadata to be consistent
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298722 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 18:43:15 +00:00
Stanislav Mekhanoshin
30b1056d8f [AMDGPU] Add AMDGPUAliasAnalysis to opt pipeline
Previously it was added only to the BE.

Differential Revision: https://reviews.llvm.org/D31323

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298721 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 18:01:14 +00:00
Benjamin Kramer
65bb8eff35 [AMDGPU] Don't enforce constexpr, there are still old standard libraries around that don't have a constexpr std::pair.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298719 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 17:53:06 +00:00
Valery Pykhtin
80aca9b9aa [AMDGPU] Remove double map lookups in SI scheduler
Patch by Axel Davy (axel.davy@normalesup.org)

Differential revision: https://reviews.llvm.org/D30382

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298718 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-24 17:49:05 +00:00