Commit Graph

676 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
64620b1c31 [AMDGPU] Fix multiple vreg definitions in si-lower-control-flow
Differential Revision: https://reviews.llvm.org/D26939

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287608 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-22 01:42:34 +00:00
Matt Arsenault
8b1782955d DAG: Ignore call site attributes when emitting target intrinsic
A target intrinsic may be defined as possibly reading memory,
but the call site may have additional knowledge that it doesn't read
memory. The intrinsic lowering will expect the pessimistic
assumption of the intrinsic definition, so the chain should
still be used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287593 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-21 22:56:42 +00:00
Konstantin Zhuravlyov
5527d64b74 [AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input
Differential Revision: https://reviews.llvm.org/D26862


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287389 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-18 22:31:08 +00:00
Tom Stellard
a006842d47 AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts
Summary:
The 32-bit instructions don't zero the high 16-bits like the 16-bit
instructions do.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26828

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287342 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-18 13:53:34 +00:00
Nicolai Haehnle
1e27a618c6 AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287339 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-18 11:55:52 +00:00
Matt Arsenault
cbafc5829d AMDGPU: Fix crash on illegal type for inlineasm
There are still crashes on non-MVT types in other
places.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287310 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-18 04:42:57 +00:00
Konstantin Zhuravlyov
1d609512ed Revert "AMDGPU: Enable ConstrainCopy DAG mutation"
This reverts commit r287146.

This breaks few conformance tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287233 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-17 16:41:49 +00:00
Konstantin Zhuravlyov
44a5962ac5 [AMDGPU] Add missing test for rL287203
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287204 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-17 04:33:20 +00:00
Konstantin Zhuravlyov
8fd8772fcd [AMDGPU] Promote f16/i16 conversions to f32/i32
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287201 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-17 04:00:46 +00:00
Konstantin Zhuravlyov
54556b5c81 [AMDGPU] Expand br_cc for f16
Differential Revision: https://reviews.llvm.org/D26732


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287199 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-17 03:49:01 +00:00
Matt Arsenault
4fbd908949 AMDGPU: Enable ConstrainCopy DAG mutation
This fixes a probably unintended divergence from the default
scheduler behavior.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287146 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-16 20:35:23 +00:00
Tom Stellard
ae5ecee3ec AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287131 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-16 18:42:17 +00:00
Konstantin Zhuravlyov
7f4d4fdf3f [AMDGPU] Handle f16 select{_cc}
- Select `select` to `v_cndmask_b32`
- Expand `select_cc`
- Refactor patterns

Differential Revision: https://reviews.llvm.org/D26714


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287074 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-16 03:16:26 +00:00
Jan Vesely
ea73d16588 AMDGPU/GCN: Exit early in hazard recognizer if there is no vreg argument
wbinvl.* are vector instruction that do not sue vector registers.

v2: check only M?BUF instructions

Differential Revision: https://reviews.llvm.org/D26633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287056 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 23:55:15 +00:00
Tom Stellard
1d353ca419 AMDGPU/SI: Fix pattern for i16 = sign_extend i1
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26670

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287035 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 21:25:56 +00:00
Matt Arsenault
7036e8dad1 AMDGPU: Enable store clustering
Also respect the TII hook for these like the generic code does
in case we want a flag later to disable this.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287021 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 20:22:55 +00:00
Matt Arsenault
339cf19d8c AMDGPU: Analyze mubuf with immediate soffset
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287018 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 20:14:27 +00:00
Stanislav Mekhanoshin
79f84ef39d [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287007 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 19:00:15 +00:00
Matt Arsenault
9de96caccf AMDGPU: Fix f16 fabs/fneg
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286931 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 02:25:28 +00:00
Matt Arsenault
856f36957c AMDGPU: Fix formatting of 1/2pi immediate
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286912 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 00:04:33 +00:00
Changpeng Fang
af76145600 AMDGPU/SI: Support data types other than V4f32 in image intrinsics
Summary:
  Extend image intrinsics to support data types of V1F32 and V2F32.

  TODO: we should define a mapping table to change the opcode for data type of V2F32 but just one channel is active,
  even though such case should be very rare.

Reviewers:
  tstellarAMD

Differential Revision:
  http://reviews.llvm.org/D26472

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286860 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-14 18:33:18 +00:00
Matt Arsenault
4404d0d6e3 AMDGPU: Implement SGPR spilling with scalar stores
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286766 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-13 18:20:54 +00:00
Konstantin Zhuravlyov
9027123253 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286753 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-13 07:01:11 +00:00
Tom Stellard
da5a5c79fa AMDGPU/SI: Promote i16 = fp_[us]int f32 for VI
Summary: This fixes a regression caused by r286464.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D26570

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286687 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-12 00:19:11 +00:00
Tom Stellard
6687aabcf5 AMDGPU/SI: Fix visit order assumption in SIFixSGPRCopies
Summary:
This pass was assuming that when a PHI instruction defined a register
used by another PHI instruction that the defining insstruction would
be legalized before the using instruction.

This assumption was causing the pass to not legalize some PHI nodes
within divergent flow-control.

This fixes a bug that was uncovered by r285762.

Reviewers: nhaehnle, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26303

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286676 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-11 23:35:42 +00:00
Matthias Braun
ee5205bfae ScheduleDAGInstrs: Add condjump deps to addSchedBarrierDeps()
addSchedBarrierDeps() is supposed to add use operands to the ExitSU
node. The current implementation adds uses for calls/barrier instruction
and the MBB live-outs in all other cases. The use
operands of conditional jump instructions were missed.

Also added code to macrofusion to set the latencies between nodes to
zero to avoid problems with the fusing nodes lingering around in the
pending list now.

Differential Revision: https://reviews.llvm.org/D25140

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286544 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-11 01:34:21 +00:00
Stanislav Mekhanoshin
a0c045c407 Revert "[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies"
This reverts commit r286171, it breaks piglit test fs-discard-exit-2

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286530 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-11 00:22:34 +00:00
Yaxun Liu
a2ee7d2991 AMDGPU: Emit runtime metadata as a note element in .note section
Currently runtime metadata is emitted as an ELF section with name .AMDGPU.runtime_metadata.

However there is a standard way to convey vendor specific information about how to run an ELF binary, which is called vendor-specific note element (http://www.netbsd.org/docs/kernel/elf-notes.html).

This patch lets AMDGPU backend emits runtime metadata as a note element in .note section.

Differential Revision: https://reviews.llvm.org/D25781


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286502 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-10 21:18:49 +00:00
Tom Stellard
0deee390af AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286464 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-10 16:02:37 +00:00
Stanislav Mekhanoshin
c312996e7a [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies
Codegen prepare sinks comparisons close to a user is we have only one register
for conditions. For AMDGPU we have many SGPRs capable to hold vector conditions.
Changed BE to report we have many condition registers. That way IR LICM pass
would hoist an invariant comparison out of a loop and codegen prepare will not
sink it.

With that done a condition is calculated in one block and used in another.
Current behavior is to store workitem's condition in a VGPR using v_cndmask
and then restore it with yet another v_cmp instruction from that v_cndmask's
result. To mitigate the issue a forward propagation of a v_cmp 64 bit result
to an user is implemented. Additional side effect of this is that we may
consume less VGPRs in a cost of more SGPRs in case if holding of multiple
conditions is needed, and that is a clear win in most cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286171 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 23:04:50 +00:00
Matt Arsenault
f577de357a AMDGPU: Remove unnecessary and on conditional branch
The comment explaining why this was necessary is incorrect
in its description of v_cmp's behavior for inactive workitems.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286134 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 19:09:33 +00:00
Tom Stellard
ae152bde49 Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285995 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-04 13:06:34 +00:00
Tom Stellard
7c173dd5fa AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285939 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-03 17:13:50 +00:00
Alexander Timofeev
d767830f39 [AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
hange explores the fact that LDS reads may be reordered even if access
the same location.

Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.

Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.

Differential Revision: https://reviews.llvm.org/D25944

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285919 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-03 14:37:13 +00:00
Matt Arsenault
0786e7941b AMDGPU: Cleanup some xfailed tests
Some of these are already fixed or tested somewhere else.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285840 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-02 17:24:54 +00:00
Matt Arsenault
0a892bb4c2 BranchRelaxation: Fix computing indirect branch block size
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285828 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-02 16:18:29 +00:00
Matt Arsenault
f8e2c0ab06 AMDGPU: Use brev for materializing SGPR constants
This is already done with VGPR immediates and saves 4 bytes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285765 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 23:14:20 +00:00
Matt Arsenault
e9a23c4c57 AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.

Also start verifying the register classes of inlineasm operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285762 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 22:55:07 +00:00
Konstantin Zhuravlyov
a2791e7e20 [AMDGPU] Check if type transforms to i16 (VI+) when getting AMDGPUISD::FFBH_U32
This will prevent following regression when enabling i16 support (D18049):

test/CodeGen/AMDGPU/ctlz.ll
test/CodeGen/AMDGPU/ctlz_zero_undef.ll

Differential Revision: https://reviews.llvm.org/D25802


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285716 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 17:49:33 +00:00
Tom Stellard
2b70009a94 AMDGPU: Implement expansion of f16 = FP_TO_FP16 f64
I wanted to implement this as a target independent expansion, however when
targets say they want to expand FP_TO_FP16 what they actually want is
the unsafe math expansion when possible and expansion to a libcall in all
other cases.

The only way to make this work as a target independent would be to add logic
to target's TargetLowering construction to mark theses nodes as Expand when
LegalizeDAG can use the unsafe expansion and mark them as LibCall when it
cannot.  I think this would be possible, but I think it would be too fragile
and complex as it would require targets to keep their expansion logic up
to date with the code in LegalizeDAG.

Reviewers: bogner, ab, t.p.northover, arsenm

Subscribers: wdng, llvm-commits, nhaehnle

Differential Revision: https://reviews.llvm.org/D25999

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285704 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 16:31:48 +00:00
Valery Pykhtin
a66e032afa [AMDGPU] Expand vector mulhu/mulhs
Differential revision: https://reviews.llvm.org/D26077

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285684 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 10:26:48 +00:00
Matt Arsenault
ac5efca3f0 AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285490 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 04:05:06 +00:00
Matt Arsenault
d6028cdcc7 AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285463 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 21:55:15 +00:00
Matt Arsenault
0e18bbf16a AMDGPU: Change check prefix in test
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285449 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 20:33:01 +00:00
Matt Arsenault
6cabc8f486 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285447 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 20:31:47 +00:00
Matt Arsenault
2d7bc6b1e1 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285435 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 19:43:31 +00:00
Nicolai Haehnle
ce0a5230f5 AMDGPU: Fix SILoadStoreOptimizer when writes cannot be merged due register dependencies
Summary:
When finding a match for a merge and collecting the instructions that must
be moved, keep in mind that the instruction we merge might actually use one
of the defs that are being moved.

Fixes piglit spec/arb_enhanced_layouts/execution/component-layout/vs-tcs-load-output[-indirect].

The fact that the ds_read in the test case is not eliminated suggests that
there might be another problem related to alias analysis, but that's a
separate problem: this pass should still work correctly even when earlier
optimization passes missed something or were disabled.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25829

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285273 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-27 08:15:07 +00:00
Yaxun Liu
c93e472923 AMDGPU: Refactor processor definition to use ISA version features
Add missing ISA versions 7.0.2/8.0.4/8.1.0. to backend.

Refactor processor definition to use ISA version features.

Fixed ISA version for stoney.

Based on Laurent Morichetti's patch.

Differential Revision: https://reviews.llvm.org/D25919

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285210 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-26 16:37:56 +00:00
Matt Arsenault
f63894ba9e Reapply "AMDGPU: Don't use offen if it is 0"
This reverts r283003

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285203 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-26 15:08:16 +00:00
Tom Stellard
1a633d108a AMDGPU/SI: Don't emit multi-dword flat memory ops when they might access scratch
Summary:
A single flat memory operations that might access the scratch buffer
can only access MaxPrivateElementSize bytes.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25788

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285198 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-26 14:38:47 +00:00