75 Commits

Author SHA1 Message Date
Matt Arsenault
40cf5b3d29 AMDGPU: Fix crash when disassembling VOP3 mac
The unused dummy src2_modifiers is missing, so it crashes
when trying to print it.

I tried to fully remove src2_modifiers, but there are some
irritations in the places where it is converted to mad since
it starts to require modifying use lists while iterating over
them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299861 91177308-0d34-0410-b5e6-96231b3b80d8
2017-04-10 17:58:06 +00:00
Yaxun Liu
ab3be33d40 [AMDGPU] Get address space mapping by target triple environment
As we introduced target triple environment amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the value by target triple.

The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which are not depend on target triple, use static
const members, so that they don't occupy extra memory space and is equivalent
to a compile time constant.

Since the struct is lightweight and cheap, it can be created on the fly at
the point of usage. Or it can be added as member to a pass and created at
the beginning of the run* function.

Differential Revision: https://reviews.llvm.org/D31284


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298846 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-27 14:04:01 +00:00
Matt Arsenault
87fd70245a AMDGPU: Add VOP3P instruction format
Add a few non-VOP3P but instructions related to packed.

Includes hack with dummy operands for the benefit of the assembler

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296368 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-27 18:49:11 +00:00
Matt Arsenault
aac82e218f AMDGPU: Redefine clamp node as clamp 0.0-1.0
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.

Also allow using clamp with f16, and use knowledge
of dx10_clamp.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295788 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-21 23:35:48 +00:00
Matt Arsenault
a418139e85 AMDGPU: Fix assembler subtarget predicate for gfx9
This was accepting GFX9 instructions on VI.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295557 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-18 19:12:26 +00:00
Matt Arsenault
83c857cd3a AMDGPU: Merge initial gfx9 support
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295554 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-18 18:29:53 +00:00
Alexander Timofeev
23db8abf86 Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track"
This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295054 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-14 14:29:05 +00:00
Wei Ding
c75c94d0eb AMDGPU : Add trap handler support.
Differential Revision: http://reviews.llvm.org/D26010

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294692 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-10 02:15:29 +00:00
Konstantin Zhuravlyov
6270090e5c [AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement
Differential Revision: https://reviews.llvm.org/D29741


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294627 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-09 21:33:23 +00:00
Konstantin Zhuravlyov
017228cd76 [AMDGPU] Add target information that is required by tools to metadata
Differential Revision: https://reviews.llvm.org/D28760#fb670e28


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294449 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 14:05:23 +00:00
Konstantin Zhuravlyov
f33b956e6c [AMDGPU][NFC] De-tabify
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294445 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 13:29:23 +00:00
Konstantin Zhuravlyov
c478d3544a [AMDGPU] Move register related queries to subtarget class
Differential Revision: https://reviews.llvm.org/D29318


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294440 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-08 13:02:33 +00:00
Alexander Timofeev
ce06d9cb99 [AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track
lane masks.

	 Differential revision: https://reviews.llvm.org/D29442

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@294324 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-07 17:57:48 +00:00
Stanislav Mekhanoshin
a1d4ee75a4 [AMDGPU] Account workgroup size in LDS occupancy limits
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.

In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.

Differential Revision: https://reviews.llvm.org/D29423

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293837 91177308-0d34-0410-b5e6-96231b3b80d8
2017-02-01 22:59:50 +00:00
Matt Arsenault
1aa15b49aa AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands
Accomplishes what r292982 was supposed to, which ended up
only really making the necessary test changes.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293310 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-27 17:42:26 +00:00
Tom Stellard
d367f44048 AMDGPU add support for spilling to a user sgpr pointed buffers
Summary:
This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].

Patch By: Dave Airlie

Reviewers: nhaehnle, arsenm, tstellarAMD

Reviewed By: arsenm

Subscribers: mareko, llvm-commits, kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D25428

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293000 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-25 01:25:13 +00:00
Matt Arsenault
d019e8638a Enable FeatureFlatForGlobal on Volcanic Islands
This switches to the workaround that HSA defaults to
for the mesa path.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292982 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-24 22:02:15 +00:00
Matt Arsenault
f3a691f0b0 AMDGPU: Combine fp16/fp64 subtarget features
The same control register controls both, and are set to
the same defaults. Keep the old names around as aliases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292837 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-23 22:31:03 +00:00
Benjamin Kramer
b7b123cc24 Pacify -Wreorder.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292599 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 10:37:53 +00:00
Sam Kolton
1b647e664a [AMDGPU] Add subtarget features for SDWA/DPP
Reviewers: vpykhtin, artem.tamazov, tstellarAMD

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28900

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@292596 91177308-0d34-0410-b5e6-96231b3b80d8
2017-01-20 10:01:25 +00:00
Eugene Zelenko
359c877504 [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289475 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-12 22:23:53 +00:00
Alexander Timofeev
9ae2b48b84 [AMDGPU] Scalarization of global uniform loads.
Summary:
LC can currently select scalar load for uniform memory access
basing on readonly memory address space only. This restriction
originated from the fact that in HW prior to VI vector and scalar caches
are not coherent. With MemoryDependenceAnalysis we can check that the
memory location corresponding to the memory operand of the LOAD is not
clobbered along the all paths from the function entry.

Reviewers: rampitec, tstellarAMD, arsenm

Subscribers: wdng, arsenm, nhaehnle

Differential Revision: https://reviews.llvm.org/D26917

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289076 91177308-0d34-0410-b5e6-96231b3b80d8
2016-12-08 17:28:47 +00:00
Konstantin Zhuravlyov
9027123253 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286753 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-13 07:01:11 +00:00
Matt Arsenault
ac5efca3f0 AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285490 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 04:05:06 +00:00
Matt Arsenault
6cabc8f486 AMDGPU: Diagnose using too many SGPRs
This is possible when using inline asm.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285447 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 20:31:47 +00:00
Tom Stellard
a9c6165732 AMDGPU/SI: Don't allow unaligned scratch access
Summary: The hardware doesn't support this.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25523

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284257 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 18:10:39 +00:00
Matt Arsenault
7e2ade4213 AMDGPU: Add instruction definitions for VGPR indexing
VI added a second method of indexing into VGPRs
besides using v_movrel*

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284027 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-12 18:00:51 +00:00
Tom Stellard
bf101a6e08 AMDGPU/SI: Include implicit arguments in kernarg_segment_byte_size
Reviewers: arsenm

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D24835

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282223 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-23 01:33:26 +00:00
Konstantin Zhuravlyov
1f99c41083 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280747 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 20:22:28 +00:00
Tom Stellard
4a5c408c28 AMDGPU/SI: Implement a custom MachineSchedStrategy
Summary:
GCNSchedStrategy re-uses most of GenericScheduler, it's just uses
a different method to compute the excess and critical register
pressure limits.

It's not enabled by default, to enable it you need to pass -misched=gcn
to llc.

Shader DB stats:

32464 shaders in 17874 tests
Totals:
SGPRS: 1542846 -> 1643125 (6.50 %)
VGPRS: 1005595 -> 904653 (-10.04 %)
Spilled SGPRs: 29929 -> 27745 (-7.30 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 36688188 -> 37034900 (0.95 %) bytes
LDS: 1913 -> 1913 (0.00 %) blocks
Max Waves: 254101 -> 265125 (4.34 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 1338220 -> 1438499 (7.49 %)
VGPRS: 886221 -> 785279 (-11.39 %)
Spilled SGPRs: 29869 -> 27685 (-7.31 %)
Spilled VGPRs: 334 -> 352 (5.39 %)
Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread
Code Size: 34315716 -> 34662428 (1.01 %) bytes
LDS: 1551 -> 1551 (0.00 %) blocks
Max Waves: 188127 -> 199151 (5.86 %)
Wait states: 0 -> 0 (0.00 %)

Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23688

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279995 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-29 19:42:52 +00:00
Matt Arsenault
d751c97ce5 AMDGPU: Fix crashes on memory functions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278369 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-11 17:31:42 +00:00
Matt Arsenault
5895e79530 AMDGPU: Delete dead code
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276675 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-25 19:06:25 +00:00
Matt Arsenault
6e4fd1497d AMDGPU: Add feature for unaligned access
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274398 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 23:03:44 +00:00
Duncan P. N. Exon Smith
a354e21338 Target: Remove unused arguments from overrideSchedPolicy, NFC
TargetSubtargetInfo::overrideSchedPolicy takes two MachineInstr*
arguments (begin and end) that invite implicit conversions from
MachineInstrBundleIterator.  One option would be to change their type to
an iterator, but since they don't seem to have been used since the API
was added in 2010, I'm deleting the dead code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274304 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-01 00:23:27 +00:00
Matt Arsenault
364fc298eb AMDGPU: Fix global isel crashes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274039 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-28 17:42:09 +00:00
Matt Arsenault
b0b8d0af0c AMDGPU: Fix global isel build
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273964 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-28 00:11:26 +00:00
Matt Arsenault
d35aece639 AMDGPU: Implement per-function subtargets
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273940 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 20:48:03 +00:00
Matt Arsenault
dca409d5ad AMDGPU: Move subtarget feature checks into passes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273937 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-27 20:32:13 +00:00
Konstantin Zhuravlyov
20c7a48718 [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header
Debugger prologue is emitted if -mattr=+amdgpu-debugger-emit-prologue.

Debugger prologue writes work group IDs and work item IDs to scratch memory at fixed location in the following format:
  - offset 0: work group ID x
  - offset 4: work group ID y
  - offset 8: work group ID z
  - offset 16: work item ID x
  - offset 20: work item ID y
  - offset 24: work item ID z

Set
  - amd_kernel_code_t::debug_wavefront_private_segment_offset_sgpr to scratch wave offset reg
  - amd_kernel_code_t::debug_private_segment_buffer_sgpr to scratch rsrc reg
  - amd_kernel_code_t::is_debug_supported to true if all debugger features are enabled

Differential Revision: http://reviews.llvm.org/D20335


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273769 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-25 03:11:28 +00:00
Matt Arsenault
9af2418e41 AMDGPU: Remove disable-irstructurizer subtarget feature
The only real reason to use it is for testing, so replace
it with a command line option instead of a potentially function
dependent feature.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273653 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 06:30:22 +00:00
Matt Arsenault
759ed7e410 AMDGPU: Cleanup subtarget handling.
Split AMDGPUSubtarget into amdgcn/r600 specific subclasses.
This removes most of the static_casting of the basic codegen
classes everywhere, and tries to restrict the features
visible on the wrong target.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273652 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-24 06:30:11 +00:00
Matt Arsenault
6ef04515db AMDGPU: Make FrameLowering stack alignment 16
We don't need it to be that high. The natural alignment
for a single workitem's stack is 16.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@273448 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-22 17:47:39 +00:00
Matt Arsenault
9d6fa96f46 AMDGPU: Fix crashes on unknown processor name
If the processor name failed to parse for amdgcn,
the resulting output would have R600 ISA in it.

If the processor name was missing or invalid for R600,
the wavefront size would not be set and there would be
crashes from missing itinerary data.

Fixes crashes in future commit caused by dividing by the unset/0
wavefront size.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@271561 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-02 18:37:16 +00:00
Changpeng Fang
faf7289db3 AMDGPU/SI: Enable load-store-opt by default.
Summary: Enable load-store-opt by default, and update LIT tests.

Reviewers: arsenm

Differential Revision: http://reviews.llvm.org/D20694

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270894 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-26 19:35:29 +00:00
Konstantin Zhuravlyov
d7b9b912dd [AMDGPU][NFC] Rename ReserveTrapVGPRs -> ReserveRegs
Differential Revision: http://reviews.llvm.org/D20081


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@270594 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-24 18:37:18 +00:00
Matt Arsenault
7985e4be56 AMDGPU: Fix promote alloca pass creating huge arrays
This was assuming it could use all memory before, which is
a bad decision because it restricts occupancy.

By default, only try to use enough space that could reduce
occupancy to 7, an arbitrarily chosen limit.

Based on the exist LDS usage, try to round up to the limit
in the current tier instead of further hurting occupancy.
This isn't ideal, because it doesn't accurately know how much
space is going to be used for alignment padding.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269708 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-16 21:19:59 +00:00
Matt Arsenault
3e43181f87 AMDGPU: Change private_element_size to 4
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@269145 91177308-0d34-0410-b5e6-96231b3b80d8
2016-05-11 00:28:54 +00:00
Konstantin Zhuravlyov
d714ad3a0f [AMDGPU] Reserve VGPRs for trap handler usage if instructed
Differential Revision: http://reviews.llvm.org/D19235


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@267563 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-26 15:43:14 +00:00
Konstantin Zhuravlyov
5d42fbaf4c [AMDGPU] Add insert nops pass based on subtarget features instead of cl::opt
Also,
- Skip pass if machine module does not have debug info
- Minor comment changes
- Added test

Differential Revision: http://reviews.llvm.org/D19079


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266626 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-18 16:28:23 +00:00
Tom Stellard
65b55414cc AMDGPU: Add skeleton GlobalIsel implementation
Summary:
This adds the necessary target code to be able to run the ir translator.
Lowering function arguments and returns is a nop and there is no support
for RegBankSelect.

Reviewers: arsenm, qcolombet

Subscribers: arsenm, joker.eph, vkalintiris, llvm-commits

Differential Revision: http://reviews.llvm.org/D19077

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266356 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-14 19:09:28 +00:00