10 Commits

Author SHA1 Message Date
Matt Arsenault
d706d030af AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298444 91177308-0d34-0410-b5e6-96231b3b80d8
2017-03-21 21:39:51 +00:00
Tom Stellard
55792f024b AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the scheduler
Summary:
The SILoadStoreOptimizer can now look ahead more than one instruction when
looking for instructions to merge, which greatly improves the number of
loads/stores that we are able to merge.

Moving the pass before scheduling avoids increasing register pressure after
the scheduler, so that the scheduler's register pressure estimates will be
more accurate.  It also gives more consistent results, since it is no longer
affected by minor scheduling changes.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23814

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279991 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-29 19:15:22 +00:00
Matt Arsenault
e5d9c7f0c4 AMDGPU: Remove superfluous string attributes from tests
Also fix v_mac.ll not testing the right thing for fneg

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275129 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-11 23:35:48 +00:00
Matt Arsenault
b26a693dfd AMDGPU: Add volatile to test loads and stores
When the memory vectorizer is enabled, these tests break.
These tests don't really care about the memory instructions,
and it's easier to write check lines with the unmerged loads.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@266071 91177308-0d34-0410-b5e6-96231b3b80d8
2016-04-12 13:38:18 +00:00
Tom Stellard
d3adac51fc AMDGPU/SI: Enable lanemask tracking in misched
Summary:
This results in higher register usage, but should make it easier for
the compiler to hide latency.

This pass is a prerequisite for some more scheduler improvements, and I
think the increased register usage with this patch is acceptable, because
when combined with the scheduler improvements, the total register usage
will decrease.

shader-db stats:

2382 shaders in 478 tests
Totals:
SGPRS: 48672 -> 49088 (0.85 %)
VGPRS: 34148 -> 34847 (2.05 %)
Code Size: 1285816 -> 1289128 (0.26 %) bytes
LDS: 28 -> 28 (0.00 %) blocks
Scratch: 492544 -> 573440 (16.42 %) bytes per wave
Max Waves: 6856 -> 6846 (-0.15 %)
Wait states: 0 -> 0 (0.00 %)

Depends on D18451

Reviewers: nhaehnle, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18452

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@264876 91177308-0d34-0410-b5e6-96231b3b80d8
2016-03-30 16:35:09 +00:00
Matt Arsenault
fae18e933b AMDGPU: Remove some old intrinsic uses from tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@260493 91177308-0d34-0410-b5e6-96231b3b80d8
2016-02-11 06:02:01 +00:00
Matt Arsenault
7aed0ccd46 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256075 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-19 01:46:41 +00:00
Matt Arsenault
b617c550dc AMDGPU: Make v2i64/v2f64 legal types.
They can be loaded and stored, so count them as legal. This is
mostly to fix a number of common cases for load/store merging.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254086 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-25 19:58:34 +00:00
Matt Arsenault
3aa0d7cb53 AMDGPU/SI: Fix read2 merging into a super register.
If the read2 produced was supposed to be writing into a
super register, it would use the wrong subregister indices.
Fix this by inserting copies, so we only ever write to a vreg_64.
Run the register coalescer again to clean this up, although this
isn't ideal and often does result in an extra move.

Also remove the assert that offset1 > offset0.

There isn't a real reason to not allow this other than a minor
convenience in the compiler, and it doesn't seem worth the effort
of avoiding it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@242174 91177308-0d34-0410-b5e6-96231b3b80d8
2015-07-14 17:57:36 +00:00
Tom Stellard
953c681473 R600 -> AMDGPU rename
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@239657 91177308-0d34-0410-b5e6-96231b3b80d8
2015-06-13 03:28:10 +00:00