18 Commits

Author SHA1 Message Date
Matt Arsenault
1f7c1e174f AMDGPU: Add some function return test cases
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@366591 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-19 16:45:48 +00:00
Matt Arsenault
59762a4b80 AMDGPU: Make s34 the FP register
Make the FP register callee saved.

This is tricky because now the FP needs to be spilled in the prolog
relative to the incoming SP register, rather than the frame register
used throughout the rest of the function. I don't like how this
bypassess the standard mechanism for CSR spills just to get the
correct insert point. I may look for a better solution, since all CSR
VGPRs may also need to have all lanes activated. Another option might
be to make getFrameIndexReference change the base register if the
frame index is a CSR, and then try to figure out the right insertion
point in emitProlog.

If there is a free VGPR lane available for SGPR spilling, try to use
it for the FP. If that would require intrtoducing a new VGPR spill,
try to use a free call clobbered SGPR. Only fallback to introducing a
new VGPR spill as a last resort.

This also doesn't attempt to handle SGPR spilling with scalar stores.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365372 91177308-0d34-0410-b5e6-96231b3b80d8
2019-07-08 19:03:38 +00:00
Matt Arsenault
2e84e8180a AMDGPU: Always use s33 for global scratch wave offset
Every called function could possibly need this to calculate the
absolute address of stack objectst, and this avoids inserting a copy
around every call site in the kernel. It's also somewhat cleaner to
keep this in a callee saved SGPR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@363990 91177308-0d34-0410-b5e6-96231b3b80d8
2019-06-20 21:58:24 +00:00
Matt Arsenault
32a10ab5dd AMDGPU: Correct maximum possible private allocation size
We were assuming a much larger possible per-wave visible stack
allocation than is possible:

faa3ae5138/src/core/runtime/amd_gpu_agent.cpp (L70)

Based on this, we can assume the high 15 bits of a frame index or sret
are 0. The frame index value is the per-lane offset, so the maximum
frame index value is MAX_WAVE_SCRATCH / wavesize.

Remove the corresponding subtarget feature and option that made
this configurable.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@361541 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-23 19:38:14 +00:00
Tim Renouf
a047778b62 [AMDGPU] Support for v3i32/v3f32
Added support for dwordx3 for most load/store types, but not DS, and not
intrinsics yet.

SI (gfx6) does not have dwordx3 instructions, so they are not enabled
there.

Some of this patch is from Matt Arsenault, also of AMD.

Differential Revision: https://reviews.llvm.org/D58902

Change-Id: I913ef54f1433a7149da8d72f4af54dbb13436bd9

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@356659 91177308-0d34-0410-b5e6-96231b3b80d8
2019-03-21 12:01:21 +00:00
Matt Arsenault
2390ffbd78 DAG: Handle odd vector sizes in calling conv splitting
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.

Fixes not splitting 3i16/v3f16 into two registers for
AMDGPU.

This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341801 91177308-0d34-0410-b5e6-96231b3b80d8
2018-09-10 11:49:23 +00:00
Matt Arsenault
7e212e4168 AMDGPU: Remove remnants of old address space mapping
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@341165 91177308-0d34-0410-b5e6-96231b3b80d8
2018-08-31 05:49:54 +00:00
Matt Arsenault
c358be353a AMDGPU: Stop wasting argument registers with v3i32/v3f32
SelectionDAGBuilder widens v3i32/v3f32 arguments to
to v4i32/v4f32 which consume an additional register.
In addition to wasting argument space, this produces extra
instructions since now it appears the 4th vector component has
a meaningful value to most combines.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@338197 91177308-0d34-0410-b5e6-96231b3b80d8
2018-07-28 14:11:34 +00:00
Matt Arsenault
9e41f5314e AMDGPU: Make v4i16/v4f16 legal
Some image loads return these, and it's awkward working
around them not being legal.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334835 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-15 15:15:46 +00:00
Matt Arsenault
4dea9c2811 AMDGPU: Fix not including v2f64 in SReg_128
Fixes assertion with calls returning v2f64.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334189 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-07 12:16:31 +00:00
Matt Arsenault
0c5ab47e3b AMDGPU: Use more custom insert/extract_vector_elt lowering
Apply to i8 vectors.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@334044 91177308-0d34-0410-b5e6-96231b3b80d8
2018-06-05 19:52:46 +00:00
Matt Arsenault
e2f4d3fdb7 AMDGPU: Add combine for trunc of bitcast from build_vector
If the truncate is only accessing the first element of the vector,
we can use the original source value.

This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@331909 91177308-0d34-0410-b5e6-96231b3b80d8
2018-05-09 18:37:39 +00:00
Yaxun Liu
2930e5c52d [AMDGPU] Change constant addr space to 4
Differential Revision: https://reviews.llvm.org/D43170


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@325030 91177308-0d34-0410-b5e6-96231b3b80d8
2018-02-13 18:00:25 +00:00
Yaxun Liu
8b47f7bedd CodeGen: Fix SelectionDAGISel::LowerArguments for sret addr space
SelectionDAGISel::LowerArguments assumes sret addr space is 0, which is
not true for amdgcn---amdgiz target.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D40255


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319630 91177308-0d34-0410-b5e6-96231b3b80d8
2017-12-03 03:31:45 +00:00
Matt Arsenault
1de313a465 AMDGPU: Allow negative MUBUF vaddr for gfx9
GFX9 does not enable bounds checking for the resource descriptors
used for private access, so it should be OK to use vaddr with
a potentially negative value.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@319393 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-30 00:52:40 +00:00
Dmitry Preobrazhensky
6bc93b9a3b [AMDGPU][MC][GFX8][GFX9] Corrected names of integer v_{add/addc/sub/subrev/subb/subbrev}
See bug 34765: https://bugs.llvm.org//show_bug.cgi?id=34765

Reviewers: tamazov, SamWot, arsenm, vpykhtin

Differential Revision: https://reviews.llvm.org/D40088

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318675 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-20 18:24:21 +00:00
Matt Arsenault
bc9fb908bc AMDGPU: Don't use MUBUF vaddr if address may overflow
Effectively revert r263964. Before we would not
allow this if vaddr was not known to be positive.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@318240 91177308-0d34-0410-b5e6-96231b3b80d8
2017-11-15 00:45:43 +00:00
Matt Arsenault
a0540d3468 AMDGPU: Start defining a calling convention
Partially implement callee-side for arguments and return values.
byval doesn't work properly, and most likely sret or other on-stack
return values most as well.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303308 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-17 21:56:25 +00:00