Commit Graph

169 Commits

Author SHA1 Message Date
Nicolai Haehnle
1e27a618c6 AMDGPU: Fix legalization of MUBUF instructions in shaders
Summary:
The addr64-based legalization is incorrect for MUBUF instructions with idxen
set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions.  This affects
e.g.  shaders that access buffer textures.

Since we never actually need the addr64-legalization in shaders, this patch
takes the easy route and keys off the calling convention.  If this ever
affects (non-OpenGL) compute, the type of legalization needs to be chosen
based on some TSFlag.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D26747

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287339 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-18 11:55:52 +00:00
Tom Stellard
ae5ecee3ec AMDGPU/SI: Avoid creating unnecessary copies in the SIFixSGPRCopies pass
Summary:
1. Don't try to copy values to and from the same register class.
2. Replace copies with of registers with immediate values with v_mov/s_mov
   instructions.

The main purpose of this change is to make MachineSink do a better job of
determining when it is beneficial to split a critical edge, since the pass
assumes that copies will become move instructions.

This prevents a regression in uniform-cfg.ll if we enable critical edge
splitting for AMDGPU.

Reviewers: arsenm

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: https://reviews.llvm.org/D23408

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287131 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-16 18:42:17 +00:00
Matt Arsenault
339cf19d8c AMDGPU: Analyze mubuf with immediate soffset
Fixes giving up on clustering common addr64 accesses with
constant 0 soffset.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287018 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 20:14:27 +00:00
Stanislav Mekhanoshin
79f84ef39d [AMDGPU] Add wave barrier builtin
The wave barrier represents the discardable barrier. Its main purpose is to
carry convergent attribute, thus preventing illegal CFG optimizations. All lanes
in a wave come to convergence point simultaneously with SIMT, thus no special
instruction is needed in the ISA. The barrier is discarded during code generation.

Differential Revision: https://reviews.llvm.org/D26585

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@287007 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-15 19:00:15 +00:00
Matt Arsenault
4404d0d6e3 AMDGPU: Implement SGPR spilling with scalar stores
nThis avoids the nasty problems caused by using
memory instructions that read the exec mask while
spilling / restoring registers used for control flow
masking, but only for VI when these were added.

This always uses the scalar stores when enabled currently,
but it may be better to still try to spill to a VGPR
and use this on the fallback memory path.

The cache also needs to be flushed before wave termination
if a scalar store is used.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286766 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-13 18:20:54 +00:00
Konstantin Zhuravlyov
9027123253 [AMDGPU] Add f16 support (VI+)
Differential Revision: https://reviews.llvm.org/D25975


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286753 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-13 07:01:11 +00:00
Matt Arsenault
e5fd9c09ad AMDGPU: Preserve vcc undef flags when inverting branch
If the branch was on a read-undef of vcc, passes that used
analyzeBranch to invert the branch condition wouldn't preserve
the undef flag resulting in a verifier error.

Fixes verifier failures in a future commit.

Also fix verifier error when inserting copy for vccz
corruption bug.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286133 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 19:09:27 +00:00
Matt Arsenault
c6533e305b AMDGPU: Refactor copyPhysReg
Separate the subregister splitting logic to re-use later.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286118 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-07 16:39:22 +00:00
Nicolai Haehnle
08f7b24149 AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.

Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25656

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285835 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-02 17:03:11 +00:00
Matt Arsenault
e9a23c4c57 AMDGPU: Default to using scalar mov to materialize immediate
This is the conservatively correct way because it's easy to
move or replace a scalar immediate. This was incorrect in the case
when the register class wasn't known from the static instruction
definition, but still needed to be an SGPR. The main example of this
is inlineasm has an SGPR constraint.

Also start verifying the register classes of inlineasm operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285762 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 22:55:07 +00:00
Matt Arsenault
5ec7e2e1d1 AMDGPU: Workaround for instruction size with literals
Instructions with a 32-bit base encoding with an optional
32-bit literal encoded after them report their size as 4
for the disassembler. Consider these when computing the
MachineInstr size. This fixes problems caused by size estimate
consistency in BranchRelaxation.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285743 91177308-0d34-0410-b5e6-96231b3b80d8
2016-11-01 20:42:24 +00:00
Matt Arsenault
ac5efca3f0 AMDGPU: Use 1/2pi inline imm on VI
I'm guessing at how it is supposed to be printed

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285490 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-29 04:05:06 +00:00
Tom Stellard
b15bbca10c AMDGPU/SI: Don't use non-0 waitcnt values when waiting on Flat instructions
Summary:
Flat instruction can return out of order, so we need always need to wait
for all the outstanding flat operations.

Reviewers: tony-tye, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl

Differential Revision: https://reviews.llvm.org/D25998

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285479 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 23:53:48 +00:00
Matt Arsenault
d6028cdcc7 AMDGPU: Add definitions for scalar store instructions
Also add glc bit to the scalar loads since they exist on VI
and change the caching behavior.

This currently has an assembler bug where the glc bit is incorrectly
accepted on SI/CI which do not have it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285463 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 21:55:15 +00:00
Matt Arsenault
2d7bc6b1e1 AMDGPU: Fix using incorrect private resource with no allocation
It's possible to have a use of the private resource descriptor or
scratch wave offset registers even though there are no allocated
stack objects. This would result in continuing to use the maximum
number reserved registers. This could go over the number of SGPRs
available on VI, or violate the SGPR limit requested by
the function attributes.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285435 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-28 19:43:31 +00:00
Matt Arsenault
44de456371 AMDGPU: Fix counting si_mask_branch as 4 bytes
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@285202 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-26 14:53:54 +00:00
Nicolai Haehnle
e07033aa42 AMDGPU: Fix Two Address problems with v_movreld
Summary:
The v_movreld machine instruction is used with three operands that are
in a sense tied to each other (the explicit VGPR_32 def and the implicit
VGPR_NN def and use). There is no way to express that using the currently
available operand bits, and indeed there are cases where the Two Address
instructions pass does the wrong thing.

This patch introduces a new set of pseudo instructions that are identical
in intended semantics as v_movreld, but they only have two tied operands.

Having to add a new set of pseudo instructions is admittedly annoying, but
it's a fairly straightforward and solid approach. The only alternative I
see is to try to teach the Two Address instructions pass about Three Address
instructions, and I'm afraid that's trickier and is going to end up more
fragile.

Note that v_movrels does not suffer from this problem, and so this patch
does not touch it.

This fixes several GL45-CTS.shaders.indexing.* tests.

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25633

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284980 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-24 14:56:02 +00:00
Konstantin Zhuravlyov
a91924b28a [AMDGPU] Emit 32-bit lo/hi got and pc relative variant kinds for external and global address space variables
Differential Revision: https://reviews.llvm.org/D25562


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284196 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-14 04:37:34 +00:00
Matt Arsenault
7b6a558f24 AMDGPU: Initial implementation of VGPR indexing mode
This is the most basic handling of the indirect access
pseudos using GPR indexing mode. This currently only enables
the mode for a single v_mov_b32 and then disables it.
This is much more complicated to use than the movrel instructions,
so a new optimization pass is probably needed to fold the access
into the uses and keep the mode enabled for them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@284031 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-12 18:49:05 +00:00
Matt Arsenault
ecc6c2b633 BranchRelaxation: Support expanding unconditional branches
AMDGPU needs to expand unconditional branches in a new
block with an indirect branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@283464 91177308-0d34-0410-b5e6-96231b3b80d8
2016-10-06 16:20:41 +00:00
Matt Arsenault
7eba65d30c AMDGPU: Use unsigned compare for eq/ne
For some reason there are both of these available, except
for scalar 64-bit compares which only has u64. I'm not sure
why there are both (I'm guessing it's for the one bit inputs we
don't use), but for consistency always using the
unsigned one.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282832 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-30 01:50:20 +00:00
Matt Arsenault
0461ece2ce AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@282667 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-29 01:44:16 +00:00
Matt Arsenault
982faf27a3 AMDGPU: Use i64 scalar compare instructions
VI added eq/ne for i64, so use them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281800 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-17 02:02:19 +00:00
Matt Arsenault
3a74bac021 AMDGPU: Use SOPK compare instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281780 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-16 21:41:16 +00:00
Matt Arsenault
93e6e5414d Finish renaming remaining analyzeBranch functions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281535 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 20:43:16 +00:00
Matt Arsenault
b1a710d5f0 Make analyzeBranch family of instruction names consistent
analyzeBranch was renamed to use lowercase first, rename
the related set to match.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281506 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 17:24:15 +00:00
Matt Arsenault
ab302cda5e AArch64: Use TTI branch functions in branch relaxation
The main change is to return the code size from
InsertBranch/RemoveBranch.

Patch mostly by Tim Northover

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281505 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-14 17:23:48 +00:00
Matt Arsenault
ec4f2a0c81 AMDGPU: Support commuting a FrameIndex operand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281369 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-13 19:03:12 +00:00
Nicolai Haehnle
01a133c760 AMDGPU: Do not clobber SCC in SIWholeQuadMode
Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D22198

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281230 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-12 16:25:20 +00:00
Matt Arsenault
10eb5d500c AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281128 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-10 01:20:33 +00:00
Matt Arsenault
3843a1382e AMDGPU: Fix immediate folding logic when shrinking instructions
If the literal is being folded into src0, it doesn't matter
if it's an SGPR because it's being replaced with the literal.

Also fixes initially selecting 32-bit versions of some instructions
which also confused commuting.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281117 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 23:32:53 +00:00
Sam Kolton
03d317688d AMDGPU] Assembler: better support for immediate literals in assembler.
Summary:
Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals.
E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least.

With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction).
Here are rules how we convert literals:
    - We parsed fp literal:
        - Instruction expects 64-bit operand:
            - If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5)
                - then we do nothing this literal
            - Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5)
                - report error
            - Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
                - If instruction expect fp operand type (f64)
                    - Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5)
                        - If so then do nothing
                    - Else (e.g. v_fract_f64 v[0:1], 3.1415)
                        - report warning that low 32 bits will be set to zeroes and precision will be lost
                        - set low 32 bits of literal to zeroes
                - Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
                    - report error as it is unclear how to encode this literal
        - Instruction expects 32-bit operand:
            - Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow
            - Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5)
                - do nothing
                - Else report error
            - Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
    - Parsed binary literal:
        - Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)
            - do nothing
        - Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)
            - report error
        - Else, literal is not-inlinable and we are not required to inline it
            - Are high 32 bit of literal zeroes or same as sign bit (32 bit)
                - do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
            - Else
                - report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types:
'''
enum OperandType {
    OPERAND_REG_IMM32_INT,
    OPERAND_REG_IMM32_FP,
    OPERAND_REG_INLINE_C_INT,
    OPERAND_REG_INLINE_C_FP,
}
'''

This is not working yet:
    - Several tests are failing
    - Problems with predicate methods for inline immediates
    - LLVM generated assembler parts try to select e64 encoding before e32.
More changes are required for several AsmOperands.

Reviewers: vpykhtin, tstellarAMD

Subscribers: arsenm, kzhuravl, artem.tamazov

Differential Revision: https://reviews.llvm.org/D22922

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@281050 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-09 14:44:04 +00:00
Matt Arsenault
1bc1003b7a AMDGPU: Sign extend constants when splitting them
This will confuse later passes which try to look at the
immediate value and don't truncate first.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280974 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 17:44:36 +00:00
Matt Arsenault
d764af3c4e AMDGPU: Support commuting with immediate in src0
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280970 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-08 17:19:29 +00:00
Konstantin Zhuravlyov
1f99c41083 [AMDGPU] Wave and register controls
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov

Differential Revision: https://reviews.llvm.org/D21562



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280747 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 20:22:28 +00:00
Tom Stellard
2ec1430711 AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies
Summary:
I put this code here, because I want to re-use it in a few other places.
This supersedes some of the immediate folding code we have in SIFoldOperands.
I think the peephole optimizers is probably a better place for folding
immediates into copies, since it does some register coalescing in the same time.

This will also make it easier to transition SIFoldOperands into a smarter pass,
where it looks at all uses of instruction at once to determine the optimal way to
fold operands.  Right now, the pass just considers one operand at a time.

Reviewers: arsenm

Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23402

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280744 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-06 20:00:26 +00:00
Matt Arsenault
d55f31515f AMDGPU: Set sizes of spill pseudos
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280595 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-03 17:25:44 +00:00
Matt Arsenault
c2c79a10fb AMDGPU: Fix spilling of m0
readlane/writelane do not support using m0 as the output/input.
Constrain the register class of spill vregs to try to avoid this,
but also handle spilling of the physreg when necessary by inserting
an additional copy to a normal SGPR.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280584 91177308-0d34-0410-b5e6-96231b3b80d8
2016-09-03 06:57:55 +00:00
Tom Stellard
675d6994e8 AMDGPU/SI: Query AA, if available, in areMemAccessesTriviallyDisjoint()
Summary:
The SILoadStoreOptimizer will need to use AliasAnalysis here in order to
move it before scheduling.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D23813

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279963 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-29 12:05:32 +00:00
Matt Arsenault
e52dfc95ef AMDGPU: Move cndmask pseudo to be isel pseudo
There's only one use of this for the convenience
of a pattern. I think v_mov_b64_pseudo should also be
moved, but SIFoldOperands does currently make use of it.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@279901 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-27 01:00:37 +00:00
Justin Bogner
6673ea81f6 Replace "fallthrough" comments with LLVM_FALLTHROUGH
This is a mechanical change of comments in switches like fallthrough,
fall-through, or fall-thru to use the LLVM_FALLTHROUGH macro instead.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278902 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-17 05:10:15 +00:00
Matt Arsenault
e8fc9abe31 AMDGPU: Fix not estimating MBB operand sizes correctly
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278590 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-13 01:43:54 +00:00
Matt Arsenault
0576028ed1 AMDGPU: Remove unnecessary cast
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@278274 91177308-0d34-0410-b5e6-96231b3b80d8
2016-08-10 19:11:45 +00:00
Matthias Braun
f79c57a412 MachineFunction: Return reference for getFrameInfo(); NFC
getFrameInfo() never returns nullptr so we should use a reference
instead of a pointer.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@277017 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 18:40:00 +00:00
Tom Stellard
04cc0adf58 AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling
Summary:
We were using reserved VGPRs for SGPR spilling and this was causing
some programs with a workgroup size of 1024 to use more than 64
registers, which is illegal.

Reviewers: arsenm, mareko, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits, kzhuravl

Differential Revision: https://reviews.llvm.org/D22032

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276980 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-28 14:30:43 +00:00
Matt Arsenault
d506595769 AMDGPU: Make AMDGPUMachineFunction fields private
ABIArgOffset is a problem because properly fsetting the
KernArgSize requires that the reserved area before the
real kernel arguments be correctly aligned, which requires
fixing clover.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@276766 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-26 16:45:58 +00:00
Matt Arsenault
4cead0b564 AMDGPU: Expand register indexing pseudos in custom inserter
This is to help moveSILowerControlFlow to before regalloc.
There are a couple of tradeoffs with this. The complete CFG
is visible to more passes, the loop body avoids an extra copy of m0,
vcc isn't required, and immediate offsets can be shrunk into s_movk_i32.

The disadvantage is the register allocator doesn't understand that
the single lane's vector is dead within the loop body, so an extra
register is used to outlive the loop block when expanding the
VGPR -> m0 loop. This also now results in worse waitcnt insertion
before the loop instead of after for pending operations at the point
of the indexing, but that should be fixed by future improvements to
cross block waitcnt insertion.

v_movreld_b32's operands are now modeled more correctly since vdst
is not a true output. This is kind of a hack to treat vdst as a
use operand. Extra checking is required in the verifier since
I can't seem to get tablegen to emit an implicit operand for a
virtual register.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275934 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-19 00:35:03 +00:00
Matt Arsenault
e066e581b1 AMDGPU: Fix verifier error from partially undef copy
In this situation:

%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2

The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275635 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 22:32:02 +00:00
Jacques Pienaar
48ed4ab2d6 Rename AnalyzeBranch* to analyzeBranch*.
Summary: NFC. Rename AnalyzeBranch/AnalyzeBranchPredicate to analyzeBranch/analyzeBranchPredicate to follow LLVM coding style and be consistent with TargetInstrInfo's analyzeCompare and analyzeSelect.

Reviewers: tstellarAMD, mcrosier

Subscribers: mcrosier, jholewinski, jfb, arsenm, dschuff, jyknight, dsanders, nemanjai

Differential Revision: https://reviews.llvm.org/D22409

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275564 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-15 14:41:04 +00:00
Matt Arsenault
27c36c3430 AMDGPU: Cleanup pseudoinstructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@275133 91177308-0d34-0410-b5e6-96231b3b80d8
2016-07-12 00:23:17 +00:00