Commit Graph

71 Commits

Author SHA1 Message Date
Tom Stellard
adf1294087 Merging r266825:
------------------------------------------------------------------------
r266825 | nhaehnle | 2016-04-19 14:58:22 -0700 (Tue, 19 Apr 2016) | 12 lines

AMDGPU: Guard VOPC instructions against incorrect commute

Summary:
The added testcase, which triggered this, was derived from a shader-db case
via bugpoint. A separate question is why scalar branching wasn't used.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19208

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271767 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-04 03:43:02 +00:00
Tom Stellard
0afb7d7e71 Merging r266152:
------------------------------------------------------------------------
r266152 | thomas.stellard | 2016-04-12 16:57:30 -0700 (Tue, 12 Apr 2016) | 13 lines

AMDGPU/SI: Fix spilling of 96-bit registers

Summary:
It seems like this was broken in r252327.  I thought we had test cases
for this, but it's really hard to trigger spills of this exact register
size since they aren't used very much.

Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19021

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271735 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 20:48:40 +00:00
Tom Stellard
47bf9db963 Merging r262732:
------------------------------------------------------------------------
r262732 | thomas.stellard | 2016-03-04 10:31:18 -0800 (Fri, 04 Mar 2016) | 12 lines

AMDGPU/SI: Add support for spilling SGPRs to scratch buffer

Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17592

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271722 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 20:22:44 +00:00
Tom Stellard
d2741da1cd Merging r261385:
------------------------------------------------------------------------
r261385 | thomas.stellard | 2016-02-19 16:37:25 -0800 (Fri, 19 Feb 2016) | 20 lines

AMDGPU/SI: Use v_readfirstlane to legalize SMRD with VGPR base pointer

Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.

This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.

This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.

Reviewers: arsenm, cfang, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17305

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271700 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 18:16:01 +00:00
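A minimal sketch of the reasoning in d2741da1cd, written as a toy Python model rather than LLVM code: reading the first active lane of a register gives the same result as reading any active lane when the value is uniform, which is why copying the base pointer to SGPRs with v_readfirstlane is safe. The function names and the list-based wave representation are assumptions made for illustration, and the model does not cover the case where no lane is active.

```python
def readfirstlane(vgpr, exec_mask):
    """Return the value held by the first active lane of a per-lane register."""
    for lane, active in enumerate(exec_mask):
        if active:
            return vgpr[lane]
    raise ValueError("toy model: no active lanes")


def is_uniform(vgpr, exec_mask):
    """True if every active lane holds the same value."""
    active_values = {value for value, active in zip(vgpr, exec_mask) if active}
    return len(active_values) <= 1


# A 4-lane toy wave: lane 0 is inactive, the active lanes share one base pointer.
base_ptr = [0xDEAD, 0x1000, 0x1000, 0x1000]
exec_mask = [False, True, True, True]

# Only a pointer proven uniform may be moved to a scalar register this way.
if is_uniform(base_ptr, exec_mask):
    sgpr_base = readfirstlane(base_ptr, exec_mask)
```

In this model the inactive lane's stale value is ignored, and the scalar copy matches what each active lane would have loaded from.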
Tom Stellard
737edaf048 Merging r260651:
------------------------------------------------------------------------
r260651 | Matthew.Arsenault | 2016-02-11 18:40:47 -0800 (Thu, 11 Feb 2016) | 7 lines

AMDGPU: Set element_size in private resource descriptor

Introduce a subtarget feature for this, and leave the default with
the current behavior which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271679 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 15:58:20 +00:00
Tom Stellard
d1c65e8935 Merging r260599:
------------------------------------------------------------------------
r260599 | thomas.stellard | 2016-02-11 13:45:07 -0800 (Thu, 11 Feb 2016) | 14 lines

AMDGPU/SI: Make sure MIMG descriptors and samplers stay in SGPRs

Summary:
It's possible to have resource descriptors and samplers stored in
VGPRs, either by a VMEM instruction or in the case of samplers,
floating-point calculations.  When this happens, we need to use
v_readfirstlane to copy these values back to sgprs.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17102

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271642 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 09:50:12 +00:00
Tom Stellard
ab4b667eea Merging r260588:
------------------------------------------------------------------------
r260588 | thomas.stellard | 2016-02-11 13:14:34 -0800 (Thu, 11 Feb 2016) | 20 lines

AMDGPU/SI: When splitting SMRD instructions, add its users to VALU worklist

Summary:
When we split SMRD instructions into two MUBUFs we were adding the users
of the newly created MUBUFs to the VALU worklist.  However, the only
users these instructions had was the REG_SEQUENCE that was inserted
by splitSMRD when the original SMRD instruction was split.

We need to make sure to add the users of the original SMRD to the VALU
worklist before it is split.

I have a test case, but it requires one other bug fix, so it will be
added in a later commit.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D17101

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271641 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 09:50:09 +00:00
Tom Stellard
873623bac3 Merging r260495:
------------------------------------------------------------------------
r260495 | Matthew.Arsenault | 2016-02-10 22:15:39 -0800 (Wed, 10 Feb 2016) | 9 lines

AMDGPU: Fix constant bus use check with subregisters

If the two operands to an instruction were both
subregisters of the same super register, it would incorrectly
think this counted as the same constant bus use.

This fixes the verifier error in fmin_legacy.ll which
was missing -verify-machineinstrs.

------------------------------------------------------------------------

git-svn-id: https://llvm.org/svn/llvm-project/llvm/branches/release_38@271640 91177308-0d34-0410-b5e6-96231b3b80d8
2016-06-03 09:50:08 +00:00
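The bug described in 873623bac3 can be illustrated with a toy model, not the actual verifier code: if operands are compared by super register only, two different subregisters of one super register collapse into a single constant bus use. The operand representation, the register spelling, and the `compare_subregs` switch below are hypothetical.

```python
def count_sgpr_bus_uses(operands, compare_subregs=True):
    """Count distinct SGPR reads among (register, subregister) operand pairs."""
    if compare_subregs:
        # Two operands are the same use only if register and subregister match.
        keys = {(reg, sub) for reg, sub in operands}
    else:
        # Behaviour described as buggy: only the super register is compared.
        keys = {reg for reg, _sub in operands}
    return len(keys)


# Both operands are subregisters of the same 64-bit super register.
operands = [("s[0:1]", "sub0"), ("s[0:1]", "sub1")]

buggy = count_sgpr_bus_uses(operands, compare_subregs=False)
fixed = count_sgpr_bus_uses(operands)
```

Here `buggy` is 1 and `fixed` is 2, so a check that only allows a limited number of constant bus reads would wrongly accept the instruction in the first case.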
Nicolai Haehnle
cead1b4a6d AMDGPU/SI: Add SI Machine Scheduler
Summary:
It is off by default, but can be used
with --misched=si

Patch by: Axel Davy

Reviewers: arsenm, tstellarAMD, nhaehnle

Subscribers: nhaehnle, solenskiner, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D11885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257609 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-13 16:10:10 +00:00
Nicolai Haehnle
702b589510 AMDGPU/SI: Fold operands with sub-registers
Summary:
Multi-dword constant loads generated unnecessary moves from SGPRs into VGPRs,
increasing the code size and VGPR pressure. These moves are now folded away.

Note that this lack of operand folding was not a problem for VMEM loads,
because COPY nodes from VReg_Nnn to VGPR32 are eliminated by the register
coalescer.

Some tests are updated; note that the fsub.ll test explicitly checks that
the move is elided.

With the IR generated by current Mesa, the changes are obviously relatively
minor:

7063 shaders in 3531 tests
Totals:
SGPRS: 351872 -> 352560 (0.20 %)
VGPRS: 199984 -> 200732 (0.37 %)
Code Size: 9876968 -> 9881112 (0.04 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Scratch: 1779712 -> 1767424 (-0.69 %) bytes per wave
Wait states: 295164 -> 295337 (0.06 %)

Totals from affected shaders:
SGPRS: 65784 -> 66472 (1.05 %)
VGPRS: 38064 -> 38812 (1.97 %)
Code Size: 1993828 -> 1997972 (0.21 %) bytes
LDS: 42 -> 42 (0.00 %) blocks
Scratch: 795648 -> 783360 (-1.54 %) bytes per wave
Wait states: 54026 -> 54199 (0.32 %)

Reviewers: tstellarAMD, arsenm, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15875

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@257074 91177308-0d34-0410-b5e6-96231b3b80d8
2016-01-07 17:10:29 +00:00
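The percentages in the "Totals" block of 702b589510 can be re-derived from the before and after counts. The snippet below is only an arithmetic check; the helper name `pct_change` and the two-decimal formatting are assumptions.

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# (before, after) pairs copied from the overall totals in the commit message.
totals = {
    "SGPRS": (351872, 352560),
    "VGPRS": (199984, 200732),
    "Code Size": (9876968, 9881112),
    "Scratch": (1779712, 1767424),
    "Wait states": (295164, 295337),
}

report = {name: f"{pct_change(b, a):.2f} %" for name, (b, a) in totals.items()}
```

A negative value, as for Scratch, means the count went down after the change.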
Nicolai Haehnle
0589c22ae9 AMDGPU/SI: use S_MOV_B64 for larger copies in copyPhysReg
Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15629

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256073 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-19 01:36:26 +00:00
Nicolai Haehnle
710bb5a598 AMDGPU: fix overlapping copies in copyPhysReg
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15622

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256072 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-19 01:16:06 +00:00
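The ordering problem fixed in 710bb5a598 can be sketched with a toy register file. This is a simplified model under stated assumptions, not the copyPhysReg implementation: registers are list slots, an aggregate is a run of consecutive slots, and the names `copy_order` and `copy_aggregate` are hypothetical.

```python
def copy_order(src_base, dst_base, num_subregs):
    """Return sub-register indices in an order that is safe when ranges overlap."""
    if dst_base > src_base:
        # A forward copy would overwrite source slots before they are read,
        # so copy from the last sub-register down to the first.
        return list(range(num_subregs - 1, -1, -1))
    # Destination at or below the source: a forward copy is safe.
    return list(range(num_subregs))


def copy_aggregate(regs, src_base, dst_base, num_subregs):
    """Copy an aggregate one sub-register at a time, in place."""
    for i in copy_order(src_base, dst_base, num_subregs):
        regs[dst_base + i] = regs[src_base + i]
    return regs


# Copy slots 0..3 to slots 1..4: the ranges overlap, so the copy runs backwards.
shifted_up = copy_aggregate([10, 11, 12, 13, 0], src_base=0, dst_base=1, num_subregs=4)
```

When the ranges do not overlap either order works, which matches the message's remark that the fix "does whatever when there is no overlap".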
Changpeng Fang
cd00b72f32 AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

Reviewers: none

Subscribers: none

Differential Revision: none

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256022 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-18 20:04:28 +00:00
Changpeng Fang
4990717b55 Revert "AMDGPU/SI: Test commit"
This reverts commit a493cb636e0152ad28210934a47c6c44b1437193.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256021 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-18 20:04:26 +00:00
Changpeng Fang
dce96eea12 AMDGPU/SI: Test commit
Summary: This is just my first commit. Test!

Reviewers: none

Subscribers: none

Differential Revision: none

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256020 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-18 19:57:41 +00:00
Nicolai Haehnle
7c502030bf AMDGPU: Fix off-by-one in SIRegisterInfo::eliminateFrameIndex
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.

I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15542

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255906 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-17 16:46:42 +00:00
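The off-by-one in 7c502030bf comes from confusing a number of wait states with the immediate of an S_NOP. A minimal sketch, assuming the S_NOP immediate is one less than the number of wait states it provides (as the commit message implies) and assuming a single S_NOP covers at most 8 wait states; the function name is hypothetical:

```python
MAX_WAIT_STATES_PER_NOP = 8  # assumption: one S_NOP covers at most 8 wait states


def nop_immediates(wait_states):
    """Return the S_NOP immediates that together cover `wait_states` wait states."""
    immediates = []
    while wait_states > 0:
        chunk = min(wait_states, MAX_WAIT_STATES_PER_NOP)
        # The immediate encodes "wait states minus one" in this model.
        immediates.append(chunk - 1)
        wait_states -= chunk
    return immediates


four_wait_states = nop_immediates(4)
five_wait_states = nop_immediates(5)
```

Under these assumptions, passing the immediate where a wait-state count is expected shifts every result by one, which is the kind of error the renamed method is meant to prevent.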
Tom Stellard
7d2a810fef AMDGPU/SI: Emit constant arrays in the .text section
Summary:
This allows us to remove the END_OF_TEXT_LABEL hack we had been using
and simplifies the fixups used to compute the address of constant
arrays.

Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D15257

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@255204 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-10 02:13:01 +00:00
Matt Arsenault
dc53fde2a4 AMDGPU: Optimize VOP2 operand legalization
Don't use commuteInstruction, and don't commute if
doing so will not improve legality. Skip the more
complex checks for literal operands and constant bus restrictions,
which are not a concern for VOP2 instructions because src1
does not accept SGPRs or constants and few implicitly
read vcc.

This gets called quite a few times and the
attempts at commuting are a significant fraction
of the time spent in SIFixSGPRCopies, so it's
somewhat worthwhile to optimize. With this patch and others
leading up to it, this reduces the compile time of SIFixSGPRCopies
on some of the LuxMark 2 kernels from ~8ms to ~5ms on my system.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254452 91177308-0d34-0410-b5e6-96231b3b80d8
2015-12-01 19:57:17 +00:00
Matt Arsenault
0f1b95f818 AMDGPU: Rework how private buffer passed for HSA
If we know we have stack objects, we reserve the registers
that the private buffer resource and wave offset are passed in
and use them directly.

If not, reserve the last 5 SGPRs just in case we need to spill.
After register allocation, try to pick the next available registers
instead of the last SGPRs, and then insert copies from the inputs
to the reserved registers in the prologue.

This also only selectively enables all of the input registers
which are really required instead of always enabling them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254331 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-30 21:16:03 +00:00
Matt Arsenault
d4a0a430cc AMDGPU: Rename enums to be consistent with HSA code object terminology
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254330 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-30 21:15:57 +00:00
Matt Arsenault
956f59ab56 AMDGPU: Remove SIPrepareScratchRegs
It does not work because of emergency stack slots.
This pass was supposed to eliminate dummy registers for the
spill instructions, but the register scavenger can introduce
more during PrologEpilogInserter, so some would end up
left behind if they were needed.

The potential for spilling the scratch resource descriptor
and offset register makes doing something like this
overly complicated. Reserve registers to use for the resource
descriptor and use them directly in eliminateFrameIndex.

Also removes creating another scratch resource descriptor
when directly selecting scratch MUBUF instructions.

The choice of which registers are reserved is temporary.
For now it attempts to pick the next available registers
after the user and system SGPRs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254329 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-30 21:15:53 +00:00
Marek Olsak
73f0848ca2 AMDGPU/SI: select S_ABS_I32 when possible (v2)
v2: added more tests, moved the SALU->VALU conversion to a separate function

It looks like it's not possible to get subregisters in the S_ABS lowering
code, and I don't feel like guessing without testing what the correct code
would look like.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@254095 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-25 21:22:45 +00:00
Matt Arsenault
ade9b95acb AMDGPU: Create emergency stack slots during frame lowering
Test has a bogus verifier error which will be fixed by later commits.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252327 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-06 18:17:45 +00:00
Matt Arsenault
2b642eb437 AMDGPU: Remove unused scratch resource operands
The SGPR spill pseudos don't actually use them.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252324 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-06 18:07:53 +00:00
Matt Arsenault
af4bb57907 AMDGPU: Fix hardcoded alignment of spill.
Instead of forcing 4 alignment when spilled, set register class
alignments.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252322 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-06 17:54:47 +00:00
Matt Arsenault
26c74838a7 AMDGPU: Also track whether SGPRs were spilled
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252145 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-05 05:27:10 +00:00
Matt Arsenault
76b6b15dcd AMDGPU: Fix assert when legalizing atomic operands
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.

This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@252140 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-05 02:46:56 +00:00
Matt Arsenault
4447636b49 AMDGPU: Make findUsedSGPR more readable
Add more comments etc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@251996 91177308-0d34-0410-b5e6-96231b3b80d8
2015-11-03 22:30:15 +00:00
Matt Arsenault
110f55db52 AMDGPU: Simplify VOP3 operand legalization.
This was checking for a variety of situations that should
never happen. This saves a tiny bit of compile time.

We should not be selecting instructions with invalid operands in the
first place. Most of the time for registers, copies are inserted
to the correct operand register class.

For VOP3, since all operand types are supported and literal
constants never are, we just need to verify the constant bus
requirements (all immediates should be legal inline ones).

The only possibly tricky case to worry about is when
legalizing operands in moveToVALU with s_add_i32 and similar
instructions. If the original s_add_i32 had a literal constant
and we need to replace it with v_add_i32_e64 we would have an
unsupported literal operand.  However, I don't think we should worry
about that because SIFoldOperands should handle folding literal
constant operands into the SALU instructions based on the uses.
At SIFoldOperands time, the legality and profitability of
operand types is a bit different.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250951 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-21 21:51:02 +00:00
Matt Arsenault
ca4c86d2fd AMDGPU: Fix not checking implicit operands in verifyInstruction
When verifying constant bus restrictions, this wasn't catching
uses in implicit operands.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250948 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-21 21:15:01 +00:00
Matt Arsenault
d2643e2ff9 AMDGPU: Add MachineInstr overloads for instruction format tests
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@250797 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-20 04:35:43 +00:00
Matt Arsenault
3f7c35a966 AMDGPU: Use explicit register size indirect pseudos
This stops using an unknown reg class operand.

Currently build_vector selection has a broken looking check
where it tries to use a VGPR reg class and an SGPR one if it
sees an SGPR use.

When the source operand has an explicit VGPR class,
illegal copies will be inserted that SIFixSGPRCopies will normally take
care of later, which will allow removing the weird check
of build_vector users. Without this, when the check is removed,
v_movrels_b32 would still be emitted even though all of the values were
only stored in SGPRs.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249494 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-07 00:42:51 +00:00
Matt Arsenault
29467e755f AMDGPU/SI: Add verifier check for exec reads
Make sure we aren't accidentally not setting
these in the instruction definitions.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@249170 91177308-0d34-0410-b5e6-96231b3b80d8
2015-10-02 18:58:37 +00:00
Marek Olsak
bc68baa694 AMDGPU/SI: Don't set DATA_FORMAT if ADD_TID_ENABLE is set
to prevent setting a huge stride, because DATA_FORMAT has a different
meaning if ADD_TID_ENABLE is set.

This is a candidate for stable llvm 3.7.

Tested-and-Reviewed-by: Christian König <christian.koenig@amd.com>

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248858 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-29 23:37:32 +00:00
Matt Arsenault
e706695c2f AMDGPU: Factor switch into separate function
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248742 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-28 20:54:57 +00:00
Matt Arsenault
3443ffa833 AMDGPU: Fix splitting x16 SMRD loads
When used recursively, this would set the kill flag
on the intermediate step from first splitting
x16 to x8.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248741 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-28 20:54:52 +00:00
Matt Arsenault
33d8695b88 AMDGPU: Fix moving SMRD loads with literal offsets on CI
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248740 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-28 20:54:46 +00:00
Matt Arsenault
9ed2f31125 AMDGPU: Fix splitting SMRD with large offset
The splitting of > 4 dword SMRD instructions
if using an offset in an SGPR instead of an immediate
was not setting the destination register,
resulting in an instruction missing an operand
which would assert later.

Test will be included in a following commit
which fixes a related issue.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248739 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-28 20:54:42 +00:00
Andrew Kaylor
aac3c943f3 Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.
Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com)

Differential Revision: http://reviews.llvm.org/D11370



git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248735 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-28 20:33:22 +00:00
Matt Arsenault
728cde2865 AMDGPU: Construct new buffer instruction when moving SMRD
It's easier to understand creating a full instruction
than the current situation where sometimes a new
instruction is created and sometimes it is awkwardly
mutated in place.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248627 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-25 22:21:19 +00:00
Matt Arsenault
323c9fbce2 AMDGPU: Re-justify workaround and fix worked around problem
When buffer resource descriptors were built, the upper two components
of the descriptor were first composed into a 64-bit register because
legalizeOperands assumed all operands had the same register class.
Fix that problem, but keep the workaround. I'm not sure anything
is actually emitting such a REG_SEQUENCE now.

If multiple resource descriptors are set up with different base
pointers, this is copied with a single s_mov_b64. We probably
should fix this better by recognizing a pair of s_mov_b32 later,
but for now delete the dead code.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248585 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-25 17:08:42 +00:00
Matt Arsenault
7ba1878629 AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources
This avoids needing to re-legalize the new REG_SEQUENCE.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248584 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-25 17:08:40 +00:00
Matt Arsenault
a5e772ea93 AMDGPU: Return after instruction is processed.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248476 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-24 07:51:28 +00:00
Matt Arsenault
e7de900cec AMDGPU: Remove another unnecessary check from commuteInstruction
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248475 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-24 07:51:25 +00:00
Matt Arsenault
bb9c0afde5 AMDGPU: Reduce number of copies emitted
Instead of always inserting a copy in case
the super register is itself a subregister,
only extract to the super reg class if this is
actually the case.

This shouldn't really change codegen, but
makes looking at the output of SIFixSGPRCopies
easier to read.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248467 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-24 07:16:37 +00:00
Matt Arsenault
d89d4bccff AMDGPU: Remove unnecessary check
If the instruction doesn't have enough operands, it
either shouldn't be marked as isCommutable or is malformed.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@248242 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-22 04:17:45 +00:00
Matt Arsenault
3a2cec85a7 AMDGPU/SI: Fix more cases of losing exec operands
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247230 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-10 01:23:28 +00:00
Matt Arsenault
92a899b660 AMDGPU: Extract full 64-bit subregister and use subregs
Instead of extracting both 32-bit components from the 128-bit
register. This produces fewer copies and is easier for
the copy peephole optimizer to understand and see the actual uses
as extracts from a reg_sequence.

This avoids needing to handle subregister composing in the
PeepholeOptimizer's ValueTracker for this case.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@247162 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-09 17:03:29 +00:00
Matt Arsenault
6bf871423e AMDGPU: Fix adding redundant implicit operands
These are already added during the MachineInstr construction,
so this was adding the implicit registers twice.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246525 91177308-0d34-0410-b5e6-96231b3b80d8
2015-09-01 02:02:21 +00:00
Matt Arsenault
fe59e8ecf3 AMDGPU: Set mem operands for spill instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@246357 91177308-0d34-0410-b5e6-96231b3b80d8
2015-08-29 06:48:57 +00:00