1068 Commits

Author SHA1 Message Date
Stanislav Mekhanoshin
64373efcab [AMDGPU] Fix illegal shrink of V_SUBB_U32 and V_ADDC_U32
If there is an immediate operand we shall not shrink V_SUBB_U32
and V_ADDC_U32, it does not fit e32 encoding.

Differential Revison: https://reviews.llvm.org/D34291

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305840 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 20:33:44 +00:00
Matt Arsenault
5516bd1387 AMDGPU: Do operand folding in program order
Before it was possible to partially fold use instructions
before the defs. After the xor is folded into a copy, the same
mov can end up in the fold list twice, so on the second attempt
it will fail expecting to see a register to fold.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305821 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:56:32 +00:00
Matthias Braun
0cc137e051 RegisterScavenging: Followup to r305625
This does some improvements/cleanup to the recently introduced
scavengeRegisterBackwards() functionality:

- Rewrite findSurvivorBackwards algorithm to use the existing
  LiveRegUnit::accumulateBackward() code. This also avoids the Available
  and Candidates bitset and just need 1 LiveRegUnit instance
  (= 1 bitset).
- Pick registers in allocation order instead of register number order.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305817 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:43:14 +00:00
Matt Arsenault
73854fd751 AMDGPU: Preserve undef when folding register operands
If the source was a copy of an undef register, this would
produce a read of an undefined register which is a verifier
error.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305816 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:41:31 +00:00
Stanislav Mekhanoshin
423a449bd5 [AMDGPU] Eliminate SGPR to VGPR copy when possible
SGPRs are generally cheaper, so try to use them over VGPRs.

Differential Revision: https://reviews.llvm.org/D34130

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305815 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:32:42 +00:00
Matt Arsenault
4cabef582f AMDGPU: Fix crash with undef vreg input operand
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305814 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-20 18:28:02 +00:00
Matt Arsenault
5a7b3305c5 AMDGPU: Fix scratch wave offset relative FI expansion
The offset may not be an inline immediate, so this needs
to be materialized into a register. The post-RA run of
SIShrinkInstructions is able to fold it later if it can.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305761 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 23:47:21 +00:00
Stanislav Mekhanoshin
7c44c2a308 [AMDGPU] Add infer address spaces pass before SROA
It adds it for the target after inlining but before SROA where
we can get most out of it.

Differential Revision: https://reviews.llvm.org/D34366

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305759 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 23:17:36 +00:00
Tom Stellard
865802ce11 AMDGPU/GlobalISel: Mark G_BITCAST s32 <--> <2 x s16> legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D34129

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305692 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-19 13:15:45 +00:00
Matthias Braun
cd03942492 RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044/r279124/r305516. Fixed a problem where we would refuse
to place spills as the very first instruciton of a basic block and thus
artifically increase pressure (test in
test/CodeGen/PowerPC/scavenging.mir:spill_at_begin)

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305625 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-17 02:08:18 +00:00
Matthias Braun
bd1a266898 Revert "RegScavenging: Add scavengeRegisterBackwards()"
Revert because of reports of some PPC input starting to spill when it
was predicted that it wouldn't and no spillslot was reserved.

This reverts commit r305516.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305566 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-16 17:48:08 +00:00
Matthias Braun
02688b00ef RegScavenging: Add scavengeRegisterBackwards()
Re-apply r276044/r279124. Trying to reproduce or disprove the ppc64
problems reported in the stage2 build last time, which I cannot
reproduce right now.

This is a variant of scavengeRegister() that works for
enterBasicBlockEnd()/backward(). The benefit of the backward mode is
that it is not affected by incomplete kill flags.

This patch also changes
PrologEpilogInserter::doScavengeFrameVirtualRegs() to use the register
scavenger in backwards mode.

Differential Revision: http://reviews.llvm.org/D21885

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305516 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 22:14:55 +00:00
Alexander Timofeev
7807f69e9b DivergencyAnalysis patch for review
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305494 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-15 19:33:10 +00:00
Tom Stellard
c746f23920 AMDGPU/GlobalISel: Mark 32-bit G_ADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D33992

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305232 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-12 20:54:56 +00:00
Matt Arsenault
24acabd9f9 AMDGPU: Teach isLegalAddressingMode about flat offsets
Also fix reporting r+r as a valid addressing mode without
offsets.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305203 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-12 17:06:35 +00:00
Matt Arsenault
23ef7ef4e3 AMDGPU: Start selecting flat instruction offsets
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305201 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-12 16:53:51 +00:00
Matt Arsenault
4923776ab2 AMDGPU: Start adding offset fields to flat instructions
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305194 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-12 15:55:58 +00:00
Wei Ding
2d0aa2e9ba AMDGPU : Fix ISA Version Definitions.
Differential Revision: http://reviews.llvm.org/D28531

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305137 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-10 03:53:19 +00:00
Stanislav Mekhanoshin
525f44047e [AMDGPU] Add intrinsics for alignbit and alignbyte instructions
Differential Revision: https://reviews.llvm.org/D34046

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305098 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-09 19:03:00 +00:00
David Stuttard
00d555d436 [AMDGPU] Fix for issue in alloca to vector promotion pass
Summary:
Alloca promotion pass not dealing with non-canonical input

Added some additional checks so the pass simply backs-off forms it can't deal with (non-canonical)

Also added some test cases in non-canonical form to check that it no longer crashes

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31710

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305079 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-09 14:16:22 +00:00
Matt Arsenault
271bf6ebf9 AMDGPU: Use correct register names in inline assembly
Fixes using physical registers in inline asm from clang.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@305004 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-08 19:03:20 +00:00
Mark Searles
c5293f3fa4 [AMDGPU] Force qsads instrs to use different dest register than source registers
The V_MQSAD_PK_U16_U8, V_QSAD_PK_U16_U8, and V_MQSAD_U32_U8 take more than 1 pass in hardware. For these three instructions, the destination registers must be different than all sources, so that the first pass does not overwrite sources for the following passes.

Differential Revision: https://reviews.llvm.org/D33783

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304998 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-08 18:21:19 +00:00
Tom Stellard
55d2b21ff7 AMDGPU/GlobalISel: Mark 32-bit G_SELECT as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33949

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304910 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-07 13:54:51 +00:00
Tom Stellard
5d24d88bc7 AMDGPU/GlobalISel: Mark 32-bit G_ICMP as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33890

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304797 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 14:16:50 +00:00
Vivek Pandya
de22782d75 [Improve CodeGen Testing] This patch renables MIRPrinter print fields which have value equal to its default.
If -simplify-mir option is passed then MIRPrinter will not print such fields.
This change also required some lit test cases in CodeGen directory to be changed.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D32304


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304779 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-06 08:16:19 +00:00
Matt Arsenault
323e6e9ede RenameIndependentSubregs: Fix handling of undef tied operands
If a tied source operand was undef, it would be replaced but not
update the other tied operand, which would end up using different
virtual registers.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304747 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-05 22:58:57 +00:00
Stanislav Mekhanoshin
ca0adcb320 [AMDGPU] Fix SIFoldOperands crash with clamp
Fixes bug #33302. Pass did not account that Src1 of max instruction
can be an immediate.

Differential Revision: https://reviews.llvm.org/D33884

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304696 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-05 01:03:04 +00:00
Stanislav Mekhanoshin
110a2bc818 [AMDGPU] Untangle SDWA pass from SIShrinkInstructions
Remove dependency of SDWA pass on SIShrinkInstructions.
The goal is to move SDWA even higher in the stack to avoid second run
of MachineLICM, MachineCSE and SIFoldOperands.

Also added handling to preserve original src modifiers.

Differential Revision: https://reviews.llvm.org/D33860

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304665 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-03 17:39:47 +00:00
Tom Stellard
c9a1489af2 AMDGPU/GlobalISel: Mark 1-bit integer constants as legal
Summary:
These are mostly legal, but will probably need special lowering for some
cases.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D33791

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304628 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-03 01:13:33 +00:00
Stanislav Mekhanoshin
ced381c038 [AMDGPU] Preserve operand order in SIFoldOperands
SIFoldOperands can commute operands even if no folding was done.
This change is to preserve IR is no folding was done.

Differential Revision: https://reviews.llvm.org/D33802

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304625 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-03 00:41:52 +00:00
Konstantin Zhuravlyov
e0fcf72467 AMDGPU: Make auto waitcnt before barrier a feature
Differential Revision: https://reviews.llvm.org/D33793


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304571 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-02 17:40:26 +00:00
Alexander Timofeev
dfdb788875 AMDGPUAnnotateUniformValue should always treat volatile loads as divergent
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304554 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-02 15:25:52 +00:00
Mark Searles
48e0515b4f [AMDGPU] Turn on the new waitcnt insertion pass. Adjust tests.
-enable-si-insert-waitcnts=1 becomes the default
-enable-si-insert-waitcnts=0 to use old pass

Differential Revision: https://reviews.llvm.org/D33730

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304551 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-02 14:19:25 +00:00
Yaxun Liu
2bb8e7e8cc [AMDGPU] Fix kernel arg segment size for amdgizcl
Differential Revision: https://reviews.llvm.org/D33307


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304482 91177308-0d34-0410-b5e6-96231b3b80d8
2017-06-01 21:31:53 +00:00
Mark Searles
fa784827e1 [AMDGPU] Fix bugs in new waitcnt pass. Add test.
- new waitcnt pass remains off by default; -enable-si-insert-waitcnts=1 to enable it
- fix handling of PERMUTE ops
- fix insertion of waitcnt instrs at function begin/end ( port of analogous code that was added to old waitcnt pass )
- add new test

  Differential Revision: https://reviews.llvm.org/D33114

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304311 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-31 16:44:23 +00:00
Dmitry Preobrazhensky
4a31d77be2 [AMDGPU][MC] New syntax for ds_swizzle_b32 offset
See Bug 28601: https://bugs.llvm.org//show_bug.cgi?id=28601

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D33542

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304309 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-31 16:26:47 +00:00
Tim Northover
3fd4db31a7 MIR: update test for noVRegs removal.
I think I hadn't git pulled recently enough to bring it in.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304250 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-30 22:02:19 +00:00
Tim Northover
837e2e977f MIR: remove explicit "noVRegs" property.
We can infer this from the incoming MIR, so there's no reason to
represent it with a special flag.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304246 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-30 21:28:57 +00:00
Stanislav Mekhanoshin
f0c3d71794 [AMDGPU] Allow SDWA in instructions with immediates and SGPRs
An encoding does not allow to use SDWA in an instruction with
scalar operands, either literals or SGPRs. That is however possible
to copy these operands into a VGPR first.

Several copies of the value are produced if multiple SDWA conversions
were done. To cleanup MachineLICM (to hoist copies out of loops),
MachineCSE (to remove duplicate copies) and SIFoldOperands (to replace
SGPR to VGPR copy with immediate copy right to the VGPR) runs are added
after the SDWA pass.

Differential Revision: https://reviews.llvm.org/D33583

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304219 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-30 16:49:24 +00:00
Mark Searles
6781296ab7 [AMDGPU] Require waitcnt before barrier for all targets; adjust tests.
Differential Revision: https://reviews.llvm.org/D33576

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304217 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-30 16:22:43 +00:00
Konstantin Zhuravlyov
071c0d91bb Resubmit r303859 with test fixed.
[AMDGPU] add intrinsic for s_getpc

Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Patch by Tim Corringham


git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304031 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-26 20:38:26 +00:00
Tom Stellard
f696b32065 AMDGPU/GlobalISel: Mark 32-bit float constants as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D33212

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304003 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-26 16:40:03 +00:00
Matthias Braun
94c4904dc5 CodeGen: Rename DEBUG_TYPE to match passnames
Rename the DEBUG_TYPE to match the names of corresponding passes where
it makes sense. Also establish the pattern of simply referencing
DEBUG_TYPE instead of repeating the passname where possible.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303921 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-25 21:26:32 +00:00
Nico Weber
b6e91f4292 Revert r303859, CodeGen/AMDGPU/llvm.amdgcn.s.getpc.ll fails on bots.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303902 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-25 19:19:29 +00:00
Tim Corringham
b88c01b1f7 [AMDGPU] add intrinsic for s_getpc
Summary: The s_getpc instruction is exposed as intrinsic llvm.amdgcn.s.getpc.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D32862

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303859 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-25 14:04:14 +00:00
Simon Pilgrim
6b1d32c6d7 [AMDGPU] Add INDIRECT_BASE_ADDR to R600_Reg32 class (PR33045)
This fixes 17 of the 41 -verify-machineinstrs test failures identified in PR33045

Differential Revision: https://reviews.llvm.org/D33451

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303691 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 21:27:15 +00:00
Changpeng Fang
6b7bd0e1f9 AMDGPU/SI: Move the local memory usage related checking after calling convention checking in PromoteAlloca
Summary:
  Promoting Alloca to Vector and Promoting Alloca to LDS are two independent handling of Alloca and should not affect each other.
As a result, we should not give up promoting to vector if there is not enough LDS. This patch factors out the local memory usage
related checking out and replace it after the calling convention checking.

Reviewer:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D33139

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303684 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 20:25:41 +00:00
Stanislav Mekhanoshin
6ff1a723f7 [AMDGPU] Combine and (srl) into shl (bfe)
Perform DAG combine:
and (srl x, c), mask => shl (bfe x, nb + c, mask >> nb), nb
Where nb is a number of trailing zeroes in mask.

It replaces two instructions with two and BFE is generally a more
expensive one. However this is only done if we are selecting a byte
or word at an aligned boundary which results in a proper SDWA
operand pattern. It is only done if SDWA is supported.

TODO: improve SDWA pass to actually convert this pattern. It is not
done now because we have an immediate in the instruction, which has
be moved into a VGPR.

Differential Revision: https://reviews.llvm.org/D33455

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303681 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 19:54:48 +00:00
Stanislav Mekhanoshin
ddde657138 [AMDGPU] Convert shl (add) into add (shl)
shl (or|add x, c2), c1 => or|add (shl x, c1), (c2 << c1)
This allows to fold a constant into an address in some cases as
well as to eliminate second shift if the expression is used as
an address and second shift is a result of a GEP.

Differential Revision: https://reviews.llvm.org/D33432

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303641 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-23 15:59:58 +00:00
Stanislav Mekhanoshin
69edad7913 [AMDGPU] Narrow lshl from 64 to 32 bit if possible
Turn expensive 64 bit shift into 32 bit if shift does not overflow int:
shl (ext x) => zext (shl x)

Differential Revision: https://reviews.llvm.org/D33367

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@303569 91177308-0d34-0410-b5e6-96231b3b80d8
2017-05-22 16:58:10 +00:00