32 Commits

Author SHA1 Message Date
Carl Ritson
fb4b457dbd [AMDGPU] Move kill lowering to WQM pass and add live mask tracking
Move implementation of kill intrinsics to WQM pass. Add live lane
tracking by updating a stored exec mask when lanes are killed.
Use live lane tracking to enable early termination of shader
at any point in control flow.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D94746
2021-02-11 20:31:29 +09:00
dfukalov
f3ae5b9b8c [NFC][AMDGPU] Split AMDGPUSubtarget.h to R600 and GCN subtargets
... to reduce headers dependency.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D95036
2021-01-20 22:22:45 +03:00
dfukalov
d069b95364 [NFC][AMDGPU] Reduce include files dependency.
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D93813
2021-01-07 22:22:05 +03:00
Carl Ritson
5ece11e6b7 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00
Matt Arsenault
22a074ddad AMDGPU: Don't sometimes allow instructions before lowered si_end_cf
Since 6524a7a2b9ca072bd7f7b4355d1230e70c679d2f, this would sometimes
not emit the or to exec at the beginning of the block, where it really
has to be. If there is an instruction that defines one of the source
operands, split the block and turn the si_end_cf into a terminator.

This avoids regressions when regalloc fast is switched to inserting
reloads at the beginning of the block, instead of spills at the end of
the block.

In a future change, this should always split the block.
2020-09-18 13:43:01 -04:00
Matt Arsenault
f7d5ce97f6 AMDGPU: Optimize copies to exec with other insts after exec def
It's possible to have terminator instructions after a write to exec,
so skip over them to find it.
2020-07-28 21:34:50 -04:00
Matt Arsenault
9df19f7114 AMDGPU: Don't assume there is only one terminator copy
This would stop on the first in reverse order, failing the verifier if
there were more earlier in the block.
2020-07-28 20:22:38 -04:00
Matt Arsenault
7377805a5a AMDGPU: Use Register 2019-12-27 16:53:21 -05:00
Reid Kleckner
68092989f3 Sink all InitializePasses.h includes
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.

I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
  recompiles    touches affected_files  header
  342380        95      3604    llvm/include/llvm/ADT/STLExtras.h
  314730        234     1345    llvm/include/llvm/InitializePasses.h
  307036        118     2602    llvm/include/llvm/ADT/APInt.h
  213049        59      3611    llvm/include/llvm/Support/MathExtras.h
  170422        47      3626    llvm/include/llvm/Support/Compiler.h
  162225        45      3605    llvm/include/llvm/ADT/Optional.h
  158319        63      2513    llvm/include/llvm/ADT/Triple.h
  140322        39      3598    llvm/include/llvm/ADT/StringRef.h
  137647        59      2333    llvm/include/llvm/Support/Error.h
  131619        73      1803    llvm/include/llvm/Support/FileSystem.h

Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.

Reviewers: bkramer, asbirlea, bollu, jdoerfert

Differential Revision: https://reviews.llvm.org/D70211
2019-11-13 16:34:37 -08:00
Matt Arsenault
9dc720a012 Revert "AMDGPU: Fix iterator error when lowering SI_END_CF"
This reverts r367500 and r369203. This is causing various test
failures.

llvm-svn: 369417
2019-08-20 17:45:25 +00:00
Daniel Sanders
f8a414589e Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Matt Arsenault
2aa9ce0a05 Reapply "AMDGPU: Split block for si_end_cf"
This reverts commit r359363, reapplying r357634

llvm-svn: 367500
2019-08-01 01:25:27 +00:00
Stanislav Mekhanoshin
38d5b257db [AMDGPU] gfx908 mfma support
Differential Revision: https://reviews.llvm.org/D64584

llvm-svn: 365824
2019-07-11 21:19:33 +00:00
Stanislav Mekhanoshin
cefbe0fc94 [AMDGPU] gfx10 conditional registers handling
This is cpp source part of wave32 support, excluding overriden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

llvm-svn: 363513
2019-06-16 17:13:09 +00:00
Mark Searles
f4a7cf93fd Revert "AMDGPU: Split block for si_end_cf"
This reverts commit 7a6ef3004655dd86d722199c471ae78c28e31bb4.

We discovered some internal test failures, so reverting for now.

Differential Revision: https://reviews.llvm.org/D61213

llvm-svn: 359363
2019-04-27 00:51:18 +00:00
Matt Arsenault
13d22d7325 AMDGPU: Split block for si_end_cf
Relying on no spill or other code being inserted before this was
precarious. It relied on code diligently checking isBasicBlockPrologue
which is likely to be forgotten.

Ideally this could be done earlier, but this doesn't work because of
phis. Any other instruction can't be placed before them, so we have to
accept the position being incorrect during SSA.

This avoids regressions in the fast register allocator rewrite from
inverting the direction.

llvm-svn: 357634
2019-04-03 20:53:20 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Tom Stellard
e236141513 AMDGPU: Refactor Subtarget classes
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
  AMDGPUSubtarget::Generation.

Reviewers: arsenm, jvesely

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D49037

llvm-svn: 336851
2018-07-11 20:59:01 +00:00
Tom Stellard
6f27d8c6b3 AMDGPU: Remove #include "MCTargetDesc/AMDGPUMCTargetDesc.h" from common headers
Summary:
MCTargetDesc/AMDGPUMCTargetDesc.h contains enums for all the instuction
and register defintions, which are huge so we only want to include
them where needed.

This will also make it easier if we want to split the R600 and GCN
definitions into separate tablegenerated files.

I was unable to remove AMDGPUMCTargetDesc.h from SIMachineFunctionInfo.h
because it uses some enums from the header to initialize default values
for the SIMachineFunction class, so I ended up having to remove includes of
SIMachineFunctionInfo.h from headers too.

Reviewers: arsenm, nhaehnle

Reviewed By: nhaehnle

Subscribers: MatzeB, kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D46272

llvm-svn: 332930
2018-05-22 02:03:23 +00:00
Nicola Zaghen
9667127c14 Rename DEBUG macro to LLVM_DEBUG.
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.

In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.

Differential Revision: https://reviews.llvm.org/D43624

llvm-svn: 332240
2018-05-14 12:53:11 +00:00
Nicolai Haehnle
626212884c AMDGPU: Fix a corner case crash in SIOptimizeExecMasking
Summary:
See the new test case; this is really unlikely to happen with real code,
but I ran into this while attempting to bugpoint-reduce a different issue.

Change-Id: I9ade1dc1aa8fd9c4d9fc83661d7b80e310b5c4a6

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D45885

llvm-svn: 330585
2018-04-23 13:05:50 +00:00
Geoff Berry
097bf66bf4 [MachineOperand][Target] MachineOperand::isRenamable semantics changes
Summary:
Add a target option AllowRegisterRenaming that is used to opt in to
post-register-allocation renaming of registers.  This is set to 0 by
default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq
fields of all opcodes to be set to 1, causing
MachineOperand::isRenamable to always return false.

Set the AllowRegisterRenaming flag to 1 for all in-tree targets that
have lit tests that were effected by enabling COPY forwarding in
MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC,
RISCV, Sparc, SystemZ and X86).

Add some more comments describing the semantics of the
MachineOperand::isRenamable function and how it is set and maintained.

Change isRenamable to check the operand's opcode
hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of
relying on it being consistently reflected in the IsRenamable bit
setting.

Clear the IsRenamable bit when changing an operand's register value.

Remove target code that was clearing the IsRenamable bit when changing
registers/opcodes now that this is done conservatively by default.

Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in
one place covering all opcodes that have constant pipe read limit
restrictions.

Reviewers: qcolombet, MatzeB

Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits

Differential Revision: https://reviews.llvm.org/D43042

llvm-svn: 325931
2018-02-23 18:25:08 +00:00
Geoff Berry
85049ee936 [AMDGPU][X86][Mips] Make sure renamable bit not set for reserved regs
Summary:
Fix a few places that were modifying code after register
allocation to set the renamable bit correctly to avoid failing the
validation added in D42449.

llvm-svn: 323675
2018-01-29 18:47:48 +00:00
Matthias Braun
ddd8ed6709 MachineFunction: Return reference from getFunction(); NFC
The Function can never be nullptr so we can return a reference.

llvm-svn: 320884
2017-12-15 22:22:58 +00:00
Francis Visoiu Mistrih
eba849e869 [CodeGen] Rename functions PrintReg* to printReg*
LLVM Coding Standards:
  Function names should be verb phrases (as they represent actions), and
  command-like function should be imperative. The name should be camel
  case, and start with a lower case letter (e.g. openFile() or isFoo()).

Differential Revision: https://reviews.llvm.org/D40416

llvm-svn: 319168
2017-11-28 12:42:37 +00:00
Matt Arsenault
285994c1a5 AMDGPU: Fix producing saveexec when the copy is spilled
If the register from the copy from exec was spilled,
the copy before the spill was deleted leaving a spill
of undefined register verifier error and miscompiling.
Check for other use instructions of the copy register.

llvm-svn: 318132
2017-11-14 02:16:54 +00:00
Matt Arsenault
135357bbf8 AMDGPU: Fix missing skipFunction calls
llvm-svn: 315361
2017-10-10 20:48:36 +00:00
Stanislav Mekhanoshin
4a9ea2a58e [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused
With SI_END_CF elimination for some nested control flow we can now
eliminate saved exec register completely by turning a saveexec version
of instruction into just a logical instruction.

Differential Revision: https://reviews.llvm.org/D36007

llvm-svn: 309766
2017-08-01 23:44:35 +00:00
Nicolai Haehnle
65a5ad0126 AMDGPU: Fix use-after-free in SIOptimizeExecMasking
Summary:
There was a bug with sequences like

   s_mov_b64 s[0:1], exec
   s_and_b64 s[2:3]<def>, s[0:1], s[2:3]<kill>
   ...
   s_mov_b64_term exec, s[2:3]

because s[2:3] was defined and used in the same instruction, ending up with
SaveExecInst inside OtherUseInsts.

Note that the test case also exposes an unrelated bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98028

Reviewers: tstellarAMD, arsenm

Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D25306

llvm-svn: 283528
2016-10-07 08:40:14 +00:00
Konstantin Zhuravlyov
2231c0c9d2 [AMDGPU] Remove unused variables from SIOptimizeExecMasking
Differential Revision: https://reviews.llvm.org/D25110

llvm-svn: 283087
2016-10-03 04:43:22 +00:00
Mehdi Amini
1fef2dd6b7 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Matt Arsenault
c8493c6153 AMDGPU: Partially fix control flow at -O0
Fixes to allow spilling all registers at the end of the block
work with exec modifications. Don't emit s_and_saveexec_b64 for
if lowering, and instead emit copies. Mark control flow mask
instructions as terminators to get correct spill code placement
with fast regalloc, and then have a separate optimization pass
form the saveexec.

This should work if SGPRs are spilled to VGPRs, but
will likely fail in the case that an SGPR spills to memory
and no workitem takes a divergent branch.

llvm-svn: 282667
2016-09-29 01:44:16 +00:00