Previously we would use a bundle to hint the register allocator to not
overwrite the pointers in a sequence of loads to avoid breaking soft
clauses. This bundling was based on a fuzzy register pressure
heuristic, so we could not guarantee using more registers than are
really available. This would result in register allocator failing on
unsatisfiable bundles. Use a kill to artificially extend the live
ranges, so we can always succeed at register allocation even if it
means extra spills in the worst case.
This seems to capture most of the benefit of the bundle while avoiding
most of the risk presented by the bundle. However the lit tests do
show a handful of regressions. In some cases with sequences of
volatile loads, unused load components end up getting reallocated to
the next load which forces a wait between. There are also a few small
scheduling regressions where a hazard used to be avoided, and one
spill torture test which for some reason nearly doubles the stack
usage. There is also a bit of noise from leftover kills (it may make
sense for post-RA pseudos to strip all of these out).
We are using AtomicNoRet map in multiple places to determine
if an instruction atomic, rtn or nortn atomic. This method
does not work always since we have some instructions which
only has rtn or nortn version.
One such instruction is ds_wrxchg_rtn_b32 which does not have
nortn version. This has caused changes in memory legalizer
tests.
Differential Revision: https://reviews.llvm.org/D96639
Fixes a testcase that was overcommitting large register tuples to a
bundle, which the register allocator could not possibly satisfy. This
was producing a bundle which used nearly all of the available SGPRs
with a series of 16-dword loads (not all of which are freely available
to use).
This is a quick hack for some deeper issues with how the clause
bundler tracks register pressure.
Overall the pressure tracking used here doesn't make sense and is too
imprecise for what it needs to avoid the allocator failing. The
pressure estimate does not account for the alignment requirements of
large SGPR tuples, so this was really underestimating the pressure
impact. This also ignores the impact of the extended live range of the
use registers after the bundle is introduced. Additionally, it didn't
account for some wide tuples not being available due to reserved
registers.
This regresses a few cases. These end up introducing more
spilling. This is also a function of the global pressure being used in
the decision to bundle, not the local pressure impact of the bundle
itself.
This should only consider whether the pressure impact of the bundle at
the given point in the program will decrease the occupancy. High VGPR
pressure was incorrectly blocking the formation of scalar bundles, and
vice versa. This was also blocking bundling from high pressure
situations at other points in the program.
DBG_VALUES placed between memory instructions would change
codegen. Skip over these and re-insert them after the bundle instead
of giving up on bundling.
Summary:
RPTracker::reset(MI) is a very expensive call when the number of virtual registers is huge.
We observed a long compilation time issue when RPT::reset() is called once for each cluster.
In this work, we call RPT.reset() only at the first seen cluster, and use advance() to get
the register pressure for the later clusters in the same basic block. This could effectively reduce the number
of the expensive calls and thus reduce the compile time.
Reviewers:
rampitec
Fixes:
SWDEV-239161
Differential Revision:
https://reviews.llvm.org/D95273
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
caused lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles touches affected_files header
342380 95 3604 llvm/include/llvm/ADT/STLExtras.h
314730 234 1345 llvm/include/llvm/InitializePasses.h
307036 118 2602 llvm/include/llvm/ADT/APInt.h
213049 59 3611 llvm/include/llvm/Support/MathExtras.h
170422 47 3626 llvm/include/llvm/Support/Compiler.h
162225 45 3605 llvm/include/llvm/ADT/Optional.h
158319 63 2513 llvm/include/llvm/ADT/Triple.h
140322 39 3598 llvm/include/llvm/ADT/StringRef.h
137647 59 2333 llvm/include/llvm/Support/Error.h
131619 73 1803 llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).
Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor
Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&
Depends on D65919
Reviewers: arsenm, bogner, craig.topper, RKSimon
Reviewed By: arsenm
Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65962
llvm-svn: 369041
With LLPC, previous investigation has suggested that si-scheduler
interacts badly with SiFormMemoryClauses on an XNACK target in some
games.
That needs further investigation in the future. In the meantime, this
commit adds a target-specific attribute to allow us to disable
SIFormMemoryClauses by setting it to 1 on a per-function basis for LLPC
to use.
Differential Revision: https://reviews.llvm.org/D62572
Change-Id: Ia0ca12ce79093cbbe86caded723ffb13384ede92
llvm-svn: 362127
Summary:
With XNACK, an smem load whose result is coalesced with an operand (thus
it overwrites its own operand) cannot appear in a clause, because some
other instruction might XNACK and restart the whole clause.
The clause breaker already realized that an smem that overwrites an
operand cannot appear in a clause, and broke the clause. The problem
that this commit fixes is that the SIFormMemoryClauses optimization
formed a bundle with early clobber, which caused the earlier code that
set up the coalesced operand to be removed as dead.
Differential Revision: https://reviews.llvm.org/D57008
Change-Id: I703c4d5b0bf7d6060222bec491f45c18bb3c0016
llvm-svn: 351950
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Summary:
This is a follow-up to r335942.
- Merge SISubtarget into AMDGPUSubtarget and rename to GCNSubtarget
- Rename AMDGPUCommonSubtarget to AMDGPUSubtarget
- Merge R600Subtarget::Generation and GCNSubtarget::Generation into
AMDGPUSubtarget::Generation.
Reviewers: arsenm, jvesely
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D49037
llvm-svn: 336851
Memory clauses are formed into bundles in presence of xnack.
Their source operands are marked as early-clobber.
This allows to allocate distinct source and destination registers
within a clause and prevent breaking the clause with s_nop in the
hazard recognizer.
Clauses are undone before post-RA scheduler to allow some rescheduling,
which will not break the clause since artificial edges are created in
the dag to keep memory operations together. Yet this allows a better
ILP in some cases.
Differential Revision: https://reviews.llvm.org/D47511
llvm-svn: 333691