This patch adds a new ReadAdvance definition named ReadInt2Fpu.
ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by
data transfers from the integer unit to the floating point unit.
ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all
x86 models excluding BtVer2. That means, this patch is only a functional change
for the Jaguar cpu model only.
Tablegen definitions for instructions (V)PINSR* have been updated to account for
the new ReadInt2Fpu. That read is mapped to the the GPR input operand.
On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch,
that extra delay was added to the opcode latency. In practice, the insert opcode
only executes for 1cy. Most of the actual latency is actually contributed by the
so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR*
latency is defined by expression f+1, where f is defined as a forwarding delay
from the integer unit to the fpu.
When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC
(only when flag -print-schedule is speified), we now need to account for any
extra forwarding delays. We do this by checking if scheduling classes declare
any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td:
"A negative advance effectively increases latency, which may be used for
cross-domain stalls". When computing the instruction latency for the purpose of
our scheduling tests, we now add any extra delay to the formula. This avoids
regressing existing codegen and mca schedule tests. It comes with the cost of an
extra (but very simple) hook in MCSchedModel.
Differential Revision: https://reviews.llvm.org/D57056
llvm-svn: 351965
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678
We may not have throughput info because it's not specified in the model
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.
Differential Revision: https://reviews.llvm.org/D47723
llvm-svn: 334055
This patch extends the MCSchedModel API with new methods that can be used to
obtain the latency and reciprocal througput information for an MCInst.
Scheduling models have recently gained the ability to resolve variant scheduling
classes associated with MCInst objects. Before, models were only able to resolve
a variant scheduling class from a MachineInstr object.
This patch is mainly required by D47374 to avoid regressing a pair of x86
specific -print-schedule tests for btver2. Patch D47374 introduces a new variant
class to teach the btver scheduling model (x86 target) how to correctly compute
the latency profile for some zero-idioms using the new scheduling predicates.
The new methods added by this patch would be mainly used by llc when flag
-print-schedule is specified. In particular, tests that contain inline assembly
require that code is parsed at code emission stage into a sequence of MCInst.
That forces the print-schedule functionality to query the latency/rthroughput
information for MCInst instructions too. If we don't expose this new API, then
we lose "-print-schedule" test coverage as soon as variant scheduling classes
are added to the x86 models.
The tablegen SubtargetEmitter changes teaches how to query latency profile
information using a object that derives from TargetSubtargetInfo. Note that this
should really have been part of r333286. To avoid code duplication, the logic
that "resolves" variant scheduling classes for MCInst, has been moved to a
common place in MC. That logic is used by the "resolveVariantSchedClass" methods
redefined in override by the tablegen'd GenSubtargetInfo classes.
Differential Revision: https://reviews.llvm.org/D47536
llvm-svn: 333650
TargetSchedModel now always delegates to MCSchedModel the computation of
instruction latency and reciprocal throughput.
No functional change intended.
llvm-svn: 330099
This should fix the problem reported by the lld buildbots:
- Builder lld-x86_64-darwin13, Build #19782
- Builder lld-perf-testsuite, Build #1419
llvm-svn: 329068
The goal is to make the reciprocal throughput computation accessible through the
MCSchedModel interface. This is particularly important for llvm-mca because it
can only query the MCSchedModel interface.
No functional change intended.
Differential Revision: https://reviews.llvm.org/D44392
llvm-svn: 327420
The goal is to make the latency information accessible through the MCSchedModel
interface. This is particularly important for tools like llvm-mca that only have
access to the MCSchedModel API.
This partially fixes PR36676.
No functional change intended.
Differential Revision: https://reviews.llvm.org/D44383
llvm-svn: 327406
`MCSchedModel` is large. Make `MCSchedModel::GetDefaultSchedModel()`
return by-reference instead of by-value, so we can store a pointer in
`MCSubtargetInfo::CPUSchedModel` instead of a copy.
Note: since `MCSchedModel` is POD, this doesn't create a static
constructor.
llvm-svn: 241947