mirror of
https://github.com/RPCS3/llvm.git
synced 2025-01-06 03:38:34 +00:00
aeef83c6af
a TargetMachine to construct (and thus isn't always available), to an analysis group that supports layered implementations much like AliasAnalysis does. This is a pretty massive change, with a few parts that I was unable to easily separate (sorry), so I'll walk through it.

The first step of this conversion was to make TargetTransformInfo an analysis group, and to sink the nonce implementations in ScalarTargetTransformInfo and VectorTargetTransformInfo into a NoTargetTransformInfo pass. This allows other passes to add a hard requirement on TTI, and assume they will always get at least one implementation.

The TargetTransformInfo analysis group leverages the delegation chaining trick that AliasAnalysis uses, where the base class for the analysis group delegates to the previous analysis *pass*, allowing all but the NoFoo analysis passes to only implement the parts of the interfaces they support. It also introduces a new trick where each pass in the group retains a pointer to the top-most pass that has been initialized. This allows passes to implement one API in terms of another API and benefit when some other pass above them in the stack has more precise results for the second API.

The second step of this conversion is to create a pass that implements the TargetTransformInfo analysis using the target-independent abstractions in the code generator. This replaces the ScalarTargetTransformImpl and VectorTargetTransformImpl classes in lib/Target with a single pass in lib/CodeGen called BasicTargetTransformInfo. This class actually provides most of the TTI functionality, basing it upon the TargetLowering abstraction and other information in the target independent code generator.

The third step of the conversion adds support to all TargetMachines to register custom analysis passes. This allows building those passes with access to TargetLowering or other target-specific classes, and it also allows each target to customize the set of analysis passes desired in the pass manager. The baseline LLVMTargetMachine implements this interface to add the BasicTTI pass to the pass manager, and all of the tools that want to support target-aware TTI passes call this routine on whatever target machine they end up with to add the appropriate passes.

The fourth step of the conversion created target-specific TTI analysis passes for the X86 and ARM backends. These passes contain the custom logic that was previously in their extensions of the ScalarTargetTransformInfo and VectorTargetTransformInfo interfaces. I separated them into their own file, as now all of the interface bits are private and they just expose a function to create the pass itself. Then I extended these target machines to set up a custom set of analysis passes, first adding BasicTTI as a fallback, and then adding their customized TTI implementations.

The fourth step required logic that was shared between the target independent layer and the specific targets to move to a different interface, as they no longer derive from each other. As a consequence, helper functions were added to TargetLowering representing the common logic needed both in the target implementation and the codegen implementation of the TTI pass. While technically this is the only change that could have been committed separately, it would have been a nightmare to extract.

The final step of the conversion was just to delete all the old boilerplate. This got rid of the ScalarTargetTransformInfo and VectorTargetTransformInfo classes, all of the support in all of the targets for producing instances of them, and all of the support in the tools for manually constructing a pass based around them.

Now that TTI is a relatively normal analysis group, two things become straightforward. First, we can sink it into lib/Analysis, which is a more natural layer for it to live in. Second, clients of this interface can depend on it *always* being available, which will simplify their code and behavior. These (and other) simplifications will follow in subsequent commits; this one is clearly big enough.

Finally, I'm very aware that much of the comments and documentation needs to be updated. As soon as I had this working, and plausibly well commented, I wanted to get it committed and in front of the build bots. I'll be doing a few passes over documentation later if it sticks.

Commits to update DragonEgg and Clang will be made presently.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@171681 91177308-0d34-0410-b5e6-96231b3b80d8
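The delegation chaining and "top-most pass" pointer described in the commit message can be illustrated with a small standalone sketch. This is not LLVM's API: `CostInfo`, `NoCostInfo`, `BasicCostInfo`, `TargetCostInfo`, `pushOnto`, and the cost numbers are all hypothetical names and values chosen for illustration. The sketch only assumes what the message states: each layer delegates unimplemented queries to the layer below, and every layer keeps a pointer to the top-most layer so that one query can be expressed in terms of another.

```cpp
#include <cassert>

// Hypothetical stand-in for an analysis group interface; not LLVM's real API.
class CostInfo {
protected:
  CostInfo *Prev = nullptr;    // Next-lower implementation in the stack.
  CostInfo *TopMost = nullptr; // Highest implementation registered so far.

public:
  virtual ~CostInfo() = default;

  // Stack this layer above Below and tell every layer who is on top now.
  void pushOnto(CostInfo *Below) {
    Prev = Below;
    for (CostInfo *P = this; P; P = P->Prev)
      P->TopMost = this;
  }

  // Default behavior for every query: delegate to the layer below.
  virtual unsigned getAddCost() const {
    assert(Prev && "bottom layer must implement every query");
    return Prev->getAddCost();
  }
  virtual unsigned getMulCost() const {
    assert(Prev && "bottom layer must implement every query");
    return Prev->getMulCost();
  }
};

// Bottom of the stack: answers every query conservatively (the "NoFoo" role).
class NoCostInfo : public CostInfo {
public:
  NoCostInfo() { TopMost = this; }
  unsigned getAddCost() const override { return 1; }
  unsigned getMulCost() const override { return 1; }
};

// Middle layer: expresses one query in terms of another, asking the top-most
// layer so that a more precise answer higher in the stack is picked up.
class BasicCostInfo : public CostInfo {
public:
  unsigned getMulCost() const override { return 3 * TopMost->getAddCost(); }
};

// Top layer: overrides only the query it knows better.
class TargetCostInfo : public CostInfo {
public:
  unsigned getAddCost() const override { return 2; }
};

// Stack: No <- Basic. The mul query bottoms out in NoCostInfo's add cost.
unsigned mulCostWithoutTarget() {
  NoCostInfo No;
  BasicCostInfo Basic;
  Basic.pushOnto(&No);
  return Basic.getMulCost();
}

// Stack: No <- Basic <- Target. Target delegates the mul query to Basic,
// which in turn asks the top-most layer (Target) for the add cost.
unsigned mulCostWithTarget() {
  NoCostInfo No;
  BasicCostInfo Basic;
  TargetCostInfo Target;
  Basic.pushOnto(&No);
  Target.pushOnto(&Basic);
  return Target.getMulCost();
}
```

In this sketch the middle layer never changes, yet its answer improves once a more specific layer is stacked above it, which is the benefit the message attributes to the top-most pointer.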
356 lines
11 KiB
C++
//===-- X86TargetTransformInfo.cpp - X86 specific TTI pass ----------------===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
/// \file
/// This file implements a TargetTransformInfo analysis pass specific to the
/// X86 target machine. It uses the target's detailed information to provide
/// more precise answers to certain TTI queries, while letting the target
/// independent and default TTI implementations handle the rest.
///
//===----------------------------------------------------------------------===//

#define DEBUG_TYPE "x86tti"
#include "X86.h"
#include "X86TargetMachine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Target/TargetLowering.h"
#include "llvm/TargetTransformInfo.h"
using namespace llvm;

// Declare the pass initialization routine locally as target-specific passes
// don't have a target-wide initialization entry point, and so we rely on the
// pass constructor initialization.
namespace llvm {
void initializeX86TTIPass(PassRegistry &);
}

namespace {

class X86TTI : public ImmutablePass, public TargetTransformInfo {
  const X86TargetMachine *TM;
  const X86Subtarget *ST;
  const X86TargetLowering *TLI;

  /// Estimate the overhead of scalarizing an instruction. Insert and Extract
  /// are set if the result needs to be inserted and/or extracted from vectors.
  unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;

public:
  X86TTI() : ImmutablePass(ID), TM(0), ST(0), TLI(0) {
    llvm_unreachable("This pass cannot be directly constructed");
  }

  X86TTI(const X86TargetMachine *TM)
      : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()),
        TLI(TM->getTargetLowering()) {
    initializeX86TTIPass(*PassRegistry::getPassRegistry());
  }

  virtual void initializePass() {
    pushTTIStack(this);
  }

  virtual void finalizePass() {
    popTTIStack();
  }

  virtual void getAnalysisUsage(AnalysisUsage &AU) const {
    TargetTransformInfo::getAnalysisUsage(AU);
  }

  /// Pass identification.
  static char ID;

  /// Provide necessary pointer adjustments for the two base classes.
  virtual void *getAdjustedAnalysisPointer(const void *ID) {
    if (ID == &TargetTransformInfo::ID)
      return (TargetTransformInfo*)this;
    return this;
  }

  /// \name Scalar TTI Implementations
  /// @{

  virtual PopcntHwSupport getPopcntHwSupport(unsigned TyWidth) const;

  /// @}

  /// \name Vector TTI Implementations
  /// @{

  virtual unsigned getNumberOfRegisters(bool Vector) const;
  virtual unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty) const;
  virtual unsigned getShuffleCost(ShuffleKind Kind, Type *Tp,
                                  int Index, Type *SubTp) const;
  virtual unsigned getCastInstrCost(unsigned Opcode, Type *Dst,
                                    Type *Src) const;
  virtual unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
                                      Type *CondTy) const;
  virtual unsigned getVectorInstrCost(unsigned Opcode, Type *Val,
                                      unsigned Index) const;
  virtual unsigned getMemoryOpCost(unsigned Opcode, Type *Src,
                                   unsigned Alignment,
                                   unsigned AddressSpace) const;

  /// @}
};

} // end anonymous namespace

INITIALIZE_AG_PASS(X86TTI, TargetTransformInfo, "x86tti",
                   "X86 Target Transform Info", true, true, false)
char X86TTI::ID = 0;

ImmutablePass *
llvm::createX86TargetTransformInfoPass(const X86TargetMachine *TM) {
  return new X86TTI(TM);
}


//===----------------------------------------------------------------------===//
//
// X86 cost model.
//
//===----------------------------------------------------------------------===//

namespace {
struct X86CostTblEntry {
  int ISD;
  MVT Type;
  unsigned Cost;
};
}

static int
FindInTable(const X86CostTblEntry *Tbl, unsigned len, int ISD, MVT Ty) {
  for (unsigned int i = 0; i < len; ++i)
    if (Tbl[i].ISD == ISD && Tbl[i].Type == Ty)
      return i;

  // Could not find an entry.
  return -1;
}

namespace {
struct X86TypeConversionCostTblEntry {
  int ISD;
  MVT Dst;
  MVT Src;
  unsigned Cost;
};
}

static int
FindInConvertTable(const X86TypeConversionCostTblEntry *Tbl, unsigned len,
                   int ISD, MVT Dst, MVT Src) {
  for (unsigned int i = 0; i < len; ++i)
    if (Tbl[i].ISD == ISD && Tbl[i].Src == Src && Tbl[i].Dst == Dst)
      return i;

  // Could not find an entry.
  return -1;
}


X86TTI::PopcntHwSupport X86TTI::getPopcntHwSupport(unsigned TyWidth) const {
  assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
  // TODO: Currently the __builtin_popcount() implementation using SSE3
  //   instructions is inefficient. Once the problem is fixed, we should
  //   call ST->hasSSE3() instead of ST->hasSSE41().
  return ST->hasSSE41() ? Fast : None;
}

unsigned X86TTI::getNumberOfRegisters(bool Vector) const {
  if (ST->is64Bit())
    return 16;
  return 8;
}

unsigned X86TTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty) const {
  // Legalize the type.
  std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);

  int ISD = TLI->InstructionOpcodeToISD(Opcode);
  assert(ISD && "Invalid opcode");

  static const X86CostTblEntry AVX1CostTable[] = {
    // We don't have to scalarize unsupported ops. We can issue two half-sized
    // operations and we only need to extract the upper YMM half.
    // Two ops + 1 extract + 1 insert = 4.
    { ISD::MUL, MVT::v8i32, 4 },
    { ISD::SUB, MVT::v8i32, 4 },
    { ISD::ADD, MVT::v8i32, 4 },
    { ISD::MUL, MVT::v4i64, 4 },
    { ISD::SUB, MVT::v4i64, 4 },
    { ISD::ADD, MVT::v4i64, 4 },
  };

  // Look for AVX1 lowering tricks.
  if (ST->hasAVX()) {
    int Idx = FindInTable(AVX1CostTable, array_lengthof(AVX1CostTable), ISD,
                          LT.second);
    if (Idx != -1)
      return LT.first * AVX1CostTable[Idx].Cost;
  }
  // Fall back to the default implementation.
  return TargetTransformInfo::getArithmeticInstrCost(Opcode, Ty);
}

unsigned X86TTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index,
                                Type *SubTp) const {
  // We only estimate the cost of reverse shuffles.
  if (Kind != Reverse)
    return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp);

  std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Tp);
  unsigned Cost = 1;
  if (LT.second.getSizeInBits() > 128)
    Cost = 3; // Extract + insert + copy.

  // Multiply by the number of parts.
  return Cost * LT.first;
}

unsigned X86TTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const {
  int ISD = TLI->InstructionOpcodeToISD(Opcode);
  assert(ISD && "Invalid opcode");

  EVT SrcTy = TLI->getValueType(Src);
  EVT DstTy = TLI->getValueType(Dst);

  if (!SrcTy.isSimple() || !DstTy.isSimple())
    return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src);

  static const X86TypeConversionCostTblEntry AVXConversionTbl[] = {
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i16, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i16, 1 },
    { ISD::SIGN_EXTEND, MVT::v4i64, MVT::v4i32, 1 },
    { ISD::ZERO_EXTEND, MVT::v4i64, MVT::v4i32, 1 },
    { ISD::TRUNCATE,    MVT::v4i32, MVT::v4i64, 1 },
    { ISD::TRUNCATE,    MVT::v8i16, MVT::v8i32, 1 },
    { ISD::SINT_TO_FP,  MVT::v8f32, MVT::v8i8,  1 },
    { ISD::SINT_TO_FP,  MVT::v4f32, MVT::v4i8,  1 },
    { ISD::UINT_TO_FP,  MVT::v8f32, MVT::v8i8,  1 },
    { ISD::UINT_TO_FP,  MVT::v4f32, MVT::v4i8,  1 },
    { ISD::FP_TO_SINT,  MVT::v8i8,  MVT::v8f32, 1 },
    { ISD::FP_TO_SINT,  MVT::v4i8,  MVT::v4f32, 1 },
    { ISD::ZERO_EXTEND, MVT::v8i32, MVT::v8i1,  6 },
    { ISD::SIGN_EXTEND, MVT::v8i32, MVT::v8i1,  9 },
    { ISD::TRUNCATE,    MVT::v8i32, MVT::v8i64, 3 },
  };

  if (ST->hasAVX()) {
    int Idx = FindInConvertTable(AVXConversionTbl,
                                 array_lengthof(AVXConversionTbl),
                                 ISD, DstTy.getSimpleVT(), SrcTy.getSimpleVT());
    if (Idx != -1)
      return AVXConversionTbl[Idx].Cost;
  }

  return TargetTransformInfo::getCastInstrCost(Opcode, Dst, Src);
}

unsigned X86TTI::getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
                                    Type *CondTy) const {
  // Legalize the type.
  std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(ValTy);

  MVT MTy = LT.second;

  int ISD = TLI->InstructionOpcodeToISD(Opcode);
  assert(ISD && "Invalid opcode");

  static const X86CostTblEntry SSE42CostTbl[] = {
    { ISD::SETCC, MVT::v2f64, 1 },
    { ISD::SETCC, MVT::v4f32, 1 },
    { ISD::SETCC, MVT::v2i64, 1 },
    { ISD::SETCC, MVT::v4i32, 1 },
    { ISD::SETCC, MVT::v8i16, 1 },
    { ISD::SETCC, MVT::v16i8, 1 },
  };

  static const X86CostTblEntry AVX1CostTbl[] = {
    { ISD::SETCC, MVT::v4f64, 1 },
    { ISD::SETCC, MVT::v8f32, 1 },
    // AVX1 does not support 8-wide integer compare.
    { ISD::SETCC, MVT::v4i64, 4 },
    { ISD::SETCC, MVT::v8i32, 4 },
    { ISD::SETCC, MVT::v16i16, 4 },
    { ISD::SETCC, MVT::v32i8, 4 },
  };

  static const X86CostTblEntry AVX2CostTbl[] = {
    { ISD::SETCC, MVT::v4i64, 1 },
    { ISD::SETCC, MVT::v8i32, 1 },
    { ISD::SETCC, MVT::v16i16, 1 },
    { ISD::SETCC, MVT::v32i8, 1 },
  };

  if (ST->hasAVX2()) {
    int Idx = FindInTable(AVX2CostTbl, array_lengthof(AVX2CostTbl), ISD, MTy);
    if (Idx != -1)
      return LT.first * AVX2CostTbl[Idx].Cost;
  }

  if (ST->hasAVX()) {
    int Idx = FindInTable(AVX1CostTbl, array_lengthof(AVX1CostTbl), ISD, MTy);
    if (Idx != -1)
      return LT.first * AVX1CostTbl[Idx].Cost;
  }

  if (ST->hasSSE42()) {
    int Idx = FindInTable(SSE42CostTbl, array_lengthof(SSE42CostTbl), ISD, MTy);
    if (Idx != -1)
      return LT.first * SSE42CostTbl[Idx].Cost;
  }

  return TargetTransformInfo::getCmpSelInstrCost(Opcode, ValTy, CondTy);
}

unsigned X86TTI::getVectorInstrCost(unsigned Opcode, Type *Val,
                                    unsigned Index) const {
  assert(Val->isVectorTy() && "This must be a vector type");

  if (Index != -1U) {
    // Legalize the type.
    std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Val);

    // This type is legalized to a scalar type.
    if (!LT.second.isVector())
      return 0;

    // The type may be split. Normalize the index to the new type.
    unsigned Width = LT.second.getVectorNumElements();
    Index = Index % Width;

    // Floating point scalars are already located in index #0.
    if (Val->getScalarType()->isFloatingPointTy() && Index == 0)
      return 0;
  }

  return TargetTransformInfo::getVectorInstrCost(Opcode, Val, Index);
}

unsigned X86TTI::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
                                 unsigned AddressSpace) const {
  // Legalize the type.
  std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Src);
  assert((Opcode == Instruction::Load || Opcode == Instruction::Store) &&
         "Invalid Opcode");

  // Each load/store unit costs 1.
  unsigned Cost = LT.first * 1;

  // On Sandy Bridge, 256-bit loads/stores are double pumped
  // (but not on Haswell).
  if (LT.second.getSizeInBits() > 128 && !ST->hasAVX2())
    Cost *= 2;

  return Cost;
}
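
The table-driven pattern used by the cost functions above (legalize the type, look up an opcode/type pair, then multiply by the number of legalized parts) can be tried outside LLVM with a minimal sketch. Everything here is a hypothetical stand-in: `Op` and `VT` replace the ISD opcodes and `MVT` types, `NumParts` and `LegalTy` play the roles of `LT.first` and `LT.second`, and `DefaultCost` stands in for whatever the fallback implementation would return.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for ISD opcodes and MVT value types.
enum class Op { Add, Sub, Mul };
enum class VT { v4i32, v8i32, v4i64 };

struct CostEntry {
  Op Opcode;
  VT Type;
  unsigned Cost;
};

// Linear scan in the style of FindInTable: returns the index of the matching
// entry, or -1 if the opcode/type pair is not in the table.
template <std::size_t N>
int findInTable(const CostEntry (&Tbl)[N], Op Opcode, VT Ty) {
  for (std::size_t I = 0; I < N; ++I)
    if (Tbl[I].Opcode == Opcode && Tbl[I].Type == Ty)
      return static_cast<int>(I);
  return -1;
}

// NumParts is how many legal-typed pieces the original type splits into;
// LegalTy is the type of each piece. DefaultCost is an assumed fallback value.
unsigned arithmeticCost(Op Opcode, unsigned NumParts, VT LegalTy,
                        unsigned DefaultCost) {
  assert(NumParts > 0 && "a type legalizes to at least one part");

  // Costs copied from the AVX1 table above: two half-width ops plus one
  // extract and one insert.
  static const CostEntry AVX1Table[] = {
    { Op::Mul, VT::v8i32, 4 },
    { Op::Sub, VT::v8i32, 4 },
    { Op::Add, VT::v8i32, 4 },
    { Op::Mul, VT::v4i64, 4 },
  };

  int Idx = findInTable(AVX1Table, Opcode, LegalTy);
  if (Idx != -1)
    return NumParts * AVX1Table[Idx].Cost;

  // No table entry: defer to the (assumed) default cost.
  return DefaultCost;
}
```

For instance, an operation whose type splits into two `v8i32` parts would be charged 2 * 4 = 8 by this sketch, while a `v4i32` operation with no table entry returns the supplied default unchanged.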