[X86][NFC] Rename variables/passes for EVEX compression optimization

RFC: https://discourse.llvm.org/t/rfc-design-for-apx-feature-egpr-and-ndd-support/73031

APX introduces EGPR, NDD and NF instructions. In addition to compressing
EVEX-encoded AVX512 instructions into VEX encoding, several more
compressions become possible:

a. Promoted instruction (EVEX space) -> pre-promotion instruction (legacy space)
b. NDD (EVEX space) -> non-NDD (legacy space)
c. NF_ND (EVEX space) -> NF (EVEX space)

The first two kinds of compression usually reduce code size, while the
third can help hardware decoding even though the instruction length
remains unchanged (see the sketch below).
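
To make these concrete, here is a hand-written sketch (illustrative only,
not taken from this patch; AT&T syntax with the NDD destination last, and
the {nf} prefix spelling is an assumption):

  # a. promoted -> pre-promotion: same assembly text, shorter encoding
  addl %ecx, %edx            # promoted form, 4-byte EVEX prefix
  addl %ecx, %edx            # legacy form, no EVEX prefix, fewer bytes
  # b. NDD -> non-NDD: possible when the destination equals a source
  addl %ecx, %edx, %edx      # NDD: %edx = %edx + %ecx (EVEX space)
  addl %ecx, %edx            # legacy two-operand form
  # c. NF_ND -> NF: same length, but HW may skip reading the NDD register
  {nf} addl %ecx, %edx, %edx # NF + NDD (EVEX space)
  {nf} addl %ecx, %edx       # NF only (still EVEX space)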

Hence this patch renames the pass, tables and variables in preparation for
the upcoming APX optimizations.
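
For readers skimming the diff below, the heart of the rename is the generic
table entry (OldOpc/NewOpc instead of EvexOpc/VexOpc), which the pass
searches with a binary search. Below is a minimal standalone model with
hypothetical opcode numbers (the real tables are TableGen-generated and the
real pass also checks predicates and performs custom adjustments first):

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <iterator>

// Models the renamed entry in X86CompressEVEX.cpp; tables are sorted by OldOpc.
struct X86CompressEVEXTableEntry {
  uint16_t OldOpc; // opcode in EVEX space
  uint16_t NewOpc; // compressed opcode (legacy/VEX/EVEX space)
  bool operator<(const X86CompressEVEXTableEntry &RHS) const {
    return OldOpc < RHS.OldOpc;
  }
};

// Hypothetical opcode values, purely for illustration.
static const X86CompressEVEXTableEntry Table[] = {
    {100, 10}, {105, 11}, {200, 20}};

// Returns the compressed opcode, or 0 if Opc has no compressed form.
static uint16_t lookupCompressedOpc(uint16_t Opc) {
  const auto *I = std::lower_bound(std::begin(Table), std::end(Table),
                                   X86CompressEVEXTableEntry{Opc, 0});
  return (I != std::end(Table) && I->OldOpc == Opc) ? I->NewOpc : 0;
}

int main() {
  std::printf("%d -> %d\n", 105, lookupCompressedOpc(105)); // 105 -> 11
  std::printf("%d -> %d\n", 101, lookupCompressedOpc(101)); // 101 -> 0
  return 0;
}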

BTW, I clang-formatted the code in X86CompressEVEX.cpp and
X86CompressEVEXTablesEmitter.cpp.

This patch also extracts the NFC in #77065 into a separate commit.
Shengchen Kan, 2024-01-06 11:33:36 +08:00
parent ba3ef331b4, commit a5902a4d24
9 changed files with 105 additions and 93 deletions

llvm/lib/Target/X86/CMakeLists.txt

@@ -8,7 +8,7 @@ tablegen(LLVM X86GenAsmWriter1.inc -gen-asm-writer -asmwriternum=1)
tablegen(LLVM X86GenCallingConv.inc -gen-callingconv)
tablegen(LLVM X86GenDAGISel.inc -gen-dag-isel)
tablegen(LLVM X86GenDisassemblerTables.inc -gen-disassembler)
tablegen(LLVM X86GenEVEX2VEXTables.inc -gen-x86-EVEX2VEX-tables)
tablegen(LLVM X86GenCompressEVEXTables.inc -gen-x86-compress-evex-tables)
tablegen(LLVM X86GenExegesis.inc -gen-exegesis)
tablegen(LLVM X86GenFastISel.inc -gen-fast-isel)
tablegen(LLVM X86GenGlobalISel.inc -gen-global-isel)
@@ -61,7 +61,7 @@ set(sources
X86InstrFMA3Info.cpp
X86InstrFoldTables.cpp
X86InstrInfo.cpp
X86EvexToVex.cpp
X86CompressEVEX.cpp
X86LoadValueInjectionLoadHardening.cpp
X86LoadValueInjectionRetHardening.cpp
X86MCInstLower.cpp

llvm/lib/Target/X86/X86.h

@@ -131,9 +131,9 @@ FunctionPass *createX86FixupBWInsts();
/// to another, when profitable.
FunctionPass *createX86DomainReassignmentPass();
/// This pass replaces EVEX-encoded AVX-512 instructions by VEX
/// encoding when possible in order to reduce code size.
FunctionPass *createX86EvexToVexInsts();
/// This pass compresses instructions from EVEX space to legacy/VEX/EVEX space when
/// possible in order to reduce code size or facilitate HW decoding.
FunctionPass *createX86CompressEVEXPass();
/// This pass creates the thunks for the retpoline feature.
FunctionPass *createX86IndirectThunksPass();
@@ -167,7 +167,7 @@ FunctionPass *createX86SpeculativeLoadHardeningPass();
FunctionPass *createX86SpeculativeExecutionSideEffectSuppression();
FunctionPass *createX86ArgumentStackSlotPass();
void initializeEvexToVexInstPassPass(PassRegistry &);
void initializeCompressEVEXPassPass(PassRegistry &);
void initializeFPSPass(PassRegistry &);
void initializeFixupBWInstPassPass(PassRegistry &);
void initializeFixupLEAPassPass(PassRegistry &);

llvm/lib/Target/X86/X86EvexToVex.cpp -> llvm/lib/Target/X86/X86CompressEVEX.cpp (renamed)

@@ -1,5 +1,4 @@
//===- X86EvexToVex.cpp ---------------------------------------------------===//
// Compress EVEX instructions to VEX encoding when possible to reduce code size
//===- X86CompressEVEX.cpp ------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
@@ -7,17 +6,34 @@
//
//===----------------------------------------------------------------------===//
//
/// \file
/// This file defines the pass that goes over all AVX-512 instructions which
/// are encoded using the EVEX prefix and if possible replaces them by their
/// corresponding VEX encoding which is usually shorter by 2 bytes.
/// EVEX instructions may be encoded via the VEX prefix when the AVX-512
/// instruction has a corresponding AVX/AVX2 opcode, when vector length
/// accessed by instruction is less than 512 bits and when it does not use
// the xmm or the mask registers or xmm/ymm registers with indexes higher
// than 15.
/// The pass applies code reduction on the generated code for AVX-512 instrs.
// This pass compresses instructions from EVEX space to legacy/VEX/EVEX space
// when possible in order to reduce code size or facilitate HW decoding.
//
// Possible compression:
// a. AVX512 instruction (EVEX) -> AVX instruction (VEX)
// b. Promoted instruction (EVEX) -> pre-promotion instruction (legacy)
// c. NDD (EVEX) -> non-NDD (legacy)
// d. NF_ND (EVEX) -> NF (EVEX)
//
//
// Compression a, b and c can always reduce code size, with some exceptions
// such as promoted 16-bit CRC32 which is as long as the legacy version.
//
// legacy:
// crc32w %si, %eax ## encoding: [0x66,0xf2,0x0f,0x38,0xf1,0xc6]
// promoted:
// crc32w %si, %eax ## encoding: [0x62,0xf4,0x7d,0x08,0xf1,0xc6]
//
// From performance perspective, these should be same (same uops and same EXE
// ports). From a FMV perspective, an older legacy encoding is preferred b/c it
// can execute in more places (broader HW install base). So we will still do
// the compression.
//
// Compression d can help hardware decode (HW may skip reading the NDD
// register) although the instruction length remains unchanged.
//===----------------------------------------------------------------------===//
#include "MCTargetDesc/X86BaseInfo.h"
@@ -38,37 +54,34 @@
using namespace llvm;
// Including the generated EVEX2VEX tables.
struct X86EvexToVexCompressTableEntry {
uint16_t EvexOpc;
uint16_t VexOpc;
// Including the generated EVEX compression tables.
struct X86CompressEVEXTableEntry {
uint16_t OldOpc;
uint16_t NewOpc;
bool operator<(const X86EvexToVexCompressTableEntry &RHS) const {
return EvexOpc < RHS.EvexOpc;
bool operator<(const X86CompressEVEXTableEntry &RHS) const {
return OldOpc < RHS.OldOpc;
}
friend bool operator<(const X86EvexToVexCompressTableEntry &TE,
unsigned Opc) {
return TE.EvexOpc < Opc;
friend bool operator<(const X86CompressEVEXTableEntry &TE, unsigned Opc) {
return TE.OldOpc < Opc;
}
};
#include "X86GenEVEX2VEXTables.inc"
#include "X86GenCompressEVEXTables.inc"
#define EVEX2VEX_DESC "Compressing EVEX instrs to VEX encoding when possible"
#define EVEX2VEX_NAME "x86-evex-to-vex-compress"
#define COMP_EVEX_DESC "Compressing EVEX instrs when possible"
#define COMP_EVEX_NAME "x86-compress-evex"
#define DEBUG_TYPE EVEX2VEX_NAME
#define DEBUG_TYPE COMP_EVEX_NAME
namespace {
class EvexToVexInstPass : public MachineFunctionPass {
class CompressEVEXPass : public MachineFunctionPass {
public:
static char ID;
EvexToVexInstPass() : MachineFunctionPass(ID) {}
StringRef getPassName() const override { return EVEX2VEX_DESC; }
CompressEVEXPass() : MachineFunctionPass(ID) {}
StringRef getPassName() const override { return COMP_EVEX_DESC; }
/// Loop over all of the basic blocks, replacing EVEX instructions
/// by equivalent VEX instructions when possible for reducing code size.
bool runOnMachineFunction(MachineFunction &MF) override;
// This pass runs after regalloc and doesn't support VReg operands.
@@ -80,7 +93,7 @@ public:
} // end anonymous namespace
char EvexToVexInstPass::ID = 0;
char CompressEVEXPass::ID = 0;
static bool usesExtendedRegister(const MachineInstr &MI) {
auto isHiRegIdx = [](unsigned Reg) {
@@ -112,8 +125,8 @@ static bool usesExtendedRegister(const MachineInstr &MI) {
return false;
}
static bool checkVEXInstPredicate(unsigned EvexOpc, const X86Subtarget &ST) {
switch (EvexOpc) {
static bool checkVEXInstPredicate(unsigned OldOpc, const X86Subtarget &ST) {
switch (OldOpc) {
default:
return true;
case X86::VCVTNEPS2BF16Z128rm:
@@ -151,15 +164,15 @@ static bool checkVEXInstPredicate(unsigned EvexOpc, const X86Subtarget &ST) {
}
// Do any custom cleanup needed to finalize the conversion.
static bool performCustomAdjustments(MachineInstr &MI, unsigned VexOpc) {
(void)VexOpc;
static bool performCustomAdjustments(MachineInstr &MI, unsigned NewOpc) {
(void)NewOpc;
unsigned Opc = MI.getOpcode();
switch (Opc) {
case X86::VALIGNDZ128rri:
case X86::VALIGNDZ128rmi:
case X86::VALIGNQZ128rri:
case X86::VALIGNQZ128rmi: {
assert((VexOpc == X86::VPALIGNRrri || VexOpc == X86::VPALIGNRrmi) &&
assert((NewOpc == X86::VPALIGNRrri || NewOpc == X86::VPALIGNRrmi) &&
"Unexpected new opcode!");
unsigned Scale =
(Opc == X86::VALIGNQZ128rri || Opc == X86::VALIGNQZ128rmi) ? 8 : 4;
@@ -175,8 +188,8 @@ static bool performCustomAdjustments(MachineInstr &MI, unsigned VexOpc) {
case X86::VSHUFI32X4Z256rri:
case X86::VSHUFI64X2Z256rmi:
case X86::VSHUFI64X2Z256rri: {
assert((VexOpc == X86::VPERM2F128rr || VexOpc == X86::VPERM2I128rr ||
VexOpc == X86::VPERM2F128rm || VexOpc == X86::VPERM2I128rm) &&
assert((NewOpc == X86::VPERM2F128rr || NewOpc == X86::VPERM2I128rr ||
NewOpc == X86::VPERM2F128rm || NewOpc == X86::VPERM2I128rm) &&
"Unexpected new opcode!");
MachineOperand &Imm = MI.getOperand(MI.getNumExplicitOperands() - 1);
int64_t ImmVal = Imm.getImm();
@@ -200,7 +213,7 @@ static bool performCustomAdjustments(MachineInstr &MI, unsigned VexOpc) {
case X86::VRNDSCALESDZm_Int:
case X86::VRNDSCALESSZr_Int:
case X86::VRNDSCALESSZm_Int:
const MachineOperand &Imm = MI.getOperand(MI.getNumExplicitOperands()-1);
const MachineOperand &Imm = MI.getOperand(MI.getNumExplicitOperands() - 1);
int64_t ImmVal = Imm.getImm();
// Ensure that only bits 3:0 of the immediate are used.
if ((ImmVal & 0xf) != ImmVal)
@@ -239,28 +252,28 @@ static bool CompressEvexToVexImpl(MachineInstr &MI, const X86Subtarget &ST) {
return false;
// Use the VEX.L bit to select the 128 or 256-bit table.
ArrayRef<X86EvexToVexCompressTableEntry> Table =
ArrayRef<X86CompressEVEXTableEntry> Table =
(Desc.TSFlags & X86II::VEX_L) ? ArrayRef(X86EvexToVex256CompressTable)
: ArrayRef(X86EvexToVex128CompressTable);
unsigned EvexOpc = MI.getOpcode();
const auto *I = llvm::lower_bound(Table, EvexOpc);
if (I == Table.end() || I->EvexOpc != EvexOpc)
unsigned Opc = MI.getOpcode();
const auto *I = llvm::lower_bound(Table, Opc);
if (I == Table.end() || I->OldOpc != Opc)
return false;
if (usesExtendedRegister(MI))
return false;
if (!checkVEXInstPredicate(EvexOpc, ST))
if (!checkVEXInstPredicate(Opc, ST))
return false;
if (!performCustomAdjustments(MI, I->VexOpc))
if (!performCustomAdjustments(MI, I->NewOpc))
return false;
MI.setDesc(ST.getInstrInfo()->get(I->VexOpc));
MI.setDesc(ST.getInstrInfo()->get(I->NewOpc));
MI.setAsmPrinterFlag(X86::AC_EVEX_2_VEX);
return true;
}
bool EvexToVexInstPass::runOnMachineFunction(MachineFunction &MF) {
bool CompressEVEXPass::runOnMachineFunction(MachineFunction &MF) {
#ifndef NDEBUG
// Make sure the tables are sorted.
static std::atomic<bool> TableChecked(false);
@@ -289,8 +302,8 @@ bool EvexToVexInstPass::runOnMachineFunction(MachineFunction &MF) {
return Changed;
}
INITIALIZE_PASS(EvexToVexInstPass, EVEX2VEX_NAME, EVEX2VEX_DESC, false, false)
INITIALIZE_PASS(CompressEVEXPass, COMP_EVEX_NAME, COMP_EVEX_DESC, false, false)
FunctionPass *llvm::createX86EvexToVexInsts() {
return new EvexToVexInstPass();
FunctionPass *llvm::createX86CompressEVEXPass() {
return new CompressEVEXPass();
}

llvm/lib/Target/X86/X86TargetMachine.cpp

@@ -75,7 +75,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeX86Target() {
initializeGlobalISel(PR);
initializeWinEHStatePassPass(PR);
initializeFixupBWInstPassPass(PR);
initializeEvexToVexInstPassPass(PR);
initializeCompressEVEXPassPass(PR);
initializeFixupLEAPassPass(PR);
initializeFPSPass(PR);
initializeX86FixupSetCCPassPass(PR);
@@ -575,7 +575,7 @@ void X86PassConfig::addPreEmitPass() {
addPass(createX86FixupInstTuning());
addPass(createX86FixupVectorConstants());
}
addPass(createX86EvexToVexInsts());
addPass(createX86CompressEVEXPass());
addPass(createX86DiscriminateMemOpsPass());
addPass(createX86InsertPrefetchPass());
addPass(createX86InsertX87waitPass());

llvm/test/CodeGen/X86/O0-pipeline.ll

@@ -68,7 +68,7 @@
; CHECK-NEXT: Implement the 'patchable-function' attribute
; CHECK-NEXT: X86 Indirect Branch Tracking
; CHECK-NEXT: X86 vzeroupper inserter
; CHECK-NEXT: Compressing EVEX instrs to VEX encoding when possible
; CHECK-NEXT: Compressing EVEX instrs when possible
; CHECK-NEXT: X86 Discriminate Memory Operands
; CHECK-NEXT: X86 Insert Cache Prefetches
; CHECK-NEXT: X86 insert wait instruction

llvm/test/CodeGen/X86/evex-to-vex-compress.mir

@@ -1,4 +1,4 @@
# RUN: llc -mtriple=x86_64-- -run-pass x86-evex-to-vex-compress -verify-machineinstrs -mcpu=skx -o - %s | FileCheck %s
# RUN: llc -mtriple=x86_64-- -run-pass x86-compress-evex -verify-machineinstrs -mcpu=skx -o - %s | FileCheck %s
# This test verifies VEX encoding for AVX-512 instructions that use registers of low indexes and
# do not use zmm or mask registers and have a corresponding AVX/AVX2 opcode

llvm/test/CodeGen/X86/opt-pipeline.ll

@@ -205,7 +205,7 @@
; CHECK-NEXT: X86 LEA Fixup
; CHECK-NEXT: X86 Fixup Inst Tuning
; CHECK-NEXT: X86 Fixup Vector Constants
; CHECK-NEXT: Compressing EVEX instrs to VEX encoding when possible
; CHECK-NEXT: Compressing EVEX instrs when possible
; CHECK-NEXT: X86 Discriminate Memory Operands
; CHECK-NEXT: X86 Insert Cache Prefetches
; CHECK-NEXT: X86 insert wait instruction

llvm/utils/TableGen/CMakeLists.txt

@@ -82,7 +82,7 @@ add_tablegen(llvm-tblgen LLVM
Types.cpp
VarLenCodeEmitterGen.cpp
X86DisassemblerTables.cpp
X86EVEX2VEXTablesEmitter.cpp
X86CompressEVEXTablesEmitter.cpp
X86FoldTablesEmitter.cpp
X86MnemonicTables.cpp
X86ModRMFilters.cpp

llvm/utils/TableGen/X86EVEX2VEXTablesEmitter.cpp -> llvm/utils/TableGen/X86CompressEVEXTablesEmitter.cpp (renamed)

@@ -1,4 +1,4 @@
//===- utils/TableGen/X86EVEX2VEXTablesEmitter.cpp - X86 backend-*- C++ -*-===//
//==- utils/TableGen/X86CompressEVEXTablesEmitter.cpp - X86 backend-*- C++ -*-//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
@@ -6,7 +6,7 @@
//
//===----------------------------------------------------------------------===//
///
/// This tablegen backend is responsible for emitting the X86 backend EVEX2VEX
/// This tablegen backend is responsible for emitting the X86 backend EVEX
/// compression tables.
///
//===----------------------------------------------------------------------===//
@@ -23,15 +23,15 @@ using namespace X86Disassembler;
namespace {
class X86EVEX2VEXTablesEmitter {
class X86CompressEVEXTablesEmitter {
RecordKeeper &Records;
CodeGenTarget Target;
// Hold all non-masked & non-broadcasted EVEX encoded instructions
std::vector<const CodeGenInstruction *> EVEXInsts;
// Hold all VEX encoded instructions. Divided into groups with same opcodes
// Hold all potentially compressible EVEX instructions
std::vector<const CodeGenInstruction *> PreCompressionInsts;
// Hold all compressed instructions. Divided into groups with same opcodes
// to make the search more efficient
std::map<uint64_t, std::vector<const CodeGenInstruction *>> VEXInsts;
std::map<uint64_t, std::vector<const CodeGenInstruction *>> CompressedInsts;
typedef std::pair<const CodeGenInstruction *, const CodeGenInstruction *>
Entry;
@@ -41,25 +41,24 @@ class X86EVEX2VEXTablesEmitter {
std::vector<Entry> EVEX2VEX256;
public:
X86EVEX2VEXTablesEmitter(RecordKeeper &R) : Records(R), Target(R) {}
X86CompressEVEXTablesEmitter(RecordKeeper &R) : Records(R), Target(R) {}
// run - Output X86 EVEX2VEX tables.
// run - Output X86 EVEX compression tables.
void run(raw_ostream &OS);
private:
// Prints the given table as a C++ array of type
// X86EvexToVexCompressTableEntry
// Prints the given table as a C++ array of type X86CompressEVEXTableEntry
void printTable(const std::vector<Entry> &Table, raw_ostream &OS);
};
void X86EVEX2VEXTablesEmitter::printTable(const std::vector<Entry> &Table,
raw_ostream &OS) {
void X86CompressEVEXTablesEmitter::printTable(const std::vector<Entry> &Table,
raw_ostream &OS) {
StringRef Size = (Table == EVEX2VEX128) ? "128" : "256";
OS << "// X86 EVEX encoded instructions that have a VEX " << Size
<< " encoding\n"
<< "// (table format: <EVEX opcode, VEX-" << Size << " opcode>).\n"
<< "static const X86EvexToVexCompressTableEntry X86EvexToVex" << Size
<< "static const X86CompressEVEXTableEntry X86EvexToVex" << Size
<< "CompressTable[] = {\n"
<< " // EVEX scalar with corresponding VEX.\n";
@@ -98,8 +97,8 @@ public:
RecognizableInstrBase EVEXRI(*EVEXInst);
bool VEX_W = VEXRI.HasREX_W;
bool EVEX_W = EVEXRI.HasREX_W;
bool VEX_WIG = VEXRI.IgnoresW;
bool EVEX_WIG = EVEXRI.IgnoresW;
bool VEX_WIG = VEXRI.IgnoresW;
bool EVEX_WIG = EVEXRI.IgnoresW;
bool EVEX_W1_VEX_W0 = EVEXInst->TheDef->getValueAsBit("EVEX_W1_VEX_W0");
if (VEXRI.IsCodeGenOnly != EVEXRI.IsCodeGenOnly ||
@@ -145,8 +144,8 @@ public:
}
};
void X86EVEX2VEXTablesEmitter::run(raw_ostream &OS) {
emitSourceFileHeader("X86 EVEX2VEX tables", OS);
void X86CompressEVEXTablesEmitter::run(raw_ostream &OS) {
emitSourceFileHeader("X86 EVEX compression tables", OS);
ArrayRef<const CodeGenInstruction *> NumberedInstructions =
Target.getInstructionsByEnumValue();
@@ -161,32 +160,32 @@ void X86EVEX2VEXTablesEmitter::run(raw_ostream &OS) {
continue;
RecognizableInstrBase RI(*Inst);
// Add VEX encoded instructions to one of VEXInsts vectors according to
// it's opcode.
// Add VEX encoded instructions to one of CompressedInsts vectors according
// to its opcode.
if (RI.Encoding == X86Local::VEX)
VEXInsts[RI.Opcode].push_back(Inst);
// Add relevant EVEX encoded instructions to EVEXInsts
CompressedInsts[RI.Opcode].push_back(Inst);
// Add relevant EVEX encoded instructions to PreCompressionInsts
else if (RI.Encoding == X86Local::EVEX && !RI.HasEVEX_K && !RI.HasEVEX_B &&
!RI.HasEVEX_L2 && !Def->getValueAsBit("notEVEX2VEXConvertible"))
EVEXInsts.push_back(Inst);
PreCompressionInsts.push_back(Inst);
}
for (const CodeGenInstruction *EVEXInst : EVEXInsts) {
uint64_t Opcode = getValueFromBitsInit(EVEXInst->TheDef->
getValueAsBitsInit("Opcode"));
for (const CodeGenInstruction *EVEXInst : PreCompressionInsts) {
uint64_t Opcode =
getValueFromBitsInit(EVEXInst->TheDef->getValueAsBitsInit("Opcode"));
// For each EVEX instruction look for a VEX match in the appropriate vector
// (instructions with the same opcode) using function object IsMatch.
// Allow EVEX2VEXOverride to explicitly specify a match.
const CodeGenInstruction *VEXInst = nullptr;
if (!EVEXInst->TheDef->isValueUnset("EVEX2VEXOverride")) {
StringRef AltInstStr =
EVEXInst->TheDef->getValueAsString("EVEX2VEXOverride");
EVEXInst->TheDef->getValueAsString("EVEX2VEXOverride");
Record *AltInstRec = Records.getDef(AltInstStr);
assert(AltInstRec && "EVEX2VEXOverride instruction not found!");
VEXInst = &Target.getInstruction(AltInstRec);
} else {
auto Match = llvm::find_if(VEXInsts[Opcode], IsMatch(EVEXInst));
if (Match != VEXInsts[Opcode].end())
auto Match = llvm::find_if(CompressedInsts[Opcode], IsMatch(EVEXInst));
if (Match != CompressedInsts[Opcode].end())
VEXInst = *Match;
}
@@ -206,5 +205,5 @@ void X86EVEX2VEXTablesEmitter::run(raw_ostream &OS) {
}
} // namespace
static TableGen::Emitter::OptClass<X86EVEX2VEXTablesEmitter>
X("gen-x86-EVEX2VEX-tables", "Generate X86 EVEX to VEX compress tables");
static TableGen::Emitter::OptClass<X86CompressEVEXTablesEmitter>
X("gen-x86-compress-evex-tables", "Generate X86 EVEX compression tables");