llvm-capstone/llvm/lib/CodeGen/BasicBlockSectionsProfileReader.cpp
Rahman Lavaee 3d6841b2b1 [Propeller] Use Fixed MBB ID instead of volatile MachineBasicBlock::Number.
Let Propeller use specialized IDs for basic blocks, instead of MBB number.

This allows optimizations not just prior to asm-printer, but throughout the entire codegen.
This patch only implements the functionality under the new `LLVM_BB_ADDR_MAP` version, but the old version is still being used. A later patch will change the used version.

#### Background
Today Propeller uses machine basic block (MBB) numbers, which already exist, to map native assembly to machine IR.  This is done as follows.
    - Basic block addresses are captured and dumped into the `LLVM_BB_ADDR_MAP` section just before the AsmPrinter pass which writes out object files. This ensures that we have a mapping that is close to assembly.
    - Profiling mapping works by taking a virtual address of an instruction and looking up the `LLVM_BB_ADDR_MAP` section to find the MBB number it corresponds to.
    - While this works well today, we need to do better when we scale Propeller to target other Machine IR optimizations like spill code optimization.  Register allocation happens earlier in the Machine IR pipeline and we need an annotation mechanism that is valid at that point.
    - The current scheme will not work in this scenario because the MBB number of a particular basic block is not fixed and changes over the course of codegen (via renumbering, adding, and removing the basic blocks).
    - In other words, the volatile MBB numbers do not provide a one-to-one correspondence throughout the lifetime of Machine IR.  Profile annotation using MBB numbers is restricted to a fixed point; only valid at the exact point where it was dumped.
    - Further, the object file can only be dumped before AsmPrinter and cannot be dumped at an arbitrary point in the Machine IR pass pipeline.  Hence, MBB numbers are not suitable and we need something else.
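
The address lookup described in the list above can be sketched roughly as follows. This is a simplified illustration and not the actual `LLVM_BB_ADDR_MAP` decoder: `ToyBBEntry` and `lookupBlock` are hypothetical names, and the entries are assumed to be sorted by start address and non-overlapping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for one address-map entry: a block that
// starts at Start, spans Size bytes, and carries a block number.
struct ToyBBEntry {
  uint64_t Start;
  uint64_t Size;
  unsigned BlockNumber;
};

// Looks up the block containing Addr. Returns true and sets BlockNumber on a
// hit, false if Addr is not covered by any entry.
// Assumption: Entries is sorted by Start and entries do not overlap.
bool lookupBlock(const std::vector<ToyBBEntry> &Entries, uint64_t Addr,
                 unsigned &BlockNumber) {
  // Find the first entry that starts strictly after Addr.
  auto It = std::upper_bound(
      Entries.begin(), Entries.end(), Addr,
      [](uint64_t A, const ToyBBEntry &E) { return A < E.Start; });
  if (It == Entries.begin())
    return false; // Addr lies before the first block.
  --It;           // The only candidate is the entry just before it.
  if (Addr - It->Start >= It->Size)
    return false; // Addr falls in a gap after the candidate block.
  BlockNumber = It->BlockNumber;
  return true;
}
```

The number returned this way is only meaningful at the point where the map was emitted, which is the limitation the remaining list items describe.
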
#### Solution
We propose using fixed, unique, incrementally assigned MBB IDs for basic blocks instead of volatile MBB numbers. These IDs are assigned when machine basic blocks are created. We modify `MachineFunction::CreateMachineBasicBlock` to assign the fixed ID to every newly created basic block.  It assigns `MachineFunction::NextBBID` to the MBB ID and then increments it, which guarantees that the IDs are unique.
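
As a rough illustration of the difference between the two kinds of identifier, here is a toy model. `ToyBlock`, `ToyFunction`, `eraseBlockAt` and `renumberBlocks` are hypothetical names and not the LLVM classes. The sketch only assumes what is stated above: a per-function counter is copied into each block at creation and then incremented.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a machine basic block (hypothetical, not the LLVM class).
struct ToyBlock {
  int Number;    // Volatile: reflects layout position, rewritten on renumbering.
  unsigned BBID; // Fixed: assigned once at creation and never changed.
};

// Toy stand-in for a machine function that owns its blocks.
class ToyFunction {
  std::vector<std::unique_ptr<ToyBlock>> Blocks;
  unsigned NextBBID = 0; // Scoped to the function, so functions do not interfere.

public:
  // Creates a block and stamps it with the next fixed ID.
  ToyBlock *createBlock() {
    std::unique_ptr<ToyBlock> B(new ToyBlock());
    B->Number = static_cast<int>(Blocks.size());
    B->BBID = NextBBID++;
    Blocks.push_back(std::move(B));
    return Blocks.back().get();
  }

  // Removes the block at layout position Index.
  void eraseBlockAt(std::size_t Index) {
    Blocks.erase(Blocks.begin() + static_cast<std::ptrdiff_t>(Index));
  }

  // Reassigns position-based numbers; fixed IDs are left untouched.
  void renumberBlocks() {
    for (std::size_t I = 0; I < Blocks.size(); ++I)
      Blocks[I]->Number = static_cast<int>(I);
  }
};
```

In this model, creating three blocks, erasing the middle one and renumbering changes the last block's `Number` from 2 to 1, while its `BBID` stays 2. A block created afterwards gets `BBID` 3, so IDs are never reused.
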

 To ensure correct profile attribution, multiple equivalent compilations must generate the same Propeller IDs. This is guaranteed as long as the MachineFunction passes run in the same order. Since the `NextBBID` variable is scoped to `MachineFunction`, interleaving of codegen for different functions won't cause any inconsistencies.

The new encoding is generated under the new version number 2 and we keep backward-compatibility with older versions.

#### Impact on Size of the `LLVM_BB_ADDR_MAP` Section
Emitting the Propeller ID results in a 23% increase in the size of the `LLVM_BB_ADDR_MAP` section for the clang binary.

Reviewed By: tmsriram

Differential Revision: https://reviews.llvm.org/D100808
2023-01-17 15:25:29 -08:00


//===-- BasicBlockSectionsProfileReader.cpp -------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Implementation of the basic block sections profile reader pass. It parses
// and stores the basic block sections profile file (which is specified via the
// `-basic-block-sections` flag).
//
//===----------------------------------------------------------------------===//
#include "llvm/CodeGen/BasicBlockSectionsProfileReader.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/LineIterator.h"
#include "llvm/Support/MemoryBuffer.h"

using namespace llvm;

char BasicBlockSectionsProfileReader::ID = 0;
INITIALIZE_PASS(BasicBlockSectionsProfileReader, "bbsections-profile-reader",
                "Reads and parses a basic block sections profile.", false,
                false)

bool BasicBlockSectionsProfileReader::isFunctionHot(StringRef FuncName) const {
  return getBBClusterInfoForFunction(FuncName).first;
}

std::pair<bool, SmallVector<BBClusterInfo>>
BasicBlockSectionsProfileReader::getBBClusterInfoForFunction(
    StringRef FuncName) const {
  std::pair<bool, SmallVector<BBClusterInfo>> cluster_info(false, {});
  auto R = ProgramBBClusterInfo.find(getAliasName(FuncName));
  if (R != ProgramBBClusterInfo.end()) {
    cluster_info.second = R->second;
    cluster_info.first = true;
  }
  return cluster_info;
}

// Basic Block Sections can be enabled for a subset of machine basic blocks.
// This is done by passing a file containing names of functions for which basic
// block sections are desired. Additionally, machine basic block ids of the
// functions can also be specified for a finer granularity. Moreover, a cluster
// of basic blocks could be assigned to the same section.
// A file with basic block sections for all of function main and three blocks
// for function foo (of which 1 and 2 are placed in a cluster) looks like this:
// ----------------------------
// list.txt:
// !main
// !foo
// !!1 2
// !!4
static Error getBBClusterInfo(const MemoryBuffer *MBuf,
                              ProgramBBClusterInfoMapTy &ProgramBBClusterInfo,
                              StringMap<StringRef> &FuncAliasMap) {
  assert(MBuf);
  line_iterator LineIt(*MBuf, /*SkipBlanks=*/true, /*CommentMarker=*/'#');

  auto invalidProfileError = [&](auto Message) {
    return make_error<StringError>(
        Twine("Invalid profile " + MBuf->getBufferIdentifier() + " at line " +
              Twine(LineIt.line_number()) + ": " + Message),
        inconvertibleErrorCode());
  };

  auto FI = ProgramBBClusterInfo.end();

  // Current cluster ID corresponding to this function.
  unsigned CurrentCluster = 0;
  // Current position in the current cluster.
  unsigned CurrentPosition = 0;

  // Temporary set to ensure every basic block ID appears once in the clusters
  // of a function.
  SmallSet<unsigned, 4> FuncBBIDs;

  for (; !LineIt.is_at_eof(); ++LineIt) {
    StringRef S(*LineIt);
    if (S[0] == '@')
      continue;
    // Check for the leading "!"
    if (!S.consume_front("!") || S.empty())
      break;
    // Check for second "!" which indicates a cluster of basic blocks.
    if (S.consume_front("!")) {
      if (FI == ProgramBBClusterInfo.end())
        return invalidProfileError(
            "Cluster list does not follow a function name specifier.");
      SmallVector<StringRef, 4> BBIDs;
      S.split(BBIDs, ' ');
      // Reset current cluster position.
      CurrentPosition = 0;
      for (auto BBIDStr : BBIDs) {
        unsigned long long BBID;
        if (getAsUnsignedInteger(BBIDStr, 10, BBID))
          return invalidProfileError(Twine("Unsigned integer expected: '") +
                                     BBIDStr + "'.");
        if (!FuncBBIDs.insert(BBID).second)
          return invalidProfileError(Twine("Duplicate basic block id found '") +
                                     BBIDStr + "'.");
        if (BBID == 0 && CurrentPosition)
          return invalidProfileError("Entry BB (0) does not begin a cluster.");

        FI->second.emplace_back(
            BBClusterInfo{((unsigned)BBID), CurrentCluster, CurrentPosition++});
      }
      CurrentCluster++;
    } else { // This is a function name specifier.
      // Function aliases are separated using '/'. We use the first function
      // name for the cluster info mapping and delegate all other aliases to
      // this one.
      SmallVector<StringRef, 4> Aliases;
      S.split(Aliases, '/');
      for (size_t i = 1; i < Aliases.size(); ++i)
        FuncAliasMap.try_emplace(Aliases[i], Aliases.front());

      // Prepare for parsing clusters of this function name.
      // Start a new cluster map for this function name.
      FI = ProgramBBClusterInfo.try_emplace(Aliases.front()).first;
      CurrentCluster = 0;
      FuncBBIDs.clear();
    }
  }
  return Error::success();
}

void BasicBlockSectionsProfileReader::initializePass() {
  if (!MBuf)
    return;
  if (auto Err = getBBClusterInfo(MBuf, ProgramBBClusterInfo, FuncAliasMap))
    report_fatal_error(std::move(Err));
}

ImmutablePass *
llvm::createBasicBlockSectionsProfileReaderPass(const MemoryBuffer *Buf) {
  return new BasicBlockSectionsProfileReader(Buf);
}
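
For readers who want to experiment with the profile format outside of LLVM, here is a minimal standalone sketch of the same parsing idea, using only the standard library. It is not the LLVM implementation: `ToyClusterEntry`, `ToyProfile` and `parseToyProfile` are hypothetical names, alias handling (`/`) is omitted, and error reporting is reduced to a plain message string. Unlike the LLVM parser, it accepts an empty cluster list, and IDs are read with stream extraction rather than `getAsUnsignedInteger`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of a cluster record: block ID, cluster index within the
// function, and position within that cluster.
struct ToyClusterEntry {
  unsigned BBID;
  unsigned ClusterID;
  unsigned Position;
};

typedef std::map<std::string, std::vector<ToyClusterEntry>> ToyProfile;

// Parses Text into Out. Returns an empty string on success, otherwise a
// short error message.
std::string parseToyProfile(const std::string &Text, ToyProfile &Out) {
  std::istringstream In(Text);
  std::string Line;
  ToyProfile::iterator FI = Out.end();
  unsigned CurrentCluster = 0;
  std::set<unsigned> SeenIDs; // Block IDs already used in this function.

  while (std::getline(In, Line)) {
    // Skip blank lines, '#' comments, and '@' lines.
    if (Line.empty() || Line[0] == '#' || Line[0] == '@')
      continue;
    // Stop at the first line that is not a "!" specifier.
    if (Line[0] != '!' || Line.size() == 1)
      break;
    if (Line[1] != '!') {
      // "!name": start a new function entry.
      FI = Out.insert(std::make_pair(Line.substr(1),
                                     std::vector<ToyClusterEntry>()))
               .first;
      CurrentCluster = 0;
      SeenIDs.clear();
      continue;
    }
    // "!!id id ...": one cluster of the current function.
    if (FI == Out.end())
      return "cluster list does not follow a function name specifier";
    std::istringstream IDs(Line.substr(2));
    unsigned Position = 0;
    unsigned BBID;
    while (IDs >> BBID) {
      if (!SeenIDs.insert(BBID).second)
        return "duplicate basic block id";
      if (BBID == 0 && Position != 0)
        return "entry block (0) does not begin a cluster";
      ToyClusterEntry E = {BBID, CurrentCluster, Position++};
      FI->second.push_back(E);
    }
    if (!IDs.eof())
      return "unsigned integer expected";
    ++CurrentCluster;
  }
  return "";
}
```

Feeding it the `list.txt` example from the comment above (`!main`, `!foo`, `!!1 2`, `!!4`) yields an empty entry for `main` and three entries for `foo`: blocks 1 and 2 at positions 0 and 1 of cluster 0, and block 4 at position 0 of cluster 1.
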