archived-llvm/include/llvm/Object/IRSymtab.h
Ben Dunbobbin 7398f4a5f8 [ELF] Implement Dependent Libraries Feature
This patch implements a limited form of autolinking primarily designed to allow
either the --dependent-library compiler option, or "comment lib" pragmas (
https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in
C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically
add the specified library to the link when processing the input file generated
by the compiler.

Currently this extension is unique to LLVM and LLD. However, care has been taken
to design this feature so that it could be supported by other ELF linkers.

The design goals were to provide:

- A simple linking model for developers to reason about.
- The ability to override autolinking from the linker command line.
- Source code compatibility, where possible, with "comment lib" pragmas in other
  environments (MSVC in particular).

Dependent library support is implemented differently on ELF platforms than on
other platforms. The main difference is that on ELF we pass the
dependent library specifiers directly to the linker without manipulating them.
This is in contrast to other platforms where they are mapped to a specific
linker option by the compiler. This difference is a result of the greater
variety of ELF linkers and the fact that ELF linkers tend to handle libraries in
a more complicated fashion than on other platforms. This forces us to defer
handling the specifiers to the linker.

In order to achieve a level of source code compatibility with other platforms
we have restricted this feature to work with libraries that meet the following
"reasonable" requirements:

1. There are no competing defined symbols in a given set of libraries, or
   if they exist, the program owner doesn't care which is linked to their
   program.
2. There may be circular dependencies between libraries.

The binary representation is a mergeable string section (SHF_MERGE,
SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES
(0x6fff4c04). The compiler forms this section by concatenating the arguments of
the "comment lib" pragmas and --dependent-library options in the order they are
encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs
sections with the normal mergeable string section rules. As an example, #pragma
comment(lib, "foo") would result in:

.section ".deplibs","MS",@llvm_dependent_libraries,1
         .asciz "foo"
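As a rough illustration of the layout described above, the following C++ sketch splits the raw bytes of a .deplibs-style section into individual specifiers. The function name parseDeplibs is hypothetical and is not taken from LLD; the sketch assumes only that the section body is a sequence of NUL-terminated strings.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not an LLD API): splits the raw contents of a
// .deplibs-style section into its NUL-terminated specifiers, in order.
std::vector<std::string> parseDeplibs(std::string_view section) {
  std::vector<std::string> specifiers;
  while (!section.empty()) {
    const std::size_t nul = section.find('\0');
    // A trailing fragment without a terminator is treated as malformed;
    // this sketch simply stops there instead of reporting an error.
    if (nul == std::string_view::npos)
      break;
    specifiers.emplace_back(section.substr(0, nul));
    section.remove_prefix(nul + 1);
  }
  return specifiers;
}
```

Because partial links concatenate the sections, the same loop would also cover a section holding specifiers from several input files.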

For LTO, information equivalent to the contents of a .deplibs section can be
retrieved by LLD from bitcode input files.
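In the IRSymtab.h header shown further down, the bitcode symbol table stores each specifier as an (offset, size) reference into the string table. The sketch below is a simplified, LLVM-free model of that lookup. StrRef and dependentLibraries are hypothetical names, and the real storage::Str uses little-endian 32-bit words rather than plain integers.

```cpp
#include <cstdint>
#include <string_view>
#include <vector>

// Simplified stand-in for irsymtab::storage::Str: an (offset, size) pair
// that refers to a slice of the string table.
struct StrRef {
  std::uint32_t offset;
  std::uint32_t size;

  // Unlike the real header, substr() throws if offset is out of range.
  std::string_view get(std::string_view strtab) const {
    return strtab.substr(offset, size);
  }
};

// Models what Reader::getDependentLibraries() does: resolve every
// reference against the string table, preserving order.
std::vector<std::string_view>
dependentLibraries(const std::vector<StrRef> &refs, std::string_view strtab) {
  std::vector<std::string_view> specifiers;
  specifiers.reserve(refs.size());
  for (const StrRef &ref : refs)
    specifiers.push_back(ref.get(strtab));
  return specifiers;
}
```

The returned views borrow from the string table, so in this model the table must outlive them.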

LLD processes the dependent library specifiers in the following way:

1. Dependent libraries which are found from the specifiers in .deplibs sections
   of relocatable object files are added when the linker decides to include that
   file (which could itself be in a library) in the link. Dependent libraries
   behave as if they were appended to the command line after all other options. As
   a consequence, the set of dependent libraries is searched last to resolve
   symbols.
2. It is an error if a file cannot be found for a given specifier.
3. Any command line options in effect at the end of the command line parsing apply
   to the dependent libraries, e.g. --whole-archive.
4. The linker tries to add a library or relocatable object file from each of the
   strings in a .deplibs section: first, by handling the string as if it were
   specified on the command line; second, by looking for the string in each of the
   library search paths in turn; third, by looking for a lib<string>.a or
   lib<string>.so (depending on the current mode of the linker) in each of the
   library search paths.
5. A new command line option --no-dependent-libraries tells LLD to ignore the
   dependent libraries.
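The three-step lookup in point 4 can be sketched roughly as follows. This is not LLD's code: findDependentLibrary, its parameters, and the exists callback (a stand-in for a file-system check) are all hypothetical, and the preferShared flag is a deliberate simplification of "the current mode of the linker".

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the search order from point 4. Returns the first
// candidate path that exists, or std::nullopt if nothing matches (which,
// per point 2, the caller would report as an error).
std::optional<std::string> findDependentLibrary(
    const std::string &specifier, const std::vector<std::string> &searchPaths,
    bool preferShared,
    const std::function<bool(const std::string &)> &exists) {
  // Step 1: treat the specifier as if it were given on the command line.
  if (exists(specifier))
    return specifier;

  // Step 2: look for the specifier itself in each search path, in order.
  for (const std::string &dir : searchPaths) {
    const std::string candidate = dir + "/" + specifier;
    if (exists(candidate))
      return candidate;
  }

  // Step 3: look for lib<specifier>.so or lib<specifier>.a in each path.
  // Assumption: one extension is chosen up front from the linker mode.
  const std::string libName =
      "lib" + specifier + (preferShared ? ".so" : ".a");
  for (const std::string &dir : searchPaths) {
    const std::string candidate = dir + "/" + libName;
    if (exists(candidate))
      return candidate;
  }
  return std::nullopt;
}
```

Passing the existence check as a callback keeps the sketch testable without touching the real file system.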

Rationale for the above points:

1. Adding the dependent libraries last makes the process simple to understand
   from a developer's perspective. All linkers are able to implement this scheme.
2. Reporting an error for libraries that are not found seems like better
   behavior than failing the link later, during symbol resolution.
3. It seems useful for the user to be able to apply command line options which
   will affect all of the dependent libraries. There is a potential problem of
   surprise for developers, who might not realize that these options would apply
   to these "invisible" input files; however, despite the potential for surprise,
   this is easy for developers to reason about and gives developers the control
   that they may require.
4. This algorithm takes into account all of the different ways that ELF linkers
   find input files. The different search methods are tried by the linker in most
   obvious to least obvious order.
5. I considered adding finer grained control over which dependent libraries were
   ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this
   is not necessary: if finer control is required developers can fall back to using
   the command line directly.

RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html.

Differential Revision: https://reviews.llvm.org/D60274

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@360984 91177308-0d34-0410-b5e6-96231b3b80d8
2019-05-17 03:44:15 +00:00


//===- IRSymtab.h - data definitions for IR symbol tables -------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file contains data definitions and a reader and builder for a symbol
// table for LLVM IR. Its purpose is to allow linkers and other consumers of
// bitcode files to efficiently read the symbol table for symbol resolution
// purposes without needing to construct a module in memory.
//
// As with most object files the symbol table has two parts: the symbol table
// itself and a string table which is referenced by the symbol table.
//
// A symbol table corresponds to a single bitcode file, which may consist of
// multiple modules, so symbol tables may likewise contain symbols for multiple
// modules.
//
//===----------------------------------------------------------------------===//
#ifndef LLVM_OBJECT_IRSYMTAB_H
#define LLVM_OBJECT_IRSYMTAB_H
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/iterator_range.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/Object/SymbolicFile.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/Error.h"
#include <cassert>
#include <cstdint>
#include <vector>
namespace llvm {
struct BitcodeFileContents;
class StringTableBuilder;
namespace irsymtab {
namespace storage {
// The data structures in this namespace define the low-level serialization
// format. Clients that just want to read a symbol table should use the
// irsymtab::Reader class.
using Word = support::ulittle32_t;
/// A reference to a string in the string table.
struct Str {
  Word Offset, Size;

  StringRef get(StringRef Strtab) const {
    return {Strtab.data() + Offset, Size};
  }
};

/// A reference to a range of objects in the symbol table.
template <typename T> struct Range {
  Word Offset, Size;

  ArrayRef<T> get(StringRef Symtab) const {
    return {reinterpret_cast<const T *>(Symtab.data() + Offset), Size};
  }
};
/// Describes the range of a particular module's symbols within the symbol
/// table.
struct Module {
  Word Begin, End;

  /// The index of the first Uncommon for this Module.
  Word UncBegin;
};

/// This is equivalent to an IR comdat.
struct Comdat {
  Str Name;
};
/// Contains the information needed by linkers for symbol resolution, as well as
/// by the LTO implementation itself.
struct Symbol {
  /// The mangled symbol name.
  Str Name;

  /// The unmangled symbol name, or the empty string if this is not an IR
  /// symbol.
  Str IRName;

  /// The index into Header::Comdats, or -1 if not a comdat member.
  Word ComdatIndex;

  Word Flags;
  enum FlagBits {
    FB_visibility, // 2 bits
    FB_has_uncommon = FB_visibility + 2,
    FB_undefined,
    FB_weak,
    FB_common,
    FB_indirect,
    FB_used,
    FB_tls,
    FB_may_omit,
    FB_global,
    FB_format_specific,
    FB_unnamed_addr,
    FB_executable,
  };
};
/// This data structure contains rarely used symbol fields and is optionally
/// referenced by a Symbol.
struct Uncommon {
  Word CommonSize, CommonAlign;

  /// COFF-specific: the name of the symbol that a weak external resolves to
  /// if not defined.
  Str COFFWeakExternFallbackName;

  /// Specified section name, if any.
  Str SectionName;
};
struct Header {
  /// Version number of the symtab format. This number should be incremented
  /// when the format changes, but it does not need to be incremented if a
  /// change to LLVM would cause it to create a different symbol table.
  Word Version;
  enum { kCurrentVersion = 2 };

  /// The producer's version string (LLVM_VERSION_STRING " " LLVM_REVISION).
  /// Consumers should rebuild the symbol table from IR if the producer's
  /// version does not match the consumer's version due to potential differences
  /// in symbol table format, symbol enumeration order and so on.
  Str Producer;

  Range<Module> Modules;
  Range<Comdat> Comdats;
  Range<Symbol> Symbols;
  Range<Uncommon> Uncommons;

  Str TargetTriple, SourceFileName;

  /// COFF-specific: linker directives.
  Str COFFLinkerOpts;

  /// Dependent Library Specifiers
  Range<Str> DependentLibraries;
};
} // end namespace storage
/// Fills in Symtab and StrtabBuilder with a valid symbol and string table for
/// Mods.
Error build(ArrayRef<Module *> Mods, SmallVector<char, 0> &Symtab,
            StringTableBuilder &StrtabBuilder, BumpPtrAllocator &Alloc);
/// This represents a symbol that has been read from a storage::Symbol and
/// possibly a storage::Uncommon.
struct Symbol {
  // Copied from storage::Symbol.
  StringRef Name, IRName;
  int ComdatIndex;
  uint32_t Flags;

  // Copied from storage::Uncommon.
  uint32_t CommonSize, CommonAlign;
  StringRef COFFWeakExternFallbackName;
  StringRef SectionName;

  /// Returns the mangled symbol name.
  StringRef getName() const { return Name; }

  /// Returns the unmangled symbol name, or the empty string if this is not an
  /// IR symbol.
  StringRef getIRName() const { return IRName; }

  /// Returns the index into the comdat table (see Reader::getComdatTable()), or
  /// -1 if not a comdat member.
  int getComdatIndex() const { return ComdatIndex; }

  using S = storage::Symbol;

  GlobalValue::VisibilityTypes getVisibility() const {
    return GlobalValue::VisibilityTypes((Flags >> S::FB_visibility) & 3);
  }

  bool isUndefined() const { return (Flags >> S::FB_undefined) & 1; }
  bool isWeak() const { return (Flags >> S::FB_weak) & 1; }
  bool isCommon() const { return (Flags >> S::FB_common) & 1; }
  bool isIndirect() const { return (Flags >> S::FB_indirect) & 1; }
  bool isUsed() const { return (Flags >> S::FB_used) & 1; }
  bool isTLS() const { return (Flags >> S::FB_tls) & 1; }

  bool canBeOmittedFromSymbolTable() const {
    return (Flags >> S::FB_may_omit) & 1;
  }

  bool isGlobal() const { return (Flags >> S::FB_global) & 1; }
  bool isFormatSpecific() const { return (Flags >> S::FB_format_specific) & 1; }
  bool isUnnamedAddr() const { return (Flags >> S::FB_unnamed_addr) & 1; }
  bool isExecutable() const { return (Flags >> S::FB_executable) & 1; }

  uint64_t getCommonSize() const {
    assert(isCommon());
    return CommonSize;
  }

  uint32_t getCommonAlignment() const {
    assert(isCommon());
    return CommonAlign;
  }

  /// COFF-specific: for weak externals, returns the name of the symbol that is
  /// used as a fallback if the weak external remains undefined.
  StringRef getCOFFWeakExternalFallback() const {
    assert(isWeak() && isIndirect());
    return COFFWeakExternFallbackName;
  }

  StringRef getSectionName() const { return SectionName; }
};
/// This class can be used to read a Symtab and Strtab produced by
/// irsymtab::build.
class Reader {
  StringRef Symtab, Strtab;

  ArrayRef<storage::Module> Modules;
  ArrayRef<storage::Comdat> Comdats;
  ArrayRef<storage::Symbol> Symbols;
  ArrayRef<storage::Uncommon> Uncommons;
  ArrayRef<storage::Str> DependentLibraries;

  StringRef str(storage::Str S) const { return S.get(Strtab); }

  template <typename T> ArrayRef<T> range(storage::Range<T> R) const {
    return R.get(Symtab);
  }

  const storage::Header &header() const {
    return *reinterpret_cast<const storage::Header *>(Symtab.data());
  }

public:
  class SymbolRef;

  Reader() = default;
  Reader(StringRef Symtab, StringRef Strtab) : Symtab(Symtab), Strtab(Strtab) {
    Modules = range(header().Modules);
    Comdats = range(header().Comdats);
    Symbols = range(header().Symbols);
    Uncommons = range(header().Uncommons);
    DependentLibraries = range(header().DependentLibraries);
  }

  using symbol_range = iterator_range<object::content_iterator<SymbolRef>>;

  /// Returns the symbol table for the entire bitcode file.
  /// The symbols enumerated by this method are ephemeral, but they can be
  /// copied into an irsymtab::Symbol object.
  symbol_range symbols() const;

  size_t getNumModules() const { return Modules.size(); }

  /// Returns a slice of the symbol table for the I'th module in the file.
  /// The symbols enumerated by this method are ephemeral, but they can be
  /// copied into an irsymtab::Symbol object.
  symbol_range module_symbols(unsigned I) const;

  StringRef getTargetTriple() const { return str(header().TargetTriple); }

  /// Returns the source file path specified at compile time.
  StringRef getSourceFileName() const { return str(header().SourceFileName); }

  /// Returns a table with all the comdats used by this file.
  std::vector<StringRef> getComdatTable() const {
    std::vector<StringRef> ComdatTable;
    ComdatTable.reserve(Comdats.size());
    for (auto C : Comdats)
      ComdatTable.push_back(str(C.Name));
    return ComdatTable;
  }

  /// COFF-specific: returns linker options specified in the input file.
  StringRef getCOFFLinkerOpts() const { return str(header().COFFLinkerOpts); }

  /// Returns dependent library specifiers
  std::vector<StringRef> getDependentLibraries() const {
    std::vector<StringRef> Specifiers;
    Specifiers.reserve(DependentLibraries.size());
    for (auto S : DependentLibraries) {
      Specifiers.push_back(str(S));
    }
    return Specifiers;
  }
};
/// Ephemeral symbols produced by Reader::symbols() and
/// Reader::module_symbols().
class Reader::SymbolRef : public Symbol {
  const storage::Symbol *SymI, *SymE;
  const storage::Uncommon *UncI;
  const Reader *R;

  void read() {
    if (SymI == SymE)
      return;

    Name = R->str(SymI->Name);
    IRName = R->str(SymI->IRName);
    ComdatIndex = SymI->ComdatIndex;
    Flags = SymI->Flags;

    if (Flags & (1 << storage::Symbol::FB_has_uncommon)) {
      CommonSize = UncI->CommonSize;
      CommonAlign = UncI->CommonAlign;
      COFFWeakExternFallbackName = R->str(UncI->COFFWeakExternFallbackName);
      SectionName = R->str(UncI->SectionName);
    } else
      // Reset this field so it can be queried unconditionally for all symbols.
      SectionName = "";
  }

public:
  SymbolRef(const storage::Symbol *SymI, const storage::Symbol *SymE,
            const storage::Uncommon *UncI, const Reader *R)
      : SymI(SymI), SymE(SymE), UncI(UncI), R(R) {
    read();
  }

  void moveNext() {
    ++SymI;
    if (Flags & (1 << storage::Symbol::FB_has_uncommon))
      ++UncI;
    read();
  }

  bool operator==(const SymbolRef &Other) const { return SymI == Other.SymI; }
};
inline Reader::symbol_range Reader::symbols() const {
  return {SymbolRef(Symbols.begin(), Symbols.end(), Uncommons.begin(), this),
          SymbolRef(Symbols.end(), Symbols.end(), nullptr, this)};
}

inline Reader::symbol_range Reader::module_symbols(unsigned I) const {
  const storage::Module &M = Modules[I];
  const storage::Symbol *MBegin = Symbols.begin() + M.Begin,
                        *MEnd = Symbols.begin() + M.End;
  return {SymbolRef(MBegin, MEnd, Uncommons.begin() + M.UncBegin, this),
          SymbolRef(MEnd, MEnd, nullptr, this)};
}
/// The contents of the irsymtab in a bitcode file. Any underlying data for the
/// irsymtab are owned by Symtab and Strtab.
struct FileContents {
  SmallVector<char, 0> Symtab, Strtab;
  Reader TheReader;
};
/// Reads the contents of a bitcode file, creating its irsymtab if necessary.
Expected<FileContents> readBitcode(const BitcodeFileContents &BFC);
} // end namespace irsymtab
} // end namespace llvm
#endif // LLVM_OBJECT_IRSYMTAB_H
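
The Flags word in storage::Symbol packs a 2-bit visibility field followed by single-bit flags. The standalone sketch below mirrors the FlagBits enumerators from the header above to show how the accessors in irsymtab::Symbol decode that word. It does not depend on LLVM, and hasFlag and visibilityBits are hypothetical helper names introduced only for illustration.

```cpp
#include <cstdint>

// Copy of the enumerator order from storage::Symbol::FlagBits. The
// visibility field occupies bits 0-1, so the next flag starts at bit 2.
enum FlagBits : unsigned {
  FB_visibility, // 2 bits
  FB_has_uncommon = FB_visibility + 2,
  FB_undefined,
  FB_weak,
  FB_common,
  FB_indirect,
  FB_used,
  FB_tls,
  FB_may_omit,
  FB_global,
  FB_format_specific,
  FB_unnamed_addr,
  FB_executable,
};

// Hypothetical helper: tests one single-bit flag, like isWeak() and friends.
constexpr bool hasFlag(std::uint32_t flags, FlagBits bit) {
  return ((flags >> bit) & 1u) != 0;
}

// Hypothetical helper: extracts the raw 2-bit visibility value, which
// getVisibility() casts to GlobalValue::VisibilityTypes.
constexpr unsigned visibilityBits(std::uint32_t flags) {
  return (flags >> FB_visibility) & 3u;
}
```

For example, a word with FB_weak and FB_global set and a visibility value of 2 would report both flags as true and, under this model, decode to protected visibility.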