This adds a visitor that is capable of accessing type
records randomly and caching intermediate results that it
learns about during partial linear scans. This yields
amortized O(1) access to a type stream even though type
streams cannot normally be indexed.
Differential Revision: https://reviews.llvm.org/D33009
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302936 91177308-0d34-0410-b5e6-96231b3b80d8
Previously type visitation was done strictly sequentially, and
TypeIndexes were computed by incrementing the TypeIndex of the
last visited record. This works fine for situations like dumping,
but not when you want to visit types in random order. For example,
in a debug session someone might lookup a symbol by name, find that
it has TypeIndex 10,000 and then want to go straight to TypeIndex
10,000.
In order to make this work, the visitation framework needs a mode
where it can plumb TypeIndices through the callback pipeline. This
patch adds such a mode. In doing so, it is necessary to provide
an alternative implementation of TypeDatabase that supports random
access, so that is done as well.
Nothing actually uses these random access capabilities yet, but
this will be done in subsequent patches.
Differential Revision: https://reviews.llvm.org/D32928
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302454 91177308-0d34-0410-b5e6-96231b3b80d8
Most of the time we know exactly how many type records we
have in a list, and we want to use the visitor to deserialize
them into actual records in a database. Previously we were
just using push_back() every time without reserving the space
up front in the vector. This is obviously terrible from a
performance standpoint, and it's not uncommon to have PDB
files with half a million type records, where the performance
degredation was quite noticeable.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302302 91177308-0d34-0410-b5e6-96231b3b80d8
The raw CodeView format references strings by "offsets", but it's
confusing what table the offset refers to. In the case of line
number information, it's an offset into a buffer of records,
and an indirection is required to get another offset into a
different table to find the final string. And in the case of
checksum information, there is no indirection, and the offset
refers directly to the location of the string in another buffer.
This would be less confusing if we always just referred to the
strings by their value, and have the library be smart enough
to correctly resolve the offsets on its own from the right
location.
This patch makes that possible. When either reading or writing,
all the user deals with are strings, and the library does the
appropriate translations behind the scenes.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302053 91177308-0d34-0410-b5e6-96231b3b80d8
llvm-readobj hand rolls some CodeView parsing code for string
tables, so this patch updates it to re-use some of the newly
introduced parsing code in LLVMDebugInfoCodeView.
Differential Revision: https://reviews.llvm.org/D32772
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302052 91177308-0d34-0410-b5e6-96231b3b80d8
This was reverted due to a "missing" file, but in reality
what happened was that I renamed a file, and then due to
a merge conflict both the old file and the new file got
added to the repository. This led to an unused cpp file
being in the repo and not referenced by any CMakeLists.txt
but #including a .h file that wasn't in the repo. In an
even more unfortunate coincidence, CMake didn't report the
unused cpp file because it was in a subdirectory of the
folder with the CMakeLists.txt, and not in the same directory
as any CMakeLists.txt.
The presence of the unused file was then breaking certain
tools that determine file lists by globbing rather than
by what's specified in CMakeLists.txt
In any case, the fix is to just remove the unused file from
the patch set.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302042 91177308-0d34-0410-b5e6-96231b3b80d8
The patch is failing to add StringTableStreamBuilder.h, but that isn't
even discovered because the corresponding StringTableStreamBuilder.cpp
isn't added to any CMakeLists.txt file and thus never built. I think
this patch is just incomplete.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@302002 91177308-0d34-0410-b5e6-96231b3b80d8
This was reported by the ASAN bot, and it turned out to be
a fairly fundamental problem with the design of VarStreamArray
and the way it passes context information to the extractor.
The fix was cumbersome, and I'm not entirely pleased with it,
so I plan to revisit this design in the future when I'm not
pressed to get the bots green again. For now, this fixes
the issue by storing the context information by value instead
of by reference, and introduces some impossibly-confusing
template magic to make things "work".
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301999 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we had knowledge of how to serialize and deserialize
a string table inside of DebugInfo/PDB, but the string table
that it serializes contains a piece that is actually considered
CodeView and can appear outside of a PDB. We already have logic
in llvm-readobj and MCCodeView to read and write this format,
so it doesn't make sense to duplicate the logic in DebugInfoPDB
as well.
This patch makes codeview::StringTable (for writing) and
codeview::StringTableRef (for reading), updates DebugInfoPDB
to use these classes for its own writing, and updates llvm-readobj
to additionally use StringTableRef for reading.
It's a bit more difficult to get MCCodeView to use this for
writing, but it's a logical next step.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301986 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we wrote line information and file checksum
information, but we did not write information about inlinee
lines and functions. This patch adds support for that.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301936 91177308-0d34-0410-b5e6-96231b3b80d8
In preparation for introducing writing capabilities for each of
these classes, I would like to adopt a Foo / FooRef naming
convention, where Foo indicates that the class can manipulate and
serialize Foos, and FooRef indicates that it is an immutable view of
an existing Foo. In other words, Foo is a writer and FooRef is a
reader. This patch names some existing readers to conform to the
FooRef convention, while offering no functional change.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301810 91177308-0d34-0410-b5e6-96231b3b80d8
There is a lot of duplicate code for printing line info between
YAML and the raw output printer. This introduces a base class
that can be shared between the two, and makes some minor
cleanups in the process.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301728 91177308-0d34-0410-b5e6-96231b3b80d8
Previously parsing of these were all grouped together into a
single master class that could parse any type of debug info
fragment.
With writing forthcoming, the complexity of each individual
fragment is enough to warrant them having their own classes so
that reading and writing of each fragment type can be grouped
together, but isolated from the code for reading and writing
other fragment types.
In doing so, I found a place where parsing code was duplicated
for the FileChecksums fragment, across llvm-readobj and the
CodeView library, and one of the implementations had a bug.
Now that the codepaths are merged, the bug is resolved.
Differential Revision: https://reviews.llvm.org/D32547
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301557 91177308-0d34-0410-b5e6-96231b3b80d8
We have a lot of very similarly named classes related to
dealing with module debug info. This patch has NFC, it just
renames some classes to be more descriptive (albeit slightly
more to type). The mapping from old to new class names is as
follows:
Old | New
ModInfo | DbiModuleDescriptor
ModuleSubstream | ModuleDebugFragment
ModStream | ModuleDebugStream
With the corresponding Builder classes renamed accordingly.
Differential Revision: https://reviews.llvm.org/D32506
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301555 91177308-0d34-0410-b5e6-96231b3b80d8
We were already parsing and dumping this to the human readable
format, but not to the YAML format. This does so, in preparation
for reading it in and reconstructing the line information from
YAML.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@301357 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
MASM can produce type streams that are not topologically sorted. It can
even produce type streams with circular references, but those are not
common in practice.
Reviewers: inglorion, ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31629
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@299403 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
When dumping these records from an object file section, we should use
only one type database. However, when dumping from a PDB, we should use
two: one for the type stream and one for the IPI stream.
Certain type records that normally live in the .debug$T object file
section get moved over to the IPI stream of the PDB file and they get
new indices.
So far, I've noticed that the MSVC linker always moves these records
into IPI:
- LF_FUNC_ID
- LF_MFUNC_ID
- LF_STRING_ID
- LF_SUBSTR_LIST
- LF_BUILDINFO
- LF_UDT_MOD_SRC_LINE
These records have index fields that can point into TPI or IPI. In
particular, LF_SUBSTR_LIST and LF_BUILDINFO point to LF_STRING_ID
records to describe compilation command lines.
I've modified the dumper to have an optional pointer to the item DB, and
to do type name lookup of these fields in that DB. See printItemIndex.
The result is that our pdbdump-headers.test is more faithful to the PDB
contents and the output is less confusing.
Reviewers: ruiu
Subscribers: amccarth, zturner, llvm-commits
Differential Revision: https://reviews.llvm.org/D31309
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298649 91177308-0d34-0410-b5e6-96231b3b80d8
Summary:
This removes the 'remapTypeIndices' method on every TypeRecord class. My
original idea was that this would be the beginning of some kind of
generic entry point that would enumerate all of the TypeIndices inside
of a TypeRecord, so that we could write generic graph algorithms for
them without duplicating the knowledge of which fields are type index
fields everywhere. This never happened, and nothing else uses this
method. I need to change the API to deal with merging into IPI streams,
so let's move it into the file that uses it first.
Reviewers: zturner, ruiu
Reviewed By: zturner, ruiu
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D31267
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298564 91177308-0d34-0410-b5e6-96231b3b80d8
They are structurally the same, but now we need to distinguish them
because one record lives in the IPI stream and the other lives in TPI.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@298474 91177308-0d34-0410-b5e6-96231b3b80d8
In doing so I discovered that we completely ignore some bytes
of the PDB Stream after we "finish" loading it. These bytes
seem to specify some additional information about what kind
of data is present in the PDB. A subsequent patch will add
code to read in those fields and store their values.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297983 91177308-0d34-0410-b5e6-96231b3b80d8
Previously we could round-trip type records from PDB -> Yaml ->
PDB, but for symbols we could only go from PDB -> Yaml. This
completes the round-tripping for symbols as well.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@297625 91177308-0d34-0410-b5e6-96231b3b80d8
Before the endianness was specified on each call to read
or write of the StreamReader / StreamWriter, but in practice
it's extremely rare for streams to have data encoded in
multiple different endiannesses, so we should optimize for the
99% use case.
This makes the code cleaner and more general, but otherwise
has NFC.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296415 91177308-0d34-0410-b5e6-96231b3b80d8
This was reverted because it was breaking some builds, and
because of incorrect error code usage. Since the CL was
large and contained many different things, I'm resubmitting
it in pieces.
This portion is NFC, and consists of:
1) Renaming classes to follow a consistent naming convention.
2) Fixing the const-ness of the interface methods.
3) Adding detailed doxygen comments.
4) Fixing a few instances of passing `const BinaryStream& X`. These
are now passed as `BinaryStreamRef X`.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296394 91177308-0d34-0410-b5e6-96231b3b80d8
r296215, "[PDB] General improvements to Stream library."
r296217, "Disable BinaryStreamTest.StreamReaderObject temporarily."
r296220, "Re-enable BinaryStreamTest.StreamReaderObject."
r296244, "[PDB] Disable some tests that are breaking bots."
r296249, "Add static_cast to silence -Wc++11-narrowing."
std::errc::no_buffer_space should be used for OS-oriented errors for socket transmission.
(Seek discussions around llvm/xray.)
I could substitute s/no_buffer_space/others/g, but I revert whole them ATM.
Could we define and use LLVM errors there?
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296258 91177308-0d34-0410-b5e6-96231b3b80d8
This adds various new functionality and cleanup surrounding the
use of the Stream library. Major changes include:
* Renaming of all classes for more consistency / meaningfulness
* Addition of some new methods for reading multiple values at once.
* Full suite of unit tests for reader / writer functionality.
* Full set of doxygen comments for all classes.
* Streams now store their own endianness.
* Fixed some bugs in a few of the classes that were discovered
by the unit tests.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296215 91177308-0d34-0410-b5e6-96231b3b80d8
This is part of a larger effort to get the Stream code moved
up to Support. I don't want to do it in one large patch, in
part because the changes are so big that it will treat everything
as file deletions and add, losing history in the process.
Aside from that though, it's just a good idea in general to
make small changes.
So this change only changes the names of the Stream related
source files, and applies necessary source fix ups.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296211 91177308-0d34-0410-b5e6-96231b3b80d8
In an effort to generalize this so it can be used by more than
just PDB code, we shouldn't assume little endian.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295525 91177308-0d34-0410-b5e6-96231b3b80d8
Some PDBs or object files can contain references to other PDBs
where the real type information lives. When this happens,
all type indices in the original PDB are meaningless because
their records are not there.
With this patch we add the ability to pull type info from those
secondary PDBs.
Differential Revision: https://reviews.llvm.org/D29973
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@295382 91177308-0d34-0410-b5e6-96231b3b80d8
Previously, mergeTypeStreams returns only true or false, so it was
impossible to know the reason if it failed. This patch changes the
function signature so that it returns an Error object.
Differential Revision: https://reviews.llvm.org/D29362
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293820 91177308-0d34-0410-b5e6-96231b3b80d8
This introduces the `analyze` subcommand. For now there is only
one option, to analyze hash collisions in the type streams. In
the future, however, we could add many more things here, such
as performing size analyses, compacting, and statistics about
the type of records etc.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@293795 91177308-0d34-0410-b5e6-96231b3b80d8
With some minor manual fixes for using function_ref instead of
std::function. No functional change intended.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291904 91177308-0d34-0410-b5e6-96231b3b80d8
Previously the type dumper itself was passed around to a lot of different
places and manipulated in ways that were more appropriate on the type
database. For example, the entire TypeDumper was passed into the symbol
dumper, when all the symbol dumper wanted to do was lookup the name of a
TypeIndex so it could print it. That's what the TypeDatabase is for --
mapping type indices to names.
Another example is how if the user runs llvm-pdbdump with the option to
dump symbols but not types, we still have to visit all types so that we
can print minimal information about the type of a symbol, but just without
dumping full symbol records. The way we did this before is by hacking it
up so that we run everything through the type dumper with a null printer,
so that the output goes to /dev/null. But really, we don't need to dump
anything, all we want to do is build the type database. Since
TypeDatabaseVisitor now exists independently of TypeDumper, we can do
this. We just build a custom visitor callback pipeline that includes a
database visitor but not a dumper.
All the hackery around printers etc goes away. After this patch, we could
probably even delete the entire CVTypeDumper class since really all it is
at this point is a thin wrapper that hides the details of how to build a
useful visitation pipeline. It's not a priority though, so CVTypeDumper
remains for now.
After this patch we will be able to easily plug in a different style of
type dumper by only implementing the proper visitation methods to dump
one-line output and then sticking it on the pipeline.
Differential Revision: https://reviews.llvm.org/D28524
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291724 91177308-0d34-0410-b5e6-96231b3b80d8
We were starting to get some name clashes between llvm-pdbdump
and the common CodeView framework, so I took this opportunity
to rename a bunch of files to more accurately describe their
usage. This also helps in llvm-pdbdump to distinguish
between different files and whether they are used for pretty
dump mode or raw dump mode.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291627 91177308-0d34-0410-b5e6-96231b3b80d8
This creates a centralized class in which to store type records.
It stores types as an array of entries, which matches the
notion of a type stream being a topologically sorted DAG.
Logic to build up such a database was already being used in
CVTypeDumper, so CVTypeDumper is now updated to to read from
a TypeDatabase which is filled out by an earlier visitor in
the pipeline.
Differential Revision: https://reviews.llvm.org/D28486
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@291626 91177308-0d34-0410-b5e6-96231b3b80d8
The original patch was broken due to some undefined behavior
as well as warnings that were triggering -Werror.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@290000 91177308-0d34-0410-b5e6-96231b3b80d8