mirror of
https://github.com/RPCS3/llvm.git
synced 2024-12-12 06:06:32 +00:00
Xray docs with description of Flight Data Recorder binary format.
Summary: Adding a new restructuredText file to document the trace format produced with an FDR mode handler and read by llvm-xray toolset. Reviewers: dberris, pelikan Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36041 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@309836 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is contained in:
parent
c4d94abf28
commit
e97c5836a7
@ -196,6 +196,9 @@ on your application, you may set the ``xray_fdr_log`` option to ``true`` in the
|
||||
``XRAY_OPTIONS`` environment variable (while also optionally setting the
|
||||
``xray_naive_log`` to ``false``).
|
||||
|
||||
When the buffers are flushed to disk, the result is a binary trace format
|
||||
described by `XRay FDR format <XRayFDRFormat.html>`_
|
||||
|
||||
When FDR mode is on, it will keep writing and recycling memory buffers until
|
||||
the logging implementation is finalized -- at which point it can be flushed and
|
||||
re-initialised later. To do this programmatically, we follow the workflow
|
||||
|
394
docs/XRayFDRFormat.rst
Normal file
394
docs/XRayFDRFormat.rst
Normal file
@ -0,0 +1,394 @@
|
||||
======================================
|
||||
XRay Flight Data Recorder Trace Format
|
||||
======================================
|
||||
|
||||
:Version: 1 as of 2017-07-20
|
||||
|
||||
.. contents::
|
||||
:local:
|
||||
|
||||
|
||||
Introduction
|
||||
============
|
||||
|
||||
When gathering XRay traces in Flight Data Recorder mode, each thread of an
|
||||
application will claim buffers to fill with trace data, which at some point
|
||||
is finalized and flushed.
|
||||
|
||||
A goal of the profiler is to minimize overhead, so the flushed data directly
|
||||
corresponds to the buffer.
|
||||
|
||||
This document describes the format of a trace file.
|
||||
|
||||
|
||||
General
|
||||
=======
|
||||
|
||||
Each trace file corresponds to a sequence of events in a particular thread.
|
||||
|
||||
The file has a header followed by a sequence of discriminated record types.
|
||||
|
||||
The endianess of byte fields matches the endianess of the platform which
|
||||
produced the trace file.
|
||||
|
||||
|
||||
Header Section
|
||||
==============
|
||||
|
||||
A trace file begins with a 32 byte header.
|
||||
|
||||
+-------------------+-----------------+----------------------------------------+
|
||||
| Field | Size (bytes) | Description |
|
||||
+===================+=================+========================================+
|
||||
| version | ``2`` | Anticipates versioned readers. This |
|
||||
| | | document describes the format when |
|
||||
| | | version == 1 |
|
||||
+-------------------+-----------------+----------------------------------------+
|
||||
| type | ``2`` | An enumeration encoding the type of |
|
||||
| | | trace. Flight Data Recorder mode |
|
||||
| | | traces have type == 1 |
|
||||
+-------------------+-----------------+----------------------------------------+
|
||||
| bitfield | ``4`` | Holds parameters that are not aligned |
|
||||
| | | to bytes. Further described below. |
|
||||
+-------------------+-----------------+----------------------------------------+
|
||||
| cycle_frequency | ``8`` | The frequency in hertz of the CPU |
|
||||
| | | oscillator used to measure duration of |
|
||||
| | | events in ticks. |
|
||||
+-------------------+-----------------+----------------------------------------+
|
||||
| buffer_size | ``8`` | The size in bytes of the data portion |
|
||||
| | | of the trace following the header. |
|
||||
+-------------------+-----------------+----------------------------------------+
|
||||
| reserved | ``8`` | Reserved for future use. |
|
||||
+-------------------+-----------------+----------------------------------------+
|
||||
|
||||
The bitfield parameter of the file header is composed of the following fields.
|
||||
|
||||
+-------------------+----------------+-----------------------------------------+
|
||||
| Field | Size (bits) | Description |
|
||||
+===================+================+=========================================+
|
||||
| constant_tsc | ``1`` | Whether the platform's timestamp |
|
||||
| | | counter used to record ticks between |
|
||||
| | | events ticks at a constant frequency |
|
||||
| | | despite CPU frequency changes. |
|
||||
| | | 0 == non-constant. 1 == constant. |
|
||||
+-------------------+----------------+-----------------------------------------+
|
||||
| nonstop_tsc | ``1`` | Whether the tsc continues to count |
|
||||
| | | despite whether the CPU is in a low |
|
||||
| | | power state. 0 == stop. 1 == non-stop. |
|
||||
+-------------------+----------------+-----------------------------------------+
|
||||
| reserved | ``30`` | Not meaningful. |
|
||||
+-------------------+----------------+-----------------------------------------+
|
||||
|
||||
|
||||
Data Section
|
||||
============
|
||||
|
||||
Following the header in a trace is a data section with size matching the
|
||||
buffer_size field in the header.
|
||||
|
||||
The data section is a stream of elements of different types.
|
||||
|
||||
There are a few categories of data in the sequence.
|
||||
|
||||
- ``Function Records``: Function Records contain the timing of entry into and
|
||||
exit from function execution. Function Records have 8 bytes each.
|
||||
|
||||
- ``Metadata Records``: Metadata records serve many purposes. Mostly, they
|
||||
capture information that may be too costly to record for each function, but
|
||||
that is required to contextualize the fine-grained timings. They also are used
|
||||
as markers for user-defined Event Data payloads. Metadata records have 16
|
||||
bytes each.
|
||||
|
||||
- ``Event Data``: Free form data may be associated with events that are traced
|
||||
by the binary and encode data defined by a handler function. Event data is
|
||||
always preceded with a marker record which indicates how large it is.
|
||||
|
||||
- ``Function Arguments``: The arguments to some functions are included in the
|
||||
trace. These are either pointer addresses or primitives that are read and
|
||||
logged independently of their types in a high level language. To the tracer,
|
||||
they are all simply numbers. Function Records that have attached arguments
|
||||
will indicate their presence on the function entry record. We only support
|
||||
logging contiguous function argument sequences starting with argument zero,
|
||||
which will be the "this" pointer for member function invocations. For example,
|
||||
we don't support logging the first and third argument.
|
||||
|
||||
A reader of the memory format must maintain a state machine. The format makes no
|
||||
attempt to pad for alignment, and it is not seekable.
|
||||
|
||||
|
||||
Function Records
|
||||
----------------
|
||||
|
||||
Function Records have an 8 byte layout. This layout encodes information to
|
||||
reconstruct a call stack of instrumented function and their durations.
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size (bits) | Description |
|
||||
+===============+==============+===============================================+
|
||||
| discriminant | ``1`` | Indicates whether a reader should read a |
|
||||
| | | Function or Metadata record. Set to ``0`` for |
|
||||
| | | Function records. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| action | ``3`` | Specifies whether the function is being |
|
||||
| | | entered, exited, or is a non-standard entry |
|
||||
| | | or exit produced by optimizations. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| function_id | ``28`` | A numeric ID for the function. Resolved to a |
|
||||
| | | name via the xray instrumentation map. The |
|
||||
| | | instrumentation map is built by xray at |
|
||||
| | | compile time into an object file and pairs |
|
||||
| | | the function ids to addresses. It is used for |
|
||||
| | | patching and as a lookup into the binary's |
|
||||
| | | symbols to obtain names. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| tsc_delta | ``32`` | The number of ticks of the timestamp counter |
|
||||
| | | since a previous record recorded a delta or |
|
||||
| | | other TSC resetting event. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
On little-endian machines, the bitfields are ordered from least significant bit
|
||||
bit to most significant bit. A reader can read an 8 bit value and apply the mask
|
||||
``0x01`` for the discriminant. Similarly, they can read 32 bits and unsigned
|
||||
shift right by ``0x04`` to obtain the function_id field.
|
||||
|
||||
On big-endian machine, the bitfields are written in order from most significant
|
||||
bit to least significant bit. A reader would read an 8 bit value and unsigned
|
||||
shift right by 7 bits for the discriminant. The function_id field could be
|
||||
obtained by reading a 32 bit value and applying the mask ``0x0FFFFFFF``.
|
||||
|
||||
Function action types are as follows.
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Type | Number | Description |
|
||||
+===============+==============+===============================================+
|
||||
| Entry | ``0`` | Typical function entry. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Exit | ``1`` | Typical function exit. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Tail_Exit | ``2`` | An exit from a function due to Tail call |
|
||||
| | | optimization. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Entry_Args | ``3`` | A function entry that records arguments. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
Entry_Args records do not contain the arguments themselves. Instead, metadata
|
||||
records for each of the logged args follow the function record in the stream.
|
||||
|
||||
|
||||
Metadata Records
|
||||
----------------
|
||||
|
||||
Interspersed throughout the buffer are 16 byte Metadata records. For typically
|
||||
instrumented binaries, they will be sparser than Function records, and they
|
||||
provide a fuller picture of the binary execution state.
|
||||
|
||||
Metadata record layout is partially record dependent, but they share a common
|
||||
structure.
|
||||
|
||||
The same bit field rules described for function records apply to the first byte
|
||||
of MetadataRecords. Within this byte, little endian machines use lsb to msb
|
||||
ordering and big endian machines use msb to lsb ordering.
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size | Description |
|
||||
+===============+==============+===============================================+
|
||||
| discriminant | ``1 bit`` | Indicates whether a reader should read a |
|
||||
| | | Function or Metadata record. Set to ``1`` for |
|
||||
| | | Metadata records. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| record_kind | ``7 bits`` | The type of Metadata record. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| data | ``15 bytes`` | A data field used differently for each record |
|
||||
| | | type. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
Here is a table of the enumerated record kinds.
|
||||
|
||||
====== ===========================
|
||||
Number Type
|
||||
------ ---------------------------
|
||||
0 NewBuffer
|
||||
1 EndOfBuffer
|
||||
2 NewCPUId
|
||||
3 TSCWrap
|
||||
4 WallTimeMarker
|
||||
5 CustomEventMarker
|
||||
6 CallArgument
|
||||
|
||||
|
||||
NewBuffer Records
|
||||
-----------------
|
||||
|
||||
Each buffer begins with a NewBuffer record immediately after the header.
|
||||
It records the thread ID of the thread that the trace belongs to.
|
||||
|
||||
Its data segment is as follows.
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size (bytes) | Description |
|
||||
+===============+==============+===============================================+
|
||||
| thread_Id | ``2`` | Thread ID for buffer. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| reserved | ``13`` | Unused. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
|
||||
WallClockTime Records
|
||||
---------------------
|
||||
|
||||
Following the NewBuffer record, each buffer records an absolute time as a frame
|
||||
of reference for the durations recorded by timestamp counter deltas.
|
||||
|
||||
Its data segment is as follows.
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size (bytes) | Description |
|
||||
+===============+==============+===============================================+
|
||||
| seconds | ``8`` | Seconds on absolute timescale. The starting |
|
||||
| | | point is unspecified and depends on the |
|
||||
| | | implementation and platform configured by the |
|
||||
| | | tracer. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| microseconds | ``4`` | The microsecond component of the time. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| reserved | ``3`` | Unused. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
|
||||
NewCpuId Records
|
||||
----------------
|
||||
|
||||
Each function entry invokes a routine to determine what CPU is executing.
|
||||
Typically, this is done with readtscp, which reads the timestamp counter at the
|
||||
same time.
|
||||
|
||||
If the tracing detects that the execution has switched CPUs or if this is the
|
||||
first instrumented entry point, the tracer will output a NewCpuId record.
|
||||
|
||||
Its data segment is as follows.
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size (bytes) | Description |
|
||||
+===============+==============+===============================================+
|
||||
| cpu_id | ``2`` | CPU Id. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| absolute_tsc | ``8`` | The absolute value of the timestamp counter. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| reserved | ``5`` | Unused. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
|
||||
TSCWrap Records
|
||||
---------------
|
||||
|
||||
Since each function record uses a 32 bit value to represent the number of ticks
|
||||
of the timestamp counter since the last reference, it is possible for this value
|
||||
to overflow, particularly for sparsely instrumented binaries.
|
||||
|
||||
When this delta would not fit into a 32 bit representation, a reference absolute
|
||||
timestamp counter record is written in the form of a TSCWrap record.
|
||||
|
||||
Its data segment is as follows.
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size (bytes) | Description |
|
||||
+===============+==============+===============================================+
|
||||
| absolute_tsc | ``8`` | Timestamp counter value. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| reserved | ``7`` | Unused. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
|
||||
CallArgument Records
|
||||
--------------------
|
||||
|
||||
Immediately following an Entry_Args type function record, there may be one or
|
||||
more CallArgument records that contain the traced function's parameter values.
|
||||
|
||||
The order of the CallArgument Record sequency corresponds one to one with the
|
||||
order of the function parameters.
|
||||
|
||||
CallArgument data segment:
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size (bytes) | Description |
|
||||
+===============+==============+===============================================+
|
||||
| argument | ``8`` | Numeric argument (may be pointer address). |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| reserved | ``7`` | Unused. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
|
||||
CustomEventMarker Records
|
||||
-------------------------
|
||||
|
||||
XRay provides the feature of logging custom events. This may be leveraged to
|
||||
record tracing info for RPCs or similarly trace data that is application
|
||||
specific.
|
||||
|
||||
Custom Events themselves are an unstructured (application defined) segment of
|
||||
memory with arbitrary size within the buffer. They are preceded by
|
||||
CustomEventMarkers to indicate their presence and size.
|
||||
|
||||
CustomEventMarker data segment:
|
||||
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| Field | Size (bytes) | Description |
|
||||
+===============+==============+===============================================+
|
||||
| event_size | ``4`` | Size of preceded event. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| absolute_tsc | ``8`` | A timestamp counter of the event. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
| reserved | ``3`` | Unused. |
|
||||
+---------------+--------------+-----------------------------------------------+
|
||||
|
||||
|
||||
EndOfBuffer Records
|
||||
-------------------
|
||||
|
||||
An EndOfBuffer record type indicates that there is no more trace data in this
|
||||
buffer. The reader is expected to seek past the remaining buffer_size expressed
|
||||
before the start of buffer and look for either another header or EOF.
|
||||
|
||||
|
||||
Format Grammar and Invariants
|
||||
=============================
|
||||
|
||||
Not all sequences of Metadata records and Function records are valid data. A
|
||||
sequence should be parsed as a state machine. The expectations for a valid
|
||||
format can be expressed as a context free grammar.
|
||||
|
||||
This is an attempt to explain the format with statements in EBNF format.
|
||||
|
||||
- Format := Header ThreadBuffer* EOF
|
||||
|
||||
- ThreadBuffer := NewBuffer WallClockTime NewCPUId BodySequence* End
|
||||
|
||||
- BodySequence := NewCPUId | TSCWrap | Function | CustomEvent
|
||||
|
||||
- Function := (Function_Entry_Args CallArgument*) | Function_Other_Type
|
||||
|
||||
- CustomEvent := CustomEventMarker CustomEventUnstructuredMemory
|
||||
|
||||
- End := EndOfBuffer RemainingBufferSizeToSkip
|
||||
|
||||
|
||||
Function Record Order
|
||||
---------------------
|
||||
|
||||
There are a few clarifications that may help understand what is expected of
|
||||
Function records.
|
||||
|
||||
- Functions with an Exit are expected to have a corresponding Entry or
|
||||
Entry_Args function record precede them in the trace.
|
||||
|
||||
- Tail_Exit Function records record the Function ID of the function whose return
|
||||
address the program counter will take. In other words, the final function that
|
||||
would be popped off of the call stack if tail call optimization was not used.
|
||||
|
||||
- Not all functions marked for instrumentation are necessarily in the trace. The
|
||||
tracer uses heuristics to preserve the trace for non-trivial functions.
|
||||
|
||||
- Not every entry must have a traced Exit or Tail Exit. The buffer may run out
|
||||
of space or the program may request for the tracer to finalize toreturn the
|
||||
buffer before an instrumented function exits.
|
@ -222,8 +222,8 @@ Error processCustomEventMarker(FDRState &State, uint8_t RecordFirstByte,
|
||||
DataExtractor &RecordExtractor,
|
||||
size_t &RecordSize) {
|
||||
// We can encounter a CustomEventMarker anywhere in the log, so we can handle
|
||||
// it regardless of the expectation. However, we do se the expectation to read
|
||||
// a set number of fixed bytes, as described in the metadata.
|
||||
// it regardless of the expectation. However, we do set the expectation to
|
||||
// read a set number of fixed bytes, as described in the metadata.
|
||||
uint32_t OffsetPtr = 1; // Read after the first byte.
|
||||
uint32_t DataSize = RecordExtractor.getU32(&OffsetPtr);
|
||||
uint64_t TSC = RecordExtractor.getU64(&OffsetPtr);
|
||||
@ -333,7 +333,7 @@ Error processFDRFunctionRecord(FDRState &State, uint8_t RecordFirstByte,
|
||||
}
|
||||
Record.CPU = State.CPUId;
|
||||
Record.TId = State.ThreadId;
|
||||
// Back up to read first 32 bits, including the 8 we pulled RecordType
|
||||
// Back up to read first 32 bits, including the 4 we pulled RecordType
|
||||
// and RecordKind out of. The remaining 28 are FunctionId.
|
||||
uint32_t OffsetPtr = 0;
|
||||
// Despite function Id being a signed int on XRayRecord,
|
||||
@ -369,8 +369,9 @@ Error processFDRFunctionRecord(FDRState &State, uint8_t RecordFirstByte,
|
||||
/// We expect a format complying with the grammar in the following pseudo-EBNF.
|
||||
///
|
||||
/// FDRLog: XRayFileHeader ThreadBuffer*
|
||||
/// XRayFileHeader: 32 bits to identify the log as FDR with machine metadata.
|
||||
/// ThreadBuffer: BufSize NewBuffer WallClockTime NewCPUId FunctionSequence EOB
|
||||
/// XRayFileHeader: 32 bytes to identify the log as FDR with machine metadata.
|
||||
/// Includes BufferSize
|
||||
/// ThreadBuffer: NewBuffer WallClockTime NewCPUId FunctionSequence EOB
|
||||
/// BufSize: 8 byte unsigned integer indicating how large the buffer is.
|
||||
/// NewBuffer: 16 byte metadata record with Thread Id.
|
||||
/// WallClockTime: 16 byte metadata record with human readable time.
|
||||
|
Loading…
Reference in New Issue
Block a user