
=======================================
LLVM's Optional Rich Disassembly Output
=======================================

.. contents::
   :local:

Introduction
============

LLVM's default disassembly output is raw text. To give consumers more ability
to introspect an instruction's textual representation, or to reformat it for a
more user-friendly display, LLVM provides an optional rich disassembly output.

This optional output makes it possible to reference individual portions of the
instruction text. It is intended for clients such as disassemblers, listing file
generators, and pretty-printers, which need more than the raw instructions and
the ability to print them.

To provide this functionality, the assembly text is marked up with annotations.
The markup syntax is simple enough to be robust even when consumer and producer
versions do not match. That is, the syntax generally does not carry semantics
beyond "this text has an annotation," so consumers can simply ignore
annotations they do not understand or do not care about.

After calling ``LLVMCreateDisasm()`` to create a disassembler context, enable
the optional output with this call:

.. code-block:: c

    /* Returns 1 if the option was set, 0 otherwise. */
    LLVMSetDisasmOptions(DC, LLVMDisassembler_Option_UseMarkup);

Subsequent calls to ``LLVMDisasmInstruction()`` then return output strings
containing the markup annotations.

Instruction Annotations
=======================

.. _contextual markups:

Contextual markups
------------------

Annotated assembly display will supply contextual markup to help clients
implement tools such as pretty-printers more efficiently. Most markup will be
target-independent, so clients can provide good display without any
target-specific knowledge.

Annotated assembly goes through the normal instruction printer, but optionally
includes contextual tags on portions of the instruction string. An annotation
is any section of text delimited by '<' and '>' (see note 1).

.. code-block:: text

    annotation: '<' tag-name tag-modifier-list ':' annotated-text '>'
    tag-name: identifier
    tag-modifier-list: comma delimited identifier list

The tag-name is an identifier that gives the type of the annotation. For the
first pass, the set of tags is deliberately small: memory references,
registers, and immediates have the tag names ``mem``, ``reg``, and ``imm``,
respectively.

The tag-modifier-list typically carries additional target-specific context,
such as the register class.
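
To make the grammar concrete, the following C sketch parses the header of a
single annotation into its tag-name and tag-modifier-list, and tests whether a
given modifier is present. The ``markup_tag`` struct and both functions are
hypothetical helpers, not part of the LLVM C API, and the sketch assumes that a
single space separates the tag-name from the modifier list, as in the
``<reg gpr:r0>`` example below.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical: one parsed annotation header. Pointers refer into the
       caller's string; nothing is copied. */
    struct markup_tag {
        const char *name;      size_t name_len;      /* tag-name, e.g. "reg" */
        const char *modifiers; size_t modifiers_len; /* may be empty */
        const char *text;      /* first character of annotated-text */
    };

    /* Parse the header of an annotation starting at s (s[0] must be '<').
       Returns 1 and fills *tag on success, 0 if no well-formed header is found. */
    static int parse_markup_tag(const char *s, struct markup_tag *tag)
    {
        assert(s != NULL && tag != NULL);
        if (s[0] != '<' || s[1] == '<')   /* "<<" is an escaped literal '<' */
            return 0;

        const char *p = s + 1;
        const char *space = NULL;
        /* The header runs up to the first ':' and may not contain '<' or '>'. */
        while (*p != ':') {
            if (*p == '\0' || *p == '<' || *p == '>')
                return 0;
            if (*p == ' ' && space == NULL)
                space = p;                /* assumed tag-name/modifier separator */
            ++p;
        }

        tag->name = s + 1;
        tag->name_len = (size_t)((space ? space : p) - tag->name);
        tag->modifiers = space ? space + 1 : p;
        tag->modifiers_len = (size_t)(p - tag->modifiers);
        tag->text = p + 1;
        return tag->name_len > 0;
    }

    /* Return 1 if the comma-delimited modifier list contains exactly `mod`. */
    static int has_modifier(const struct markup_tag *tag, const char *mod)
    {
        size_t mod_len = strlen(mod);
        const char *p = tag->modifiers;
        const char *end = p + tag->modifiers_len;
        while (p < end) {
            const char *comma = memchr(p, ',', (size_t)(end - p));
            const char *item_end = comma ? comma : end;
            if ((size_t)(item_end - p) == mod_len && strncmp(p, mod, mod_len) == 0)
                return 1;
            p = comma ? comma + 1 : end;
        }
        return 0;
    }

For ``<reg gpr:r0>`` the parser yields the tag-name ``reg``, the modifier list
``gpr``, and annotated text beginning with ``r0``; a client would then scan the
annotated text, which may itself contain nested annotations.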

Clients should accept and ignore any tag-names or tag-modifiers they do not
understand, allowing the annotations to grow in richness without breaking older
clients.
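
One way for a client to follow this rule is to map only the tag-names it
recognizes to display styles and to treat everything else as plain text. The
sketch below is a hypothetical pretty-printer helper, not an LLVM API, and the
style names are illustrative assumptions.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical display styles a pretty-printer might use. */
    enum markup_style { STYLE_PLAIN, STYLE_REGISTER, STYLE_IMMEDIATE, STYLE_MEMORY };

    /* Map a tag-name (pointer + length, not NUL-terminated) to a style.
       Unknown tag-names fall back to STYLE_PLAIN, so newer producers can add
       tags without breaking this client. */
    static enum markup_style style_for_tag(const char *name, size_t len)
    {
        static const struct {
            const char *name;
            enum markup_style style;
        } known[] = {
            { "reg", STYLE_REGISTER },
            { "imm", STYLE_IMMEDIATE },
            { "mem", STYLE_MEMORY },
        };
        assert(name != NULL);
        for (size_t i = 0; i < sizeof known / sizeof known[0]; ++i) {
            if (strlen(known[i].name) == len && strncmp(known[i].name, name, len) == 0)
                return known[i].style;
        }
        return STYLE_PLAIN;  /* accept and ignore tags we do not understand */
    }

Taking a pointer and length rather than a NUL-terminated string lets the helper
work directly on a tag-name inside the disassembly output without copying it.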

For example, an ARM load from a stack-relative location might be annotated as:

.. code-block:: text

   ldr <reg gpr:r0>, <mem regoffset:[<reg gpr:sp>, <imm:#4>]>


1: For assembly dialects in which '<' and/or '>' are legal tokens, a literal
occurrence of either character is escaped by immediately repeating it. For
example, a literal '<' character is output as '<<' in an annotated assembly
string.
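
Because annotations only wrap text, a client that wants the plain instruction
back can drop each annotation header and its closing '>' while keeping the
annotated text. The C sketch below does this for nested annotations, assuming
the doubling rule in note 1; ``strip_markup()`` is a hypothetical helper, not
an LLVM function. It treats every doubled '<' or '>' as an escape, so
back-to-back closing delimiters of nested annotations (a hypothetical
``<mem:<reg:r0>>``) are rejected as malformed; a real consumer needs an
explicit rule for that case.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: copy `in` to `out`, dropping annotation headers and
       delimiters but keeping annotated-text, so "<reg gpr:r0>" becomes "r0".
       Doubled "<<" and ">>" are decoded as literal '<' and '>' (note 1).
       Returns 0 on success, -1 if the markup is malformed or `out` is too small. */
    static int strip_markup(const char *in, char *out, size_t out_size)
    {
        assert(in != NULL && out != NULL && out_size > 0);
        size_t n = 0;
        int depth = 0;  /* number of currently open annotations */

        for (const char *p = in; *p != '\0'; ++p) {
            char c = *p;
            if ((c == '<' || c == '>') && p[1] == c) {
                ++p;                    /* escaped literal: emit one copy */
            } else if (c == '<') {
                /* Skip "<tag-name tag-modifier-list:" (header is not validated). */
                const char *colon = strchr(p, ':');
                if (colon == NULL)
                    return -1;
                p = colon;
                ++depth;
                continue;
            } else if (c == '>') {
                if (depth == 0)
                    return -1;          /* '>' with no open annotation */
                --depth;
                continue;
            }
            if (n + 1 >= out_size)      /* keep room for the terminating NUL */
                return -1;
            out[n++] = c;
        }
        out[n] = '\0';
        return depth == 0 ? 0 : -1;     /* every annotation must be closed */
    }

Applied to the ARM example above, the sketch yields ``ldr r0, [sp, #4]``.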

C API Details
-------------

The intended consumers of this information use the C API, so the disassembler's
C API provides an option to produce disassembled instructions with annotations:
the ``LLVMSetDisasmOptions()`` function together with the
``LLVMDisassembler_Option_UseMarkup`` option (see above).