Panda Binary File Format

This document describes Panda binary file format with the following goals in mind:

  • Compactness.
  • Support for fast access to information.
  • Support for low memory footprint.
  • Extensibility and compatibility.

Compactness

Many mobile applications use a large number of types, methods and fields, often more than a 16-bit unsigned integer can index. With a 16-bit limit, application developers have to split an application into several files, and as a result not all data can be deduplicated.

The current binary file format should extend these limits to meet modern requirements.

To achieve this, all references in the binary file are 4 bytes long. This gives a 4 GB address space for fields, methods, classes, etc.

The format uses TaggedValue, which allows storing only the information that is present and avoids zero offsets for absent information.

To improve compactness further, 16-bit indexes are used to refer to classes, methods and fields in the bytecode and in some metadata. A file can contain multiple indexes; each one covers a part of the file and is described by a RegionHeader.

Fast information access

The binary file format should support fast access to information. This means that redundant references should be avoided. Also, where possible, the binary file format should avoid data indexes (like a sorted list of strings). However, the described binary format supports one index: a sorted list of offsets to classes. This index is compact and allows a type definition to be found quickly, which the runtime needs to do often during application launch.

All classes, fields and methods are separated into 2 groups: foreign and local. Foreign classes, fields and methods are declared in other files, with references from the current binary file. Local classes, fields and methods are declared in the current file. Local entities have the same header as the corresponding foreign ones. So, given an offset to an entity, it doesn't matter whether the entity is local or foreign.

The runtime can easily determine the type of an offset by checking whether it lies in the foreign region ([foreign_off; foreign_off + foreign_size)). Depending on the result, the runtime can search for the entity in other files (for foreign entities) or create a runtime object from the definition at the offset (for local entities).

To improve data access speed, most data structures have 4-byte alignment. Since most target architectures are little endian, all multibyte values are little endian.
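The foreign-region check described above can be sketched as a single range test. The function name `is_foreign` is hypothetical; `foreign_off` and `foreign_size` are the Header fields of the same name.

```python
def is_foreign(offset: int, foreign_off: int, foreign_size: int) -> bool:
    """Return True if offset lies in [foreign_off; foreign_off + foreign_size)."""
    return foreign_off <= offset < foreign_off + foreign_size


# Example: a foreign region that starts at offset 100 and is 20 bytes long.
print(is_foreign(100, 100, 20))  # first byte of the region
print(is_foreign(120, 100, 20))  # one past the end, so it is local
```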

Offsets

Unless otherwise specified, all offsets are calculated from the beginning of the file. An offset cannot contain values in the range [0; 32), except for specially mentioned cases.

Support for low memory footprint

As practice shows, most of the data in a file is not used by the application. This means the memory footprint of a file may be significantly reduced by grouping frequently used data together. To support this, the described binary file format uses offsets and doesn't specify how structures should be located relative to each other.

Extensibility and compatibility

The binary file format supports future changes via version number. The version field in the header is 4 bytes long and is encoded as byte array to avoid misinterpretation on platforms with different endianness.

Any tool which supports format version N must support format version N - 1 too.

Data types

Type Description
uint8_t 8-bit unsigned integer value
uint16_t 16-bit unsigned integer value
uint32_t 32-bit little endian unsigned integer value.
uleb128 unsigned integer value in leb128 encoding.
sleb128 signed integer value in leb128 encoding.
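The table does not spell out the LEB128 encoding. The sketch below assumes the standard one (7 payload bits per byte, least significant group first, high bit set on every byte except the last). The function names `read_uleb128` and `read_sleb128` are illustrative, not taken from the Panda sources.

```python
def read_uleb128(buf: bytes, pos: int = 0):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, pos


def read_sleb128(buf: bytes, pos: int = 0):
    """Decode a signed LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            # Bit 6 of the last byte is the sign bit: sign-extend if set.
            if byte & 0x40:
                result -= 1 << shift
            return result, pos
```

Both readers return the position of the next unread byte, so they can be chained when walking a structure that mixes variable-length fields.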

MUTF-8 Encoding

Binary file format uses MUTF-8 (Modified UTF-8) encoding for strings.

String

Alignment: none

Format:

Name Format Description
utf16_length uleb128 len << 1 | is_ascii where len is the length of the string in UTF-16 code units.
data uint8_t[] 0-terminated character sequence in MUTF-8 encoding.
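As an illustration of the String layout, the following sketch splits the utf16_length field into its two parts and extracts the raw MUTF-8 bytes. It does not decode MUTF-8 itself; it relies only on the fact that MUTF-8 never contains an embedded zero byte, so the first zero byte is the terminator. The function name `read_string` is hypothetical.

```python
def read_string(buf: bytes, pos: int):
    """Return (utf16_len, is_ascii, raw_data, next position) for a String."""
    # Decode the uleb128 utf16_length field.
    header = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        header |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            break
    utf16_len = header >> 1       # length in UTF-16 code units
    is_ascii = bool(header & 1)   # low bit is the is_ascii flag
    end = buf.index(0, pos)       # position of the 0 terminator
    return utf16_len, is_ascii, buf[pos:end], end + 1


# "abc": len = 3, is_ascii = 1, so the header byte is (3 << 1) | 1 = 7.
encoded = bytes([7]) + b"abc\x00"
print(read_string(encoded, 0))
```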

TaggedValue

Alignment: none

Format:

Name Format Description
tag_value uint8_t The first 8 bits contain tag which determines the meaning of the data. Depending on the tag there may be optional data. Runtime must be able to determine size of the data.
data uint8_t[] Optional payload.

String syntax

TypeDescriptor

TypeDescriptor -> PrimitiveType | ArrayType | RefType
PrimitiveType  -> 'Z' | 'B' | 'H' | 'S' | 'C' | 'I' | 'U' | 'J' | 'Q' | 'F' | 'D' | 'A'
ArrayType      -> '[' TypeDescriptor
RefType        -> 'L' ClassName ';'

PrimitiveType is a one letter encoding for primitive type

Type Encoding
u1 Z
i8 B
u8 H
i16 S
u16 C
i32 I
u32 U
f32 F
f64 D
i64 J
u64 Q
any A

ClassName is a qualified name of a class with . replaced with /.
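The grammar above is small enough to parse by hand. The sketch below turns a descriptor into a readable type name; the function name `parse_type_descriptor` and the `name[]` output notation are choices made for this example only.

```python
PRIMITIVES = {
    "Z": "u1", "B": "i8", "H": "u8", "S": "i16", "C": "u16", "I": "i32",
    "U": "u32", "F": "f32", "D": "f64", "J": "i64", "Q": "u64", "A": "any",
}


def parse_type_descriptor(desc: str) -> str:
    """Convert a TypeDescriptor such as '[Lpanda/String;' to 'panda.String[]'."""
    # Each leading '[' adds one array dimension (ArrayType -> '[' TypeDescriptor).
    rank = 0
    while rank < len(desc) and desc[rank] == "[":
        rank += 1
    elem = desc[rank:]
    if elem in PRIMITIVES:
        name = PRIMITIVES[elem]
    elif len(elem) >= 3 and elem[0] == "L" and elem[-1] == ";":
        # RefType: undo the '.' -> '/' replacement in the class name.
        name = elem[1:-1].replace("/", ".")
    else:
        raise ValueError("not a valid TypeDescriptor: %r" % desc)
    return name + "[]" * rank


print(parse_type_descriptor("[[Lpanda/String;"))
```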

Access flags

Field access flags

Name Value Description
ACC_PUBLIC 0x0001 Declared public; may be accessed from outside its package.
ACC_PRIVATE 0x0002 Declared private; usable only within the defining class.
ACC_PROTECTED 0x0004 Declared protected; may be accessed within subclasses.
ACC_STATIC 0x0008 Declared static.
ACC_FINAL 0x0010 Declared final; never directly assigned to after object construction (JLS §17.5).
ACC_VOLATILE 0x0040 Declared volatile; cannot be cached.
ACC_TRANSIENT 0x0080 Declared transient; not written or read by a persistent object manager.
ACC_SYNTHETIC 0x1000 Declared synthetic; not present in the source code.
ACC_ENUM 0x4000 Declared as an element of an enum.

Method access flags

Name Value Description
ACC_PUBLIC 0x0001 Declared public; may be accessed from outside its package.
ACC_PRIVATE 0x0002 Declared private; accessible only within the defining class.
ACC_PROTECTED 0x0004 Declared protected; may be accessed within subclasses.
ACC_STATIC 0x0008 Declared static.
ACC_FINAL 0x0010 Declared final; must not be overridden.
ACC_SYNCHRONIZED 0x0020 Declared synchronized; invocation is wrapped by a monitor use.
ACC_BRIDGE 0x0040 A bridge method, generated by the compiler.
ACC_VARARGS 0x0080 Declared with variable number of arguments.
ACC_NATIVE 0x0100 Declared native.
ACC_ABSTRACT 0x0400 Declared abstract; no implementation is provided.
ACC_STRICT 0x0800 Declared strictfp; floating-point mode is FP-strict.
ACC_SYNTHETIC 0x1000 Declared synthetic; not present in the source code.

Class access flags

Name Value Description
ACC_PUBLIC 0x0001 Declared public; may be accessed from outside its package.
ACC_FINAL 0x0010 Declared final; no subclasses allowed.
ACC_SUPER 0x0020 No special meaning, exists for compatibility
ACC_INTERFACE 0x0200 Is an interface, not a class.
ACC_ABSTRACT 0x0400 Declared abstract; must not be instantiated.
ACC_SYNTHETIC 0x1000 Declared synthetic; not present in the source code.
ACC_ANNOTATION 0x2000 Declared as an annotation type.
ACC_ENUM 0x4000 Declared as an enum type.
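Access flags are bit masks that are combined with bitwise OR. As a small sketch, the class flags from the table above can be decoded back into names as follows; the helper name `class_flag_names` is hypothetical.

```python
CLASS_FLAGS = {
    0x0001: "ACC_PUBLIC",
    0x0010: "ACC_FINAL",
    0x0020: "ACC_SUPER",
    0x0200: "ACC_INTERFACE",
    0x0400: "ACC_ABSTRACT",
    0x1000: "ACC_SYNTHETIC",
    0x2000: "ACC_ANNOTATION",
    0x4000: "ACC_ENUM",
}


def class_flag_names(flags: int):
    """Return the names of all class access flags set in flags, lowest bit first."""
    return [name for bit, name in sorted(CLASS_FLAGS.items()) if flags & bit]


# A public abstract interface: 0x0001 | 0x0200 | 0x0400 = 0x0601.
print(class_flag_names(0x0601))
```

The same approach applies to field and method flags with their own tables, since some bit values (for example 0x0040 and 0x0080) mean different things for fields and for methods.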

Source language

A file can be emitted from sources that are written in the following languages:

| Name           | Value       |
| -------------- | ----------- |
| Reserved       | 0x00        |
| Panda Assembly | 0x01        |
| Reserved       | 0x02 - 0x0f |

Source language can be specified for classes and methods. For classes, Panda Assembly is assumed by default. The default language for a method is that of its class.

Data layout

A file begins with the header which is located at offset 0. All other data can be reached from the header.

Header

Alignment: 4 bytes

Format:

| Name | Format | Description |
| ---- | ------ | ----------- |
| magic | uint8_t[8] | Magic string. Must be 'P' 'A' 'N' 'D' 'A' '\0' '\0' '\0' |
| checksum | uint8_t[4] | adler32 checksum of the file except magic and checksum fields. |
| version | uint8_t[4] | Version of the format. Current version is 0002. |
| file_size | uint32_t | Size of the file in bytes. |
| foreign_off | uint32_t | Offset to the foreign region. The region must contain elements only of types ForeignField, ForeignMethod, or ForeignClass. foreign_off does not have to point to the first entity. Runtime should use foreign_off and foreign_size to determine the type of an offset. |
| foreign_size | uint32_t | Size of the foreign region in bytes. |
| num_classes | uint32_t | Number of classes defined in the file. This is also the number of elements in the ClassIndex structure. |
| class_idx_off | uint32_t | Offset to the class index structure. The offset must point to a structure in ClassIndex format. |
| num_lnps | uint32_t | Number of line number programs in the file. This is also the number of elements in the LineNumberProgramIndex structure. |
| lnp_idx_off | uint32_t | Offset to the line number program index structure. The offset must point to a structure in LineNumberProgramIndex format. |
| num_literalarrays | uint32_t | Number of LiteralArrays defined in the file. This is also the number of elements in the LiteralArrayIndex structure. |
| literalarray_idx_off | uint32_t | Offset to the LiteralArray index structure. The offset must point to a structure in LiteralArrayIndex format. |
| num_index_regions | uint32_t | Number of index regions in the file. This is also the number of elements in the RegionIndex structure. |
| index_section_off | uint32_t | Offset to the index section. The offset must point to a structure in RegionIndex format. |

Constraint: size of header must be > 16 bytes. FieldType uses this fact.
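As a sketch, the header fields listed above can be read with Python's `struct` module: 8 + 4 + 4 bytes of byte arrays followed by eleven little-endian uint32_t values, 60 bytes in total. The names `parse_header` and `checksum_matches` are hypothetical. The checksum helper additionally assumes that the four checksum bytes hold the adler32 value in little-endian order, which the text above does not state.

```python
import struct
import zlib

HEADER_FORMAT = "<8s4s4s11I"  # magic, checksum, version, then 11 uint32_t fields
HEADER_FIELDS = (
    "magic", "checksum", "version", "file_size", "foreign_off", "foreign_size",
    "num_classes", "class_idx_off", "num_lnps", "lnp_idx_off",
    "num_literalarrays", "literalarray_idx_off",
    "num_index_regions", "index_section_off",
)
MAGIC = b"PANDA\0\0\0"


def parse_header(data: bytes) -> dict:
    """Unpack the header at offset 0 and validate the magic string."""
    values = struct.unpack_from(HEADER_FORMAT, data, 0)
    header = dict(zip(HEADER_FIELDS, values))
    if header["magic"] != MAGIC:
        raise ValueError("not a Panda binary file")
    # Keep the version as a tuple of four bytes; it is a byte array, not an integer.
    header["version"] = tuple(header["version"])
    return header


def checksum_matches(data: bytes) -> bool:
    """Compare the stored checksum with adler32 over everything after it."""
    # Assumption: the checksum bytes are a little-endian 32-bit value.
    stored = int.from_bytes(data[8:12], "little")
    return stored == zlib.adler32(data[12:])
```

The checksum covers the file except the magic and checksum fields, which is why the computation starts at byte 12.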

RegionHeader

To address file structures using 16-bit indexes, the file is split into regions. Each region has class, method, field and proto indexes and is described by a RegionHeader structure.

Alignment: 4 bytes

Format:

Name Format Description
start_off uint32_t Start offset of the region.
end_off uint32_t End offset of the region.
class_idx_size uint32_t Number of elements in the ClassRegionIndex structure. Max value is 65536.
class_idx_off uint32_t Offset to the class index structure. The offset must point to a structure in ClassRegionIndex format.
method_idx_size uint32_t Number of elements in the MethodRegionIndex structure. Max value is 65536.
method_idx_off uint32_t Offset to the method index structure. The offset must point to a structure in MethodRegionIndex format.
field_idx_size uint32_t Number of elements in the FieldRegionIndex structure. Max value is 65536.
field_idx_off uint32_t Offset to the field index structure. The offset must point to a structure in FieldRegionIndex format.
proto_idx_size uint32_t Number of elements in the ProtoRegionIndex structure. Max value is 65536.
proto_idx_off uint32_t Offset to the proto index structure. The offset must point to a structure in ProtoRegionIndex format.

Constraint: regions must not overlap each other.

RegionIndex

The RegionIndex structure allows the runtime to find the index structure that covers a specified offset in the file. It is organized as an array of RegionHeader structures, sorted by the start offset of the region. The number of elements in the index is num_index_regions from Header.

Alignment: 4 bytes
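Because the regions are sorted by start offset and must not overlap, the region that covers a given offset can be found with a binary search. The sketch below represents each region as a `(start_off, end_off)` pair and assumes that `end_off` is exclusive, which the RegionHeader description does not state. The function name `find_region` is hypothetical.

```python
import bisect


def find_region(regions, offset):
    """Return the index of the region covering offset, or None.

    regions is a list of (start_off, end_off) pairs sorted by start_off.
    Assumption: end_off is exclusive.
    """
    starts = [start for start, _ in regions]
    # Last region whose start_off is <= offset.
    i = bisect.bisect_right(starts, offset) - 1
    if i >= 0 and offset < regions[i][1]:
        return i
    return None


regions = [(60, 100), (100, 250), (400, 500)]
print(find_region(regions, 120))  # inside the second region
print(find_region(regions, 300))  # in the gap between regions
```

Once the region is known, a 16-bit class_idx, proto_idx or similar index is resolved through that region's index structures.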

ForeignField

Alignment: none

Format:

Name Format Description
class_idx uint16_t Index of the declaring class in a ClassRegionIndex structure. Corresponding index entry must be an offset to a Class or a ForeignClass.
type_idx uint16_t Index of the field's type in a ClassRegionIndex structure. Corresponding index entry must be in FieldType format.
name_off uint32_t Offset to the name of the field. The offset must point to a String.

Note: Proper region index to resolve class_idx and type_idx can be found by foreign field's offset.

Field

Alignment: none

Format:

Name Format Description
class_idx uint16_t Index of the declaring class in a ClassRegionIndex structure. Corresponding index entry must be an offset to a Class.
type_idx uint16_t Index of the field's type in a ClassRegionIndex structure. Corresponding index entry must be in FieldType format.
name_off uint32_t Offset to the name of the field. The offset must point to a String.
access_flags uleb128 Access flags of the field. The value must be a combination of the Field access flags.
field_data TaggedValue[] Variable length list of tagged values. Each element must have type TaggedValue. Tag must have values from FieldTag and follow in order of increasing tag (except 0x00 tag).

Note: Proper region index to resolve class_idx and type_idx can be found by field's offset.

FieldTag

| Name | Tag | Quantity | Data format | Description |
| ---- | --- | -------- | ----------- | ----------- |
| NOTHING | 0x00 | 1 | none | No more values. The value with this tag must be the last. |
| INT_VALUE | 0x01 | 0-1 | sleb128 | Integral value of the field. This tag is used when the field has type boolean, byte, char, short or int. |
| VALUE | 0x02 | 0-1 | uint8_t[4] | Contains value in the Value format. |
| RUNTIME_ANNOTATIONS | 0x03 | >=0 | uint8_t[4] | Offset to runtime visible annotation of the field. The tag may be repeated in case the field has several annotations. The offset must point to the value in Annotation format. |
| ANNOTATIONS | 0x04 | >=0 | uint8_t[4] | Offset to runtime invisible annotation of the field. The tag may be repeated in case the field has several annotations. The offset must point to the value in Annotation format. |
| RUNTIME_TYPE_ANNOTATION | 0x05 | >=0 | uint8_t[4] | Offset to runtime visible type annotation of the field. The tag may be repeated in case the field has several annotations. The offset must point to the value in Annotation format. |
| TYPE_ANNOTATION | 0x06 | >=0 | uint8_t[4] | Offset to runtime invisible type annotation of the field. The tag may be repeated in case the field has several annotations. The offset must point to the value in Annotation format. |

Note: At most one of the INT_VALUE and VALUE tags may be present.
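To illustrate how a reader walks a field_data list, the sketch below reads tagged values until it meets the NOTHING tag. INT_VALUE carries an sleb128 payload; every other tag carries 4 bytes, read here as a little-endian integer in line with the rule that all multibyte values are little endian. The function name `parse_field_data` is hypothetical.

```python
FIELD_TAGS = {
    0x01: "INT_VALUE",
    0x02: "VALUE",
    0x03: "RUNTIME_ANNOTATIONS",
    0x04: "ANNOTATIONS",
    0x05: "RUNTIME_TYPE_ANNOTATION",
    0x06: "TYPE_ANNOTATION",
}


def read_sleb128(buf, pos):
    """Decode a signed LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            if byte & 0x40:
                result -= 1 << shift
            return result, pos


def parse_field_data(buf, pos):
    """Return ([(tag name, payload), ...], next position) for a field_data list."""
    values = []
    while True:
        tag = buf[pos]
        pos += 1
        if tag == 0x00:  # NOTHING terminates the list
            return values, pos
        if tag not in FIELD_TAGS:
            raise ValueError("unknown FieldTag 0x%02x" % tag)
        if tag == 0x01:
            payload, pos = read_sleb128(buf, pos)
        else:
            payload = int.from_bytes(buf[pos:pos + 4], "little")
            pos += 4
        values.append((FIELD_TAGS[tag], payload))
```

The sketch does not enforce the ordering and quantity rules from the table; a validating reader would also check that tags are non-decreasing and that single-use tags appear at most once.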

FieldType

Since the first bytes of the file contain the header and the header is larger than 16 bytes, any offset in the range [0; sizeof(Header)) is invalid. The FieldType encoding uses this fact to encode primitive field types in the low 4 bits. For a non-primitive type the value is an offset to a Class or to a ForeignClass. In both cases FieldType is a uint32_t.

Primitive types are encoded as follows:

Type Code
u1 0x00
i8 0x01
u8 0x02
i16 0x03
u16 0x04
i32 0x05
u32 0x06
f32 0x07
f64 0x08
i64 0x09
u64 0x0a
any 0x0b
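A small sketch of FieldType decoding follows. Values below 0x0c are primitive type codes and anything else is treated as an offset. Values that are neither a known code nor a permitted offset are rejected using the lower bound of 32 from the Offsets section. The function name `decode_field_type` and the returned tuple shape are choices made for this example.

```python
PRIMITIVE_TYPES = (
    "u1", "i8", "u8", "i16", "u16", "i32",
    "u32", "f32", "f64", "i64", "u64", "any",
)
MIN_VALID_OFFSET = 32  # offsets cannot lie in [0; 32) per the Offsets section


def decode_field_type(value: int):
    """Classify a FieldType value as a primitive code or a class offset."""
    if value < len(PRIMITIVE_TYPES):
        return ("primitive", PRIMITIVE_TYPES[value])
    if value < MIN_VALID_OFFSET:
        raise ValueError("neither a primitive code nor a valid offset: %d" % value)
    return ("class_offset", value)


print(decode_field_type(0x05))
print(decode_field_type(0x1234))
```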

ForeignMethod

Alignment: none

Format:

Name Format Description
class_idx uint16_t Index of the declaring class in a ClassRegionIndex structure. Corresponding index entry must be an offset to a Class or a ForeignClass.
proto_idx uint16_t Index of the method's prototype in a ProtoRegionIndex structure. Corresponding index entry must be an offset to a Proto.
name_off uint32_t Offset to the name of the method. The offset must point to a String.
access_flags uleb128 Access flags of the method. The value must be a combination of Method access flags. For foreign methods, only ACC_STATIC flag is used, other flags should be ignored.

Note: Proper region index to resolve class_idx and proto_idx can be found by foreign method's offset.

Method

Alignment: none

Format:

Name Format Description
class_idx uint16_t Index of the declaring class in a ClassRegionIndex structure. Corresponding index entry must be an offset to a Class.
proto_idx uint16_t Index of the method's prototype in a ProtoRegionIndex structure. Corresponding index entry must be an offset to a Proto.
name_off uint32_t Offset to the name of the method. The offset must point to a String.
access_flags uleb128 Access flags of the method. The value must be a combination of Method access flags.
method_data TaggedValue[] Variable length list of tagged values. Each element must have type TaggedValue. Tag must have values from MethodTag and follow in order of increasing tag (except 0x00 tag).

Note: Proper region index to resolve class_idx and proto_idx can be found by method's offset.

MethodTag

| Name | Tag | Quantity | Data format | Description |
| ---- | --- | -------- | ----------- | ----------- |
| NOTHING | 0x00 | 1 | none | No more values. The value with this tag must be the last. |
| CODE | 0x01 | 0-1 | uint8_t[4] | Data represents the offset to method's code. The offset must point to Code. |
| SOURCE_LANG | 0x02 | 0-1 | uint8_t | Data represents the source language. |
| RUNTIME_ANNOTATION | 0x03 | >=0 | uint8_t[4] | Data represents the offset to runtime visible annotation of the method. The tag may be repeated in case the method has several annotations. The offset must point to the value in Annotation format. |
| RUNTIME_PARAM_ANNOTATION | 0x04 | 0-1 | uint8_t[4] | Data represents the offset to the runtime visible annotations of the method's parameters. The offset must point to the value in ParamAnnotations format. |
| DEBUG_INFO | 0x05 | 0-1 | uint8_t[4] | Data represents the offset to debug information related to the method. The offset must point to DebugInfo. |
| ANNOTATION | 0x06 | >=0 | uint8_t[4] | Data represents the offset to runtime invisible annotation of the method. The tag may be repeated in case the method has several annotations. The offset must point to the value in Annotation format. |
| PARAM_ANNOTATION | 0x07 | 0-1 | uint8_t[4] | Data represents the offset to the runtime invisible annotations of the method's parameters. The offset must point to the value in ParamAnnotations format. |
| TYPE_ANNOTATION | 0x08 | >=0 | uint8_t[4] | Data represents the offset to runtime invisible type annotation of the method. The tag may be repeated in case the method has several annotations. The offset must point to the value in Annotation format. |
| RUNTIME_TYPE_ANNOTATION | 0x09 | >=0 | uint8_t[4] | Data represents the offset to runtime visible type annotation of the method. The tag may be repeated in case the method has several annotations. The offset must point to the value in Annotation format. |

ForeignClass

Alignment: none

Format:

Name Format Description
name String Name of the foreign type. The name must conform to TypeDescriptor syntax.

Class

Alignment: none

Format:

Name Format Description
name String Name of the class. The name must conform to TypeDescriptor syntax.
super_class_off uint8_t[4] Offset to the name of the super class or 0 for root object class (panda.Object in Panda Assembly, plugin-specific in plugins). Non-zero offset must point to a ForeignClass or to a Class.
access_flags uleb128 Access flags of the class. The value must be a combination of Class access flags.
num_fields uleb128 Number of fields the class has.
num_methods uleb128 Number of methods the class has.
class_data TaggedValue[] Variable length list of tagged values. Each element must have type TaggedValue. Tag must have values from ClassTag and follow in order of increasing tag (except 0x00 tag).
fields Field[] Class fields. Number of elements is num_fields. Each element must have Field format.
methods Method[] Class methods. Number of elements is num_methods. Each element must have Method format.

ClassTag

| Name | Tag | Quantity | Data format | Description |
| ---- | --- | -------- | ----------- | ----------- |
| NOTHING | 0x00 | 1 | none | No more values. The value with this tag must be the last. |
| INTERFACES | 0x01 | 0-1 | uleb128 uint8_t[] | List of interfaces the class implements. Data contains the number of interfaces encoded in uleb128 format followed by indexes of the interfaces in a ClassRegionIndex structure. Each index is 2 bytes long and must resolve to an offset that points to a ForeignClass or to a Class. The number of indexes is equal to the number of interfaces. |
| SOURCE_LANG | 0x02 | 0-1 | uint8_t | Data represents the source language. |
| RUNTIME_ANNOTATION | 0x03 | >=0 | uint8_t[4] | Offset to runtime visible annotation of the class. The tag may be repeated in case the class has several annotations. The offset must point to the value in Annotation format. |
| ANNOTATION | 0x04 | >=0 | uint8_t[4] | Offset to runtime invisible annotation of the class. The tag may be repeated in case the class has several annotations. The offset must point to the value in Annotation format. |
| RUNTIME_TYPE_ANNOTATION | 0x05 | >=0 | uint8_t[4] | Offset to runtime visible type annotation of the class. The tag may be repeated in case the class has several annotations. The offset must point to the value in Annotation format. |
| TYPE_ANNOTATION | 0x06 | >=0 | uint8_t[4] | Offset to runtime invisible type annotation of the class. The tag may be repeated in case the class has several annotations. The offset must point to the value in Annotation format. |
| SOURCE_FILE | 0x07 | 0-1 | uint8_t[4] | Offset to a file name string containing source code of this class. |

Note: Proper region index to resolve interface indexes can be found by class's offset.

LiteralArray

Alignment: none

Name Format Description
num_literals uint32_t Number of literals in the LiteralArray.
literals literal[] Array of literals in the LiteralArray. The array has num_literals elements in Literal format.

Proto

Alignment: 2 bytes

Format:

Name Format Description
shorty uint16_t[] Short representation of the prototype. Encoding of the shorty is described in Shorty.
reference_types uint16_t[] Array of indexes of the method's signature non-primitive types. For each non-primitive type in the shorty there is a corresponding element in the array. The size of the array is equal to the number of reference types in the shorty.

Note: Proper region index to resolve reference types indexes can be found by proto's offset.

Shorty

Shorty is a short description of method's signature without detailed information about reference types. A shorty begins with a return type followed by method arguments and ends with 0x0.

Shorty syntax:

Shorty -> ReturnType ParamTypeList End
ReturnType -> Type
ParamTypeList -> '' | Type ParamTypeList
Type -> <encoded type>
End -> 0x0

<encoded type> must be one of:

Type Value
void 0x01
u1 0x02
i8 0x03
u8 0x04
i16 0x05
u16 0x06
i32 0x07
u32 0x08
f32 0x09
f64 0x0a
i64 0x0b
u64 0x0c
ref 0x0d
any 0x0e

All shorty elements are divided into groups of 4 elements starting from the beginning. Each group is encoded in a uint16_t, with each element taking 4 bits. A group with 4 elements v1, v2, v3, v4 is encoded in a uint16_t as follows:

| bits   | 0 - 3 | 4 - 7 | 8 - 11 | 12 - 15 |
| ------ | ----- | ----- | ------ | ------- |
| values | v1    | v2    | v3     | v4      |

If the group contains fewer than 4 elements, the remaining bits are filled with 0x0.
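The packing described above can be unpacked as in the following sketch: read little-endian uint16_t groups, take 4-bit elements starting from the low bits, and stop at the first 0x0 element. The function name `decode_shorty` and the returned tuple shape are choices made for this example.

```python
SHORTY_TYPES = {
    0x1: "void", 0x2: "u1", 0x3: "i8", 0x4: "u8", 0x5: "i16", 0x6: "u16",
    0x7: "i32", 0x8: "u32", 0x9: "f32", 0xA: "f64", 0xB: "i64", 0xC: "u64",
    0xD: "ref", 0xE: "any",
}


def decode_shorty(buf: bytes, pos: int = 0):
    """Return (return type, [parameter types], next position)."""
    types = []
    while True:
        if pos + 2 > len(buf):
            raise ValueError("shorty is not terminated")
        group = int.from_bytes(buf[pos:pos + 2], "little")
        pos += 2
        for i in range(4):
            code = (group >> (4 * i)) & 0xF  # v1 is in bits 0-3, v4 in bits 12-15
            if code == 0x0:  # End marker
                if not types:
                    raise ValueError("shorty has no return type")
                return types[0], types[1:], pos
            if code not in SHORTY_TYPES:
                raise ValueError("unknown shorty element 0x%x" % code)
            types.append(SHORTY_TYPES[code])


# i32 f(ref, f64): elements 0x7, 0xd, 0xa, End pack into 0x0ad7, stored as d7 0a.
print(decode_shorty(bytes([0xD7, 0x0A])))
```

When a signature fills a group exactly (a return type and three parameters), the End marker needs a second, all-zero group, which the loop handles by reading on.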

Code

Alignment: none

Format:

Name Format Description
num_vregs uleb128 Number of registers (without argument registers).
num_args uleb128 Number of arguments.
code_size uleb128 Size of instructions in bytes.
tries_size uleb128 Number of try blocks.
instructions uint8_t[] Instructions.
try_blocks TryBlock[] Array of try blocks. The array has tries_size elements in TryBlock format.

TryBlock

Alignment: none

Format:

Name Format Description
start_pc uleb128 Start pc of the try block. This pc points to the first instruction covered by this try block.
length uleb128 Number of instructions covered by the try block.
num_catches uleb128 Number of catch blocks associated with the try block.
catch_blocks CatchBlock[] Array of catch blocks associated with the try block. The array has num_catches elements in CatchBlock format. Catch blocks are listed in the order in which the runtime must check the exception's type. The catch-all block, if present, must be the last.

CatchBlock

Alignment: none

Format:

Name Format Description
type_idx uleb128 Index + 1 of the exception type the block handles in a ClassRegionIndex structure, or 0 for a catch-all block, which catches all exceptions. For a non-zero value, the corresponding index entry must be an offset to a ForeignClass or to a Class.
handler_pc uleb128 pc of the first instruction of the exception handler.
code_size uleb128 Handler's code size in bytes

Note: Proper region index to resolve type_idx can be found by corresponding method's offset.
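Putting the Code, TryBlock and CatchBlock layouts together, a reader could look like the sketch below. All names (`parse_code`, the dictionary keys) are hypothetical; the only logic beyond sequential uleb128 reads is the `type_idx` adjustment, where 0 marks a catch-all block and any other value is the class index plus one.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 value; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80 == 0:
            return result, pos


def parse_code(buf, pos=0):
    """Return (code dict, next position) for a Code structure at pos."""
    num_vregs, pos = read_uleb128(buf, pos)
    num_args, pos = read_uleb128(buf, pos)
    code_size, pos = read_uleb128(buf, pos)
    tries_size, pos = read_uleb128(buf, pos)
    instructions = buf[pos:pos + code_size]
    pos += code_size
    try_blocks = []
    for _ in range(tries_size):
        start_pc, pos = read_uleb128(buf, pos)
        length, pos = read_uleb128(buf, pos)
        num_catches, pos = read_uleb128(buf, pos)
        catches = []
        for _ in range(num_catches):
            type_idx, pos = read_uleb128(buf, pos)
            handler_pc, pos = read_uleb128(buf, pos)
            handler_size, pos = read_uleb128(buf, pos)
            catches.append({
                # None stands for catch-all; otherwise undo the "+ 1".
                "class_index": type_idx - 1 if type_idx else None,
                "handler_pc": handler_pc,
                "code_size": handler_size,
            })
        try_blocks.append({"start_pc": start_pc, "length": length, "catches": catches})
    code = {
        "num_vregs": num_vregs,
        "num_args": num_args,
        "instructions": instructions,
        "try_blocks": try_blocks,
    }
    return code, pos
```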

Annotation

Alignment: none

Format:

Name Format Description
class_idx uint16_t Index of the declaring class in a ClassRegionIndex structure. Corresponding index entry must be an offset to a Class or a ForeignClass.
count uint16_t Number of name-value pairs in the annotation (number of elements in elements array).
elements AnnotationElement[] Array of annotation elements. Each element is in AnnotationElement format. Order of elements must be the same as they follow in the annotation class.
element_types uint8_t[] Array of annotation element's types. Each element in the array describes the type of AnnotationElement. The order of elements in the array matches the order of elements field.

Note: Proper region index to resolve class_idx can be found by annotation's offset.

Tags description

Type Tag
u1 1
i8 2
u8 3
i16 4
u16 5
i32 6
u32 7
i64 8
u64 9
f32 A
f64 B
string C
record D
method E
enum F
annotation G
method_handle J
array H
u1[] K
i8[] L
u8[] M
i16[] N
u16[] O
i32[] P
u32[] Q
i64[] R
u64[] S
f32[] T
f64[] U
string[] V
record[] W
method[] X
enum[] Y
annotation[] Z
method_handle[] @
nullptr string *

The correct value for element with nullptr string tag is 0 (\x00\x00\x00\x00)

AnnotationElement

Alignment: none

Format:

Name Format Description
name_off uint32_t Offset to the element's name. The offset must point to a String.
value uint32_t Value of the element. If the annotation element has type boolean, byte, short, char, int or float the field value contains the value itself in the corresponding Value format. Else the field contains offset to a Value. Format of the value could be determined based on element's type.

ParamAnnotations

Alignment: none

Format:

Name Format Description
count uint32_t Number of parameters the method has. This number includes synthetic and mandated parameters.
annotations AnnotationArray[] Array of annotation lists for each parameter. The array has count elements and each element is in AnnotationArray format.

AnnotationArray

Alignment: none

Format:

Name Format Description
count uint32_t Number of elements in the array.
offsets uint32_t[] Array of offsets to the parameter annotations. Each offset must refer to an Annotation. The array has count elements.

Value

There are different value encodings depending on the value's type.

ByteValue

Alignment: None

Format:

Name Format Description
value uint8_t Signed 1-byte integer value.

ShortValue

Alignment: 2 bytes

Format:

Name Format Description
value uint8_t[2] Signed 2-byte integer value.

IntegerValue

Alignment: 4 bytes

Format:

Name Format Description
value uint8_t[4] Signed 4-byte integer value.

LongValue

Alignment: 8 bytes

Format:

Name Format Description
value uint8_t[8] Signed 8-byte integer value.

FloatValue

Alignment: 4 bytes

Format:

Name Format Description
value uint8_t[4] 4-byte bit pattern, zero-extended to the right, and interpreted as an IEEE754 32-bit floating point value.

DoubleValue

Alignment: 8 bytes

Format:

Name Format Description
value uint8_t[8] 8-byte bit pattern, zero-extended to the right, and interpreted as an IEEE754 64-bit floating point value.

StringValue

Alignment: 4 bytes

Format:

Name Format Description
value uint32_t The value represents an offset to String.

EnumValue

Alignment: 4 bytes

Format:

Name Format Description
value uint32_t The value represents an offset to an enum's field. The offset must point to Field or ForeignField.

ClassValue

Alignment: 4 bytes

Format:

Name Format Description
value uint32_t The value represents an offset to Class or ForeignClass.

AnnotationValue

Alignment: 4 bytes

Format:

Name Format Description
value uint32_t The value represents an offset to Annotation.

MethodValue

Alignment: 4 bytes

Format:

Name Format Description
value uint32_t The value represents an offset to Method or ForeignMethod.

MethodHandleValue

Alignment: 4 bytes

Format:

Name Format Description
value uint32_t The value represents an offset to MethodHandle.

MethodTypeValue

Alignment: 4 bytes

Format:

Name Format Description
value uint32_t The value represents an offset to Proto.

ArrayValue

Alignment: None

Format:

Name Format Description
count uleb128 Number of elements in the array.
elements Value[] Unaligned array of Value items. The array has count elements.

Literal

There are different literal encodings depending on the number of bytes in the value.

ByteOne

Alignment: None

Format:

Name Format Description
value uint8_t 1-byte value.

ByteTwo

Alignment: 2 bytes

Format:

Name Format Description
value uint8_t[2] 2-byte value.

ByteFour

Alignment: 4 bytes

Format:

Name Format Description
value uint8_t[4] 4-byte value.

ByteEight

Alignment: 8 bytes

Format:

Name Format Description
value uint8_t[8] 8-byte value.

ClassIndex

ClassIndex structure is aimed to allow runtime to find a type definition by name quickly. The structure is organized as an array of offsets from the beginning of the file to the Class or ForeignClass structures. All the offsets are sorted by corresponding class names. Number of elements in the index is num_classes from Header.

Alignment: 4 bytes

Format:

Name Format Description
offsets uint32_t[] Sorted array of offsets to Class or ForeignClass structures. The array must be sorted by class names.

LineNumberProgramIndex

The LineNumberProgramIndex structure allows more compact references to Line Number Programs. The structure is organized as an array of offsets from the beginning of the file to the Line Number Program structures. The number of elements in the index is num_lnps from Header.

Alignment: 4 bytes

Format:

Name Format Description
offsets uint32_t[] Array of offsets to Line Number Program structures.

ClassRegionIndex

ClassRegionIndex structure is aimed to allow runtime to find a type definition by index. The structure is organized as an array of FieldType. Number of elements in the index is class_idx_size from RegionHeader.

Alignment: 4 bytes

Format:

Name Format Description
types FieldType[] Array of FieldType structures.

MethodRegionIndex

The MethodRegionIndex structure allows the runtime to find a method definition by index. The structure is organized as an array of offsets from the beginning of the file to the Method or the ForeignMethod structure. The number of elements in the index is method_idx_size from RegionHeader.

Alignment: 4 bytes

Format:

Name Format Description
offsets uint32_t[] Array of offsets to Method or ForeignMethod structures.

FieldRegionIndex

The FieldRegionIndex structure allows the runtime to find a field definition by index. The structure is organized as an array of offsets from the beginning of the file to the Field or the ForeignField structure. The number of elements in the index is field_idx_size from RegionHeader.

Alignment: 4 bytes

Format:

Name Format Description
offsets uint32_t[] Array of offsets to Field or ForeignField structures.

ProtoRegionIndex

The ProtoRegionIndex structure allows the runtime to find a proto definition by index. The structure is organized as an array of offsets from the beginning of the file to the Proto structure. The number of elements in the index is proto_idx_size from RegionHeader.

Alignment: 4 bytes

Format:

Name Format Description
offsets uint32_t[] Array of offsets to Proto structures.

LiteralArrayIndex

LiteralArrayIndex structure is aimed to allow runtime to find a LiteralArray definition by index. The structure is organized as an array of offsets from the beginning of the file to the LiteralArray structures. Number of elements in the index is num_literalarrays from Header.

Alignment: 4 bytes

Format:

Name Format Description
offsets uint32_t[] Sorted array of offsets to LiteralArray structures.

DebugInfo

Debug information contains mapping between program counter of a method and line numbers in source code and information about local variables. The format is derived from DWARF Debugging Information Format, Version 3 (see item 6.2). The mapping and local variable information are encoded in line number program which is interpreted by the state machine. To deduplicate the same line number programs of different methods all constants the program refers to are moved into the constant pool.

Alignment: none

Format:

| Name | Format | Description |
| ---- | ------ | ----------- |
| line_start | uleb128 | The initial value of the line register of the state machine. |
| num_parameters | uleb128 | Number of method parameters. |
| parameters | uleb128[] | Parameter names of the method. The array has num_parameters elements. Each element is an offset to String or 0 if there is no name. |
| constant_pool_size | uleb128 | Size of the constant pool in bytes. |
| constant_pool | uleb128[] | Constant pool data of length constant_pool_size bytes. |
| line_number_program_idx | uleb128 | Line number program index in a LineNumberProgramIndex structure. The program has variable length and ends with DBG_END_SEQUENCE opcode. |
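The field layout above could be read along these lines. This is a sketch under assumptions: the `DebugInfo` dataclass, `read_uleb128` and `parse_debug_info` are illustrative names, and the decoder uses the standard unsigned LEB128 scheme (7 payload bits per byte, high bit set on every byte except the last).

```python
from dataclasses import dataclass
from typing import List, Tuple


def read_uleb128(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


@dataclass
class DebugInfo:
    line_start: int
    parameters: List[int]          # offsets to String, 0 means "no name"
    constant_pool: bytes           # raw pool, constant_pool_size bytes long
    line_number_program_idx: int


def parse_debug_info(data: bytes, pos: int = 0) -> DebugInfo:
    """Read the DebugInfo fields in the order given by the format table."""
    line_start, pos = read_uleb128(data, pos)
    num_parameters, pos = read_uleb128(data, pos)
    parameters = []
    for _ in range(num_parameters):
        name_off, pos = read_uleb128(data, pos)
        parameters.append(name_off)
    pool_size, pos = read_uleb128(data, pos)
    # The pool size is in bytes, so the pool is kept as an undecoded slice.
    constant_pool = data[pos:pos + pool_size]
    pos += pool_size
    program_idx, pos = read_uleb128(data, pos)
    return DebugInfo(line_start, parameters, constant_pool, program_idx)


# Usage: line_start=5, two parameters (offset 128 and "no name"),
# a 2-byte constant pool and program index 3.
raw = bytes([0x05, 0x02, 0x80, 0x01, 0x00, 0x02, 0xAC, 0x02, 0x03])
info = parse_debug_info(raw)
print(info.line_start, info.parameters, info.line_number_program_idx)
```

Keeping the constant pool as raw bytes matches the way the state machine consumes it: values are decoded one at a time as instructions ask for them.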

Constant pool

Many methods have similar line number programs. The only differences are variable names, variable types and file names. To deduplicate such programs, all constants the program refers to are stored in the constant pool. While interpreting the program, the state machine tracks a pointer to the constant pool. When the state machine interprets an instruction that requires a constant argument, it reads the value from the memory that the constant pool pointer points to and then advances the pointer. Thus programs have no explicit references to constants and can be deduplicated.
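The read-then-advance behaviour of the constant pool pointer can be illustrated with a small cursor class. This is a hedged sketch: `ConstantPoolCursor` and its method names are hypothetical, and standard LEB128 decoding is assumed for both the unsigned and the signed variant.

```python
class ConstantPoolCursor:
    """Tracks constant_pool_ptr as a byte position inside one constant pool."""

    def __init__(self, pool: bytes) -> None:
        self.pool = pool
        self.ptr = 0  # position of the next unread constant

    def next_uleb128(self) -> int:
        """Read an unsigned LEB128 constant and advance the pointer."""
        result = 0
        shift = 0
        while True:
            byte = self.pool[self.ptr]
            self.ptr += 1
            result |= (byte & 0x7F) << shift
            if byte & 0x80 == 0:
                return result
            shift += 7

    def next_sleb128(self) -> int:
        """Read a signed LEB128 constant and advance the pointer."""
        result = 0
        shift = 0
        while True:
            byte = self.pool[self.ptr]
            self.ptr += 1
            result |= (byte & 0x7F) << shift
            shift += 7
            if byte & 0x80 == 0:
                # Sign-extend when the sign bit of the last byte is set.
                if byte & 0x40:
                    result -= 1 << shift
                return result


# Usage: the same sequence of reads works on any pool, which is why
# one line number program can be shared by methods with different constants.
cursor = ConstantPoolCursor(bytes([0x10, 0x7E, 0xAC, 0x02]))
pc_delta = cursor.next_uleb128()    # e.g. an ADVANCE_PC argument
line_delta = cursor.next_sleb128()  # e.g. an ADVANCE_LINE argument
name_off = cursor.next_uleb128()    # e.g. a String offset
print(pc_delta, line_delta, name_off)
```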

State Machine

The aim of the state machine is to generate a mapping between program counter and line numbers and local variable information. The machine has the following registers:

| Name | Initial value | Description |
| ---- | ------------- | ----------- |
| address | 0 | Program counter (refers to method's instructions). Must only monotonically increase. |
| line | line_start from DebugInfo. | Unsigned integer which corresponds to a line number in the source code. All lines are numbered beginning at 1, so the register must not have a value less than 1. |
| file | Value of SOURCE_FILE tag in class_data (see Class) or 0 | Offset to the name of the source file. If there is no such information (SOURCE_FILE tag is absent in Class), the register has value 0. |
| prologue_end | false | Indicates that the current address is one where an entry breakpoint of the method could be set. |
| epilogue_begin | false | Indicates that the current address is one where an exit breakpoint of the method could be set. |
| constant_pool_ptr | Address of the constant_pool's first byte from DebugInfo. | Pointer to the current constant value. |

Line Number Program

A line number program consists of instructions. Each instruction has a one-byte opcode and optional arguments. Depending on the opcode, an argument's value is either encoded in the instruction itself or read from the constant pool.

| Opcode | Value | Instruction Format | Constant pool arguments | Description |
| ------ | ----- | ------------------ | ----------------------- | ----------- |
| END_SEQUENCE | 0x00 | | | Marks the end of the line number program. |
| ADVANCE_PC | 0x01 | | uleb128 | Increment the address register by the value constant_pool_ptr refers to without emitting a line. |
| ADVANCE_LINE | 0x02 | | sleb128 | Increment the line register by the value constant_pool_ptr refers to without emitting a line. |
| START_LOCAL | 0x03 | sleb128 | uleb128 uleb128 | Introduce a local variable at the current address, with the name and type constant_pool_ptr refers to. The number of the register that contains the variable is encoded in the instruction. A register value of -1 means the accumulator register. The name is an offset to String and the type is an offset to ForeignClass or Class. The offsets may be 0, which means the corresponding information is absent. |
| START_LOCAL_EXTENDED | 0x04 | sleb128 | uleb128 uleb128 uleb128 | Introduce a local variable at the current address, with the name, type and type signature constant_pool_ptr refers to. The number of the register that contains the variable is encoded in the instruction. A register value of -1 means the accumulator register. The name is an offset to String, the type is an offset to ForeignClass or Class and the signature is an offset to TODO: figure out what are signatures. The offsets may be 0, which means the corresponding information is absent. |
| END_LOCAL | 0x05 | sleb128 | | Mark the local variable in the specified register as out of scope. The register number is encoded in the instruction. A register value of -1 means the accumulator register. |
| RESTART_LOCAL | 0x06 | sleb128 | | Re-introduce a local variable at the specified register. The name and type are the same as those of the last local that was in the register. The register number is encoded in the instruction. A register value of -1 means the accumulator register. |
| SET_PROLOGUE_END | 0x07 | | | Set the prologue_end register to true. Any special opcode clears the prologue_end register. |
| SET_EPILOGUE_BEGIN | 0x08 | | | Set the epilogue_begin register to true. Any special opcode clears the epilogue_begin register. |
| SET_FILE | 0x09 | | uleb128 | Set the file register to the value constant_pool_ptr refers to. The argument is an offset to String which represents the file name or 0. |
| SET_SOURCE_CODE | 0x0a | | uleb128 | Set the source_code register to the value constant_pool_ptr refers to. The argument is an offset to String which represents the source code or 0. |
| SET_COLUMN | 0x0b | | uleb128 | Set the column register to the value constant_pool_ptr refers to. |
| Special opcodes | 0x0c..0xff | | | |

Special opcodes:

The state machine interprets each special opcode as follows (see DWARF Debugging Information Format item 6.2.5.1 Special Opcodes):

  1. Calculate the adjusted opcode: adjusted_opcode = opcode - OPCODE_BASE.
  2. Increment address register: address += adjusted_opcode / LINE_RANGE.
  3. Increment line register: line += LINE_BASE + (adjusted_opcode % LINE_RANGE).
  4. Emit line number.
  5. Set prologue_end register to false.
  6. Set epilogue_begin register to false.

Where:

  • OPCODE_BASE = 0x0c: the first special opcode.
  • LINE_BASE = -4: the smallest line number increment.
  • LINE_RANGE = 15: the number of line increments represented.
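The arithmetic of steps 1 to 3 can be checked with a short sketch. The function name `apply_special_opcode` is illustrative; the constants are the ones listed above. Emitting the line and clearing prologue_end and epilogue_begin (steps 4 to 6) are left out to keep the example focused on the decoding.

```python
OPCODE_BASE = 0x0C  # the first special opcode
LINE_BASE = -4      # the smallest line number increment
LINE_RANGE = 15     # the number of line increments represented


def apply_special_opcode(opcode: int, address: int, line: int) -> tuple:
    """Return the (address, line) registers after one special opcode."""
    if not OPCODE_BASE <= opcode <= 0xFF:
        raise ValueError(f"0x{opcode:02x} is not a special opcode")
    adjusted_opcode = opcode - OPCODE_BASE
    address += adjusted_opcode // LINE_RANGE
    line += LINE_BASE + (adjusted_opcode % LINE_RANGE)
    return address, line


# Usage: opcode 0x2f gives adjusted_opcode = 35, so the address advances
# by 35 // 15 = 2 and the line changes by -4 + (35 % 15) = +1.
print(apply_special_opcode(0x2F, 0, 10))  # (2, 11)
```

With these constants a single special opcode can move the line by -4 to +10 and the address by 0 to 16, so larger jumps would need ADVANCE_PC or ADVANCE_LINE.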

MethodHandle

Alignment: none

Format:

| Name | Format | Description |
| ---- | ------ | ----------- |
| type | uint8_t | Type of the handle. Must be one of the MethodHandle types. |
| offset | uleb128 | Offset to the entity of the corresponding type. The type of the entity depends on the handle's type (see Types of MethodHandle). |

Types of MethodHandle

The available types of a method handle are:

| Name | Code | Description |
| ---- | ---- | ----------- |
| PUT_STATIC | 0x00 | Method handle refers to a static setter. Offset in MethodHandle must point to Field or ForeignField. |
| GET_STATIC | 0x01 | Method handle refers to a static getter. Offset in MethodHandle must point to Field or ForeignField. |
| PUT_INSTANCE | 0x02 | Method handle refers to an instance setter. Offset in MethodHandle must point to Field or ForeignField. |
| GET_INSTANCE | 0x03 | Method handle refers to an instance getter. Offset in MethodHandle must point to Field or ForeignField. |
| INVOKE_STATIC | 0x04 | Method handle refers to a static method. Offset in MethodHandle must point to Method or ForeignMethod. |
| INVOKE_INSTANCE | 0x05 | Method handle refers to an instance method. Offset in MethodHandle must point to Method or ForeignMethod. |
| INVOKE_CONSTRUCTOR | 0x06 | Method handle refers to a constructor. Offset in MethodHandle must point to Method or ForeignMethod. |
| INVOKE_DIRECT | 0x07 | Method handle refers to a direct method. Offset in MethodHandle must point to Method or ForeignMethod. |
| INVOKE_INTERFACE | 0x08 | Method handle refers to an interface method. Offset in MethodHandle must point to Method or ForeignMethod. |
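A reader for the MethodHandle structure might look like the sketch below. The names `MethodHandleType`, `parse_method_handle` and `points_to_field` are hypothetical; the type codes are taken from the table above, and standard unsigned LEB128 decoding is assumed for the offset.

```python
from enum import IntEnum
from typing import Tuple


class MethodHandleType(IntEnum):
    PUT_STATIC = 0x00
    GET_STATIC = 0x01
    PUT_INSTANCE = 0x02
    GET_INSTANCE = 0x03
    INVOKE_STATIC = 0x04
    INVOKE_INSTANCE = 0x05
    INVOKE_CONSTRUCTOR = 0x06
    INVOKE_DIRECT = 0x07
    INVOKE_INTERFACE = 0x08


def parse_method_handle(data: bytes, pos: int = 0) -> Tuple[MethodHandleType, int, int]:
    """Return (handle type, entity offset, position after the structure)."""
    # IntEnum raises ValueError for a code that is not in the table.
    handle_type = MethodHandleType(data[pos])
    pos += 1
    offset = 0
    shift = 0
    while True:  # unsigned LEB128
        byte = data[pos]
        pos += 1
        offset |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            break
        shift += 7
    return handle_type, offset, pos


def points_to_field(handle_type: MethodHandleType) -> bool:
    """True if the offset must point to Field or ForeignField."""
    return handle_type <= MethodHandleType.GET_INSTANCE


# Usage: an INVOKE_STATIC handle whose offset is 1000 (0xe8 0x07 in uleb128).
handle_type, offset, end = parse_method_handle(bytes([0x04, 0xE8, 0x07]))
print(handle_type.name, offset, points_to_field(handle_type))
```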

Argument Types

A bootstrap method can accept static arguments of the following types:

| Type | Code | Description |
| ---- | ---- | ----------- |
| Integer | 0x00 | The corresponding argument has IntegerValue encoding. |
| Long | 0x01 | The corresponding argument has LongValue encoding. |
| Float | 0x02 | The corresponding argument has FloatValue encoding. |
| Double | 0x03 | The corresponding argument has DoubleValue encoding. |
| String | 0x04 | The corresponding argument has StringValue encoding. |
| Class | 0x05 | The corresponding argument has ClassValue encoding. |
| MethodHandle | 0x06 | The corresponding argument has MethodHandleValue encoding. |
| MethodType | 0x07 | The corresponding argument has MethodTypeValue encoding. |