docs: Overhaul and organize all of the existing documentation we have (#412)

* docs: Overhaul and organize all of the existing documentation we have

* docs: Autoscroll to top when changing pages
Tyler Wilding 2021-05-02 14:58:22 -04:00 committed by GitHub
parent 7cb04c6cd5
commit 928cb48dd4
20 changed files with 2626 additions and 2220 deletions

View File

@ -33,6 +33,9 @@
.search a {
color: #c5ccd4 !important;
}
.sidebar li > p {
color: #ffe301;
}
</style>
</head>
<body>
@ -43,10 +46,11 @@
repo: "https://github.com/water111/jak-project",
basePath: "./markdown/",
loadSidebar: true,
subMaxLevel: 2,
subMaxLevel: 3,
logo: "./markdown/imgs/logo-text-colored.png",
notFoundPage: true,
auto2top: true,
search: "auto", // default
// complete configuration parameters
search: {
maxAge: 86400000, // Expiration time, the default one day
@ -57,10 +61,38 @@
depth: 6,
hideOtherSidebarContent: false, // whether or not to hide other sidebar content
},
plugins: [
function(hook, vm) {
hook.beforeEach(function(html) {
var url; // declared locally so the assignments below do not create an implicit global
if (/githubusercontent\.com/.test(vm.route.file)) {
url = vm.route.file
.replace('raw.githubusercontent.com', 'github.com')
.replace(/\/master/, '/blob/master');
} else {
url =
'https://github.com/water111/jak-project/blob/master/docs' +
vm.route.file;
}
var editHtml = '[:memo: Edit Document](' + url + ')\n';
return (
editHtml +
html
);
})
},
],
};
</script>
<!-- Docsify v4 -->
<script src="https://cdn.jsdelivr.net/npm/docsify@4"></script>
<script src="https://unpkg.com/docsify@4.12.1/lib/plugins/search.min.js"></script>
<script src="https://unpkg.com/docsify-copy-code@2"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-bash.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-c.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-cpp.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-clojure.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-lisp.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-scheme.min.js"></script>
<script src="//cdn.jsdelivr.net/npm/prismjs@1/components/prism-nasm.min.js"></script>
</body>
</html>

File diff suppressed because it is too large

1
docs/markdown/_404.md Normal file
View File

@ -0,0 +1 @@
# 404 - Doc Page not Found!

View File

@ -1,9 +1,20 @@
* [Home](/README.md)
* [Compiler Example](compiler_example.md)
* [Type System](type_system.md)
* [Emitter](emitter.md)
* [GOAL Debugging](goal_dbg_doc.md)
* [GOOS](goos.md)
* [Object File Generation](object_file_generation.md)
* [Porting to x86](porting_to_x86.md)
* [Registers](registers.md)
- OpenGOAL Reference
- [Overview](/README.md)
- [Type System](/type_system.md)
- [Method System](/method_system.md)
- [Language Syntax & Features](/syntax.md)
- [Standard Library](/lib.md)
- [The Reader](/reader.md)
- [Macro Support](/goos.md)
- [Object File Formats](/object_file_formats.md)
- Working with OpenGOAL
- [The REPL](/repl.md)
- [Debugging](/debugging.md)
- [Editor Configuration](/editor_setup.md)
- Developing OpenGOAL
- [Compiler Walkthrough](/compiler_example.md)
- [Assembly Emitter](/asm_emitter.md)
- [Porting to x86](/porting_to_x86.md)
- [Register Handling](/registers.md)

View File

@ -1,4 +1,5 @@
# Emitter
# Assembly Emitter
x86-64 has a lot of instructions. They are described in Volume 2 of the 5 Volume "Intel® 64 and IA-32 Architectures Software Developer's Manual". Just this volume alone is over 2000 pages, which would take forever to fully implement. As a result, we will use only a subset of these instructions. This is the rough plan:
- Most instructions like `add` will only be implemented with `r64 r64` versions.

View File

@ -1,18 +1,43 @@
# Compiler Example
This describes how the compiler works on a small piece of code in `example_goal.gc`.
This describes how the compiler works using the following code snippet, saved in a file named `example_goal.gc`.
```lisp
(defun factorial-iterative ((x integer))
(let ((result 1))
(while (!= x 1)
(set! result (* result x))
(set! x (- x 1))
)
result
)
)
;; until we load KERNEL.CGO automatically, we have to do this to
;; make format work correctly.
(define-extern _format function)
(define format _format)
(let ((x 10))
(format #t "The value of ~D factorial is ~D~%" x (factorial-iterative x))
)
```
To run this yourself, start the compiler and runtime, then run:
```
```lisp
(lt)
(asm-file "doc/example_goal.gc" :color :load)
```
And you should see:
```
The value of 10 factorial is 3628800
```
# Overview
## Overview
The code to read from the GOAL REPL is in `Compiler.cpp`, in `Compiler::execute_repl`. Compiling an `asm-file` form will call `Compiler::compile_asm_file` in `CompilerControl.cpp`, which is where we'll start.
I've divided the process into these steps:
@ -25,31 +50,35 @@ I've divided the process into these steps:
7. __Linking__: The runtime links the code so it can be run, and runs it.
8. __Result__: The code prints the message which is sent back to the REPL.
# Reader
The reader converts GOAL/GOOS source into a `goos::Object`. One of the core ideas of lisp is that "code is data", so GOAL code is represented as GOOS data. This makes it easy for GOOS macros to operate on GOAL code. A GOOS object can represent a number, string, pair, etc. This strips out comments/whitespace.
## Reader
The reader converts GOAL/GOOS source into a `goos::Object`. One of the core ideas of lisp is that "code is data", so GOAL code is represented as GOOS data. This makes it easy for GOOS macros to operate on GOAL code. A GOOS object can represent a number, string, pair, etc. This strips out comments/whitespace.
The reader is run with this code:
```
```cpp
auto code = m_goos.reader.read_from_file({filename});
```
If you were to `code.print()`, you would get:
```
```lisp
(top-level (defun factorial-iterative ((x integer)) (let ((result 1)) (while (!= x 1) (set! result (* result x)) (set! x (- x 1))) result)) (define-extern _format function) (define format _format) (let ((x 10)) (format #t "The value of ~D factorial is ~D~%" x (factorial-iterative x))))
```
There are a few details worth mentioning about this process:
- The reader will expand `'my-symbol` to `(quote my-symbol)`
- The reader will throw errors on syntax errors (mismatched parentheses, bad strings/numbers, etc.)
- Using `read_from_file` adds information about where each thing came from to a map stored in the reader. This map is used to determine the source file/line for compiler errors.
# IR Pass
## IR Pass
This pass converts code (represented as a `goos::Object`) into intermediate representation. This is stored in an `Env*`, a tree structure. At the top is a `GlobalEnv*`, then a `FileEnv` for each file compiled, then a `FunctionEnv` for each function in the file. There are environments within `FunctionEnv` that are used for lexical scoping and the other types of GOAL scopes. The Intermediate Representation (IR) is a list per function that's built up in order as the compiler goes through the function. Note that the IR is a list of instructions and doesn't have a tree or more complicated structure. Here's an example of the IR for the example function:
```
```nasm
Function: function-factorial-iterative
mov-ic igpr-2, 1
mov igpr-3, igpr-2
mov-ic igpr-2, 1
mov igpr-3, igpr-2
goto label-10
mov igpr-4, igpr-3
imul igpr-4, igpr-0
@ -84,37 +113,46 @@ Function: function-top-level
mov igpr-16, igpr-11
ret igpr-17 igpr-16
```
The `function-top-level` is the "top level" function, which is everything not in a function. In the example, this is just defining the function, defining `format`, and calling `format`.
The `function-top-level` is the "top level" function, which is everything not in a function. In the example, this is just defining the function, defining `format`, and calling `format`.
You'll notice that there are a ton of `mov`s between `igpr`s. The compiler inserts tons of moves. Because this is all done in a single pass, there are a lot of cases where the compiler can't know if a move is needed or not. But the register allocator can figure it out and will remove most unneeded moves. Adding moves can also prevent stack spills. For example, consider the case where you want to get the return value of function `a`, then call function `b`, then call function `c` with the return value of function `a`. If there are a lot of moves, the register allocator can figure out a way to temporarily stash the value in a saved register instead of spilling to the stack.
Another thing to notice is that GOAL nested function calls suck. Example:
```
```lisp
(format #t "The value of ~D factorial is ~D~%" x (factorial-iterative x))
```
requires loading `format`, `#t`, the string, and `x` into registers, then calling `(factorial-iterative x)`, then calling `format`. This has to be done this way, just in case the `factorial-iterative` call modifies the value of `format`, `#t`, or `x`.
## IR Pass Implementation
### IR Pass Implementation
An important type in the compiler is `Val`, which is a specification on how to get a value. A `Val` has an associated GOAL type (`TypeSpec`) and the IR Pass should take care of all type checking. A `Val` can represent a constant, a memory location (relative to pointer, or static data, etc), a spot in an array, a register etc. A `Val` representing a register is a `RegVal`, which contains an `IRegister`. An `IRegister` is a register that's not yet mapped to the hardware, and instead has a unique integer to identify itself. The IR assumes there are infinitely many `IRegister`s, and a later stage maps `IRegister`s to real hardware registers.
The general process starts with a `compile` function, which dispatches other `compile_<thing>` functions as needed. These generally take in `goos::Object` as a code input, emit IR into an `Env`/or modify things in the `Env`, and return a `Val*` describing the result.
The general process starts with a `compile` function, which dispatches other `compile_<thing>` functions as needed. These generally take in `goos::Object` as a code input, emit IR into an `Env`/or modify things in the `Env`, and return a `Val*` describing the result.
In general, GOAL is very greedy and `compile` functions emit IR to do things, then put the result in a register, and return a `RegVal`.
In general, GOAL is very greedy and `compile` functions emit IR to do things, then put the result in a register, and return a `RegVal`.
However, there is an exception for memory related things. Consider
```
```lisp
(-> my-object my-field) ; my_object->my_field in C
```
This shouldn't return a `Val` for a register containing the value of `my_object->my_field`, but should instead return something that represents "the memory location of my_field in my_object". This way you can do
```
This shouldn't return a `Val` for a register containing the value of `my_object->my_field`, but should instead return something that represents "the memory location of my_field in my_object". This way you can do
```lisp
(set! (-> my-object my-field) val)
;; or
(& (-> my-object my-field)) ;; &my_object->my_field
```
and the compiler will have enough information to figure out the memory address.
and the compiler will have enough information to figure out the memory address.
If the compiler actually needs the value of something, and wants to be sure it's a value in a register, it will use the `to_reg` method of `Val`. This will emit IR into the current function to get the value in a register, then return a `Val` that represents this register. Example `to_reg` implementation for integer constants:
```
```cpp
RegVal* IntegerConstantVal::to_reg(Env* fe) {
auto rv = fe->make_gpr(m_ts);
fe->emit(std::make_unique<IR_LoadConstant64>(rv, m_value));
@ -122,29 +160,29 @@ RegVal* IntegerConstantVal::to_reg(Env* fe) {
}
```
Note that `to_reg` can emit code like reading from memory, where the order of operations really matters, so you have to be very careful.
It's extremely dangerous to let a memory reference `Val` propagate too far. Consider this example:
```
```lisp
(let ((x (-> my-object my-field)))
(set! (-> my-object my-field) 12)
x
)
```
Where `x` should be the old value of `my-field`. The `Val` for `x` needs to be `to_reg`ed _before_ getting inside the `let`. There's also some potential confusion around the order that you compile and `to_gpr` things. In a case where you need a bunch of values in gprs, you should do the `to_gpr` immediately after compiling to match the exact behavior of the original GOAL. For example
```
```lisp
(+ (-> my-array (* x y)) (some-function))
;; like c++ my_array[x*y] + some_function()
```
When we `compile` the `(-> my-array (* x y))`, it will emit code to calculate the `(* x y)`, but won't actually do the memory access until we call `to_reg` on the result. This memory access should happen __before__ `some-function` is called.
When we `compile` the `(-> my-array (* x y))`, it will emit code to calculate the `(* x y)`, but won't actually do the memory access until we call `to_reg` on the result. This memory access should happen __before__ `some-function` is called.
In general, each time you `compile` something, you should immediately `to_gpr` it, _before_ `compile`ing the next thing. Many places will only accept a `RegVal` as an input to help with this. Also, the result for almost all compilation functions should be `to_reg`ed. The only exceptions are forms which deal with memory references (address of operator, dereference operator) or math.
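The ordering hazard described above can be modelled with a small toy, shown here as a sketch only. `ToyMemory` and `ToyMemRef` are hypothetical names invented for illustration and are not the compiler's `Val` classes; the only assumption carried over from the text is that a memory reference is not read until `to_reg` is called on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for target memory (hypothetical, not an OpenGOAL type).
struct ToyMemory {
  std::vector<int64_t> cells;
};

// Toy memory reference: the load only happens when to_reg() is called.
struct ToyMemRef {
  ToyMemory* mem;
  size_t index;
  int64_t to_reg() const { return mem->cells.at(index); }
};

// Models (let ((x (-> obj field))) (set! (-> obj field) 12) x)
// with the reference forced into a "register" before the body runs.
int64_t let_with_early_to_reg(ToyMemory& mem) {
  ToyMemRef ref{&mem, 0};
  int64_t x = ref.to_reg();  // load the old value first
  mem.cells.at(0) = 12;      // the set! in the body
  return x;
}

// Same form, but the reference propagates into the body and is
// only forced at the end.
int64_t let_with_late_to_reg(ToyMemory& mem) {
  ToyMemRef ref{&mem, 0};
  mem.cells.at(0) = 12;  // the set! runs before the load
  return ref.to_reg();   // too late: observes the new value
}
```

With a cell that initially holds 7, the early version returns 7 (the intended result for `x`), while the late version returns 12.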
Another important thing is that compilation functions should _never_ modify any existing `IRegister`s or `Val`s, unless that function is `set!`, which handles the situation correctly. Instead, create new `IRegister`s and move into those. I am planning to implement a `settable` flag to help reduce errors.
Another important thing is that compilation functions should _never_ modify any existing `IRegister`s or `Val`s, unless that function is `set!`, which handles the situation correctly. Instead, create new `IRegister`s and move into those. I am planning to implement a `settable` flag to help reduce errors.
For example:
- `RegVal` storing a local variable: is `settable`, you can modify local variables by writing to the register they use.
@ -153,29 +191,36 @@ For example:
The only settable `RegVal` is one corresponding to a local variable.
## Following the Code
### Following the Code
This pass runs from here:
```
```cpp
auto obj_file = compile_object_file(obj_file_name, code, !no_code);
```
That function sets up a `FileEnv*`, then runs
```
```cpp
file_env->add_top_level_function(
compile_top_level_function("top-level", std::move(code), compilation_env));
```
which compiles the body of the function with:
```
which compiles the body of the function with:
```cpp
auto result = compile_error_guard(code, fe.get());
```
The `compile_error_guard` function takes in code (as a `goos::Object`) and an `Env*`, and returns a `Val` representing the return value of the code. It calls the `compile` function, but wraps it in a `try catch` block to catch any compilation errors and print an error message. In the case where there's no error, it just does:
```
```cpp
return compile(code, env);
```
The `compile` function is pretty simple:
```
```cpp
/*!
* Highest level compile function
*/
@ -197,8 +242,10 @@ Val* Compiler::compile(const goos::Object& code, Env* env) {
return get_none();
}
```
In our case, the code starts with `(defun..`, which is actually a GOOS macro. It throws away the docstring, creates a lambda, then stores the function in a symbol:
```
```lisp
;; Define a new function
(defmacro defun (name bindings &rest body)
(if (and
@ -214,7 +261,8 @@ In our case, the code starts with `(defun..`, which is actually a GOOS macro. It
```
The compiler notices this is a macro in `compile_pair`:
```
```cpp
if (head.is_symbol()) {
// ...
@ -225,10 +273,11 @@ The compiler notices this is a macro in `compile_pair`:
// ...
}
```
```
The `compile_goos_macro` function sets up a GOOS environment and interprets the GOOS macro to generate more GOAL code:
```
```cpp
Val* Compiler::compile_goos_macro(const goos::Object& o,
const goos::Object& macro_obj,
const goos::Object& rest,
@ -243,10 +292,12 @@ Val* Compiler::compile_goos_macro(const goos::Object& o,
return compile_error_guard(goos_result, env); // compile resulting GOAL code
}
```
and the last line of that function compiles the result of macro expansion in GOAL.
and the last line of that function compiles the result of macro expansion in GOAL.
As an example, I'm going to look at `compile_add`, which handles the `+` form, and is representative of typical compiler code for this part. We start by checking that the arguments look valid:
```
```cpp
Val* Compiler::compile_add(const goos::Object& form, const goos::Object& rest, Env* env) {
auto args = get_va(form, rest); // get arguments to + in a list
if (!args.named.empty() || args.unnamed.empty()) {
@ -255,19 +306,23 @@ Val* Compiler::compile_add(const goos::Object& form, const goos::Object& rest, E
```
Then we compile the first thing in the `(+ ...` form, get its type, and pick a math mode (int, float):
```
```cpp
auto first_val = compile_error_guard(args.unnamed.at(0), env);
auto first_type = first_val->type();
auto math_type = get_math_mode(first_type);
```
In the integer case, we first create a new variable in the IR called an `IRegister` that must be in a GPR (as opposed to an XMM floating point register), and then emit an IR instruction that sets this result register to the first argument.
```
```cpp
auto result = env->make_gpr(first_type);
env->emit(std::make_unique<IR_RegSet>(result, first_val->to_gpr(env)));
```
Then, for each of the remaining arguments, we do:
```
```cpp
for (size_t i = 1; i < args.unnamed.size(); i++) {
env->emit(std::make_unique<IR_IntegerMath>(
IntegerMathKind::ADD_64, result,
@ -275,16 +330,18 @@ Then, for each of the remaining arguments, we do:
->to_gpr(env)));
}
```
which emits an IR to add the value to the sum. The `to_math_type` will emit any code needed to convert this to the correct numeric type (returns either a numeric constant or a `RegVal` containing the value).
An important detail is that we create a new register which will hold the result. This may seem inefficient in some cases, but a later compile pass will try to make this new register be the same register as `first_val` if possible, and will eliminate the `IR_RegSet`.
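To illustrate why the extra register is cheap, here is a minimal sketch of dropping a move once both sides have been assigned the same hardware register. `ToyInstr` and `drop_redundant_moves` are hypothetical and much simpler than the real IR; this is not the pass the compiler uses, only the idea behind it.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified instruction after register assignment.
struct ToyInstr {
  std::string op;  // e.g. "mov", "add"
  int dst;         // hardware register number
  int src;         // hardware register number
};

// Remove moves whose source and destination ended up in the same
// hardware register, since they no longer do anything.
std::vector<ToyInstr> drop_redundant_moves(const std::vector<ToyInstr>& in) {
  std::vector<ToyInstr> out;
  for (const auto& instr : in) {
    if (instr.op == "mov" && instr.dst == instr.src) {
      continue;  // mov r, r is a no-op
    }
    out.push_back(instr);
  }
  return out;
}
```

If the allocator manages to put `result` and `first_val` in the same register, the register-set instruction becomes a `mov r, r` and disappears in a filter like this one.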
# Register Allocation
## Register Allocation
This step figures out how to match up `IRegister`s to real hardware registers. In the case where there aren't enough hardware registers, it figures out how to "spill" variables onto the stack. The current implementation is a very greedy one, so it doesn't always succeed at doing things perfectly. The stack spilling is also not handled very efficiently, but the hope is that most functions won't require stack spilling.
This step is run from `compile_asm_function` on the line:
```
```cpp
color_object_file(obj_file);
void Compiler::color_object_file(FileEnv* env) {
@ -305,15 +362,16 @@ void Compiler::color_object_file(FileEnv* env) {
```
The actual algorithm is too complicated to describe here, but it figures out a mapping from `IRegister`s to hardware registers. It also figures out how much space on the stack is needed for any stack spills, which saved registers will be used, and deals with aligning the stack.
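For readers who want a feel for the problem, the following is a deliberately simple greedy sketch, not the algorithm used by the compiler. It assumes each `IRegister` has a single live range given as instruction indices, and all names (`ToyLiveRange`, `ToyAssignment`, `toy_allocate`) are made up for this example.

```cpp
#include <vector>

// Hypothetical live range of one IRegister, in instruction indices.
struct ToyLiveRange {
  int start;  // first instruction that uses the register
  int end;    // last instruction that uses it (inclusive)
};

struct ToyAssignment {
  bool spilled;
  int index;  // hardware register number, or stack slot if spilled
};

// Greedy assignment. `ranges` must be sorted by `start`.
// Each range takes the first hardware register that is free for its
// whole lifetime; if none is free, it gets a new stack slot.
std::vector<ToyAssignment> toy_allocate(const std::vector<ToyLiveRange>& ranges,
                                        int num_hw_regs) {
  // Last instruction index at which each hardware register is occupied.
  std::vector<int> busy_until(num_hw_regs, -1);
  std::vector<ToyAssignment> result;
  int next_slot = 0;
  for (const auto& r : ranges) {
    bool placed = false;
    for (int reg = 0; reg < num_hw_regs; reg++) {
      if (busy_until[reg] < r.start) {
        busy_until[reg] = r.end;
        result.push_back({false, reg});
        placed = true;
        break;
      }
    }
    if (!placed) {
      result.push_back({true, next_slot++});  // spill to the stack
    }
  }
  return result;
}
```

With two hardware registers and the ranges `[0,5]`, `[1,3]`, `[2,4]`, `[4,6]`, the third range is spilled and the fourth reuses register 1 once the second range has ended.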
# Code Generation
## Code Generation
This part actually generates the static data and x86 instructions and stores them in an `ObjectGenerator`. See `CodeGenerator::do_function`. It emits the function prologue and epilogue, as well as any extra loads/stores from the stack that the register allocator added. Each `IR` gets to emit instructions with:
```
```cpp
ir->do_codegen(&m_gen, allocs, i_rec);
```
Each IR has its own `do_codegen` that emits the right instruction, and also any linking data that's needed. For example, instructions that access the symbol table are patched by the runtime to directly access the correct slot of the hash table, so the `do_codegen` also lets the `ObjectGenerator` know about this link:
```
```cpp
// IR_GetSymbolValue::do_codegen
// look at register allocation result to determine hw register
@ -326,19 +384,23 @@ auto instr = gen->add_instr(IGen::load32u_gpr64_gpr64_plus_gpr64_plus_s32(
// add link info
gen->link_instruction_symbol_mem(instr, m_src->name());
```
Here `0xbadbeef` is used as a placeholder offset - the runtime should patch this to the actual offset of the symbol.
There's a ton of book-keeping to figure out the correct offsets for `rip`-relative addressing, or how to deal with jumps to/from IR which become multiple (or zero!) x86-64 instructions. It should all be handled by `ObjectFileGenerator`, and not in `do_codegen` or `CodeGenerator`.
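The placeholder-and-patch idea can be sketched as follows. This is an illustration under assumptions, not the runtime's linker: `ToyLink`, `toy_patch`, and `toy_read_s32` are hypothetical, and the sketch assumes the link record simply stores the byte offset of a 32-bit immediate inside the emitted code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical link record: byte offset of a 32-bit placeholder in the code.
struct ToyLink {
  size_t imm_offset;
};

// Overwrite the 4-byte placeholder at each recorded offset with the real value.
void toy_patch(std::vector<uint8_t>& code, const std::vector<ToyLink>& links,
               int32_t real_offset) {
  for (const auto& link : links) {
    if (link.imm_offset + sizeof(real_offset) > code.size()) {
      throw std::out_of_range("link points outside of the code buffer");
    }
    std::memcpy(code.data() + link.imm_offset, &real_offset, sizeof(real_offset));
  }
}

// Read back a 32-bit value from the code buffer (helper for checking).
int32_t toy_read_s32(const std::vector<uint8_t>& code, size_t offset) {
  if (offset + sizeof(int32_t) > code.size()) {
    throw std::out_of_range("read outside of the code buffer");
  }
  int32_t value;
  std::memcpy(&value, code.data() + offset, sizeof(value));
  return value;
}
```

The code generator only has to remember where each placeholder lives; whoever knows the final symbol offset can fix up the bytes later without re-encoding the instruction.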
# Object File Generation
## Object File Generation
Once the `CodeGenerator` is done going through all functions and static data, it runs:
```
```cpp
return m_gen.generate_data_v3().to_vector();
```
This actually lays out everything in memory. It takes a few passes because x86 instructions are variable length (may even change based on which registers are used!), so it's a little bit tricky to figure out offsets between different instructions or instructions and data. Finally it generates link data tables, which efficiently pack together links to the same symbols into a single entry, to avoid duplicated symbol names. The link table also contains information about linking references in between different segments, as different parts of the object file may be loaded into different spots in memory, and will need to reference each other.
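As a rough picture of how links to the same symbol can share one entry, consider the sketch below. The real link table format is not described here, so this is only an assumption-laden illustration: `ToySymbolLink` and `toy_pack_links` are hypothetical, and a `std::map` stands in for the packed binary table.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One (symbol name, code offset) pair per instruction that needs patching.
using ToySymbolLink = std::pair<std::string, size_t>;

// Group all offsets under their symbol name so each name is stored once.
std::map<std::string, std::vector<size_t>> toy_pack_links(
    const std::vector<ToySymbolLink>& links) {
  std::map<std::string, std::vector<size_t>> table;
  for (const auto& link : links) {
    table[link.first].push_back(link.second);
  }
  return table;
}
```

For the example program, two uses of `format` would end up under a single `format` entry with two offsets, rather than storing the name twice.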
This is the final result for the top-level function (stored in the top-level segment):
```
```nasm
;; prologue
push rbx
push rbp
@ -400,11 +462,13 @@ pop rbp
pop rbx
ret
```
and for the factorial function (stored in the main segment):
```
```nasm
mov eax,0x1
jmp 0x18
imul eax,edi
imul eax,edi
movsxd rax,eax
mov ecx,0x1
sub rdi,rcx
@ -414,11 +478,14 @@ jne 0xa
ret
```
# Sending and Receiving
## Sending and Receiving
The result of `codegen_object_file` is sent with:
```
```cpp
m_listener.send_code(data);
```
which adds a message header then just sends the code over the socket.
The receive process is complicated, so this is just a quick summary
@ -427,14 +494,15 @@ The receive process is complicated, so this is just a quick summary
- Once fully received, `WaitForMessageAndAck` will return a pointer to the message. The name of this function is totally wrong: it doesn't wait for a message, and it doesn't ack the message.
- The main loop in `KernelCheckAndDispatch` will see this message and run `ProcessListenerMessage`
- `ProcessListenerMessage` sees that it has code, copies the message to the debug heap, and links it.
```
```cpp
auto buffer = kmalloc(kdebugheap, MessCount, 0, "listener-link-block");
memcpy(buffer.c(), msg.c(), MessCount);
ListenerLinkBlock->value = buffer.offset + 4;
ListenerFunction->value = link_and_exec(buffer, "*listener*", 0, kdebugheap, LINK_FLAG_FORCE_DEBUG).offset;
```
- The `link_and_exec` function doesn't actually execute anything because it doesn't have the `LINK_FLAG_EXECUTE` set; it just links things. It moves the top level function and linking data to the top of the heap (temporary storage for the kernel) and keeps both the main segment and debug segment of the code on the debug heap. It'll move them together and eliminate gaps before linking. After linking, the `ListenerFunction->value` will contain a pointer to the top level function, which is stored in the top temp area of the heap. This `ListenerFunction` is the GOAL `*listener-function*` symbol.
- The next time the GOAL kernel runs, it will notice that `*listener-function*` is set, then call this function, then set it to `#f` to indicate it called the function.
- This
- After this, `ClearPending()` is called, which sends all of the `print` messages with the `Deci2Server` back to the compiler.
- Because the GOAL kernel changed `ListenerFunction` to `#f`, it does a `SendAck()` to send a special `ACK` message to the compiler, saying "I got the function, ran it, and didn't crash. Now I'm ready for more messages."

View File

@ -1,10 +1,15 @@
# OpenGOAL Debugger
Currently the debugger only works on Linux. All the platform-specific stuff is in `xdbg.cpp`.
## `(dbs)`
## Commands
### `(dbs)`
Print the status of the debugger and listener. The listener status is whether or not there is a socket connection open between the compiler and the target. The "debug context" is information that the runtime sends to the compiler so it can find the correct thread to debug. In order to debug, you need both.
## `(dbg)`
### `(dbg)`
Attach the debugger. This will stop the target.
Example of connecting to the target for debugging:
@ -37,28 +42,35 @@ gc> (dbs)
Context: valid = true, s7 = 0x147d24, base = 0x2000000000, tid = 1062568
```
## `(:cont)`
### `(:cont)`
Continue the target if it has been stopped.
## `(:break)`
### `(:break)`
Immediately stop the target if it is running. Will print some registers.
## `(:dump-all-mem <path>)`
### `(:dump-all-mem <path>)`
Dump all GOAL memory to a file. Must be stopped.
```
```lisp
(:dump-all-mem "mem.bin")
```
The path is relative to the Jak project folder.
The file will be the exact size of `EE_MAIN_MEM_SIZE`, but the first `EE_LOW_MEM_PROTECT` bytes are zero, as these cannot be written or read.
## Address Spec
Anywhere an address can be used, you can also use an "address spec", which gives you easier ways to input addresses. For now, the address spec is pretty simple, but there will be more features in the future.
- `(sym-val <sym-name>)`. Get the address stored in the symbol with the given name. Currently there's no check to see if the symbol actually stores an address or not. This is like "evaluate `<sym-name>`, then treat the value as an address"
- `(sym <sym-name>)`. Get the address of the symbol object itself, including the basic offset.
Example to show the difference:
```lisp
;; the symbol is at 0x142d1c
@ -88,30 +100,31 @@ gc> (inspect *kernel-context*)
;; break, so we can debug
gc> (:break)
Read symbol table (159872 bytes, 226 reads, 225 symbols, 1.96 ms)
rax: 0xfffffffffffffdfc rcx: 0x00007f745b508361 rdx: 0x00007f745b3ffca0 rbx: 0x0000000000147d24
rsp: 0x00007f745b3ffc40 rbp: 0x00007f745b3ffcc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000000000 r9: 0x0000000000000008 r10: 0x00007f745b3ffca0 r11: 0x0000000000000293
r12: 0x0000000000147d24 r13: 0x00007ffdff32cfaf r14: 0x00007ffdff32cfb0 r15: 0x00007f745b3fffc0
rax: 0xfffffffffffffdfc rcx: 0x00007f745b508361 rdx: 0x00007f745b3ffca0 rbx: 0x0000000000147d24
rsp: 0x00007f745b3ffc40 rbp: 0x00007f745b3ffcc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000000000 r9: 0x0000000000000008 r10: 0x00007f745b3ffca0 r11: 0x0000000000000293
r12: 0x0000000000147d24 r13: 0x00007ffdff32cfaf r14: 0x00007ffdff32cfb0 r15: 0x00007f745b3fffc0
rip: 0x00007f745b508361
;; reads the symbol's memory:
;; at 0x142d1c there is the value 0x164a84
gc> (dw (sym *kernel-context*) 1)
0x00142d1c: 0x00164a84
0x00142d1c: 0x00164a84
;; treat the symbol's value as an address and read the memory there.
;; notice that the 0x41 in the first word is decimal 65, the first field of the kernel-context.
gc> (dw (sym-val *kernel-context*) 10)
0x00164a84: 0x00000041 0x00000000 0x00000000 0x00000002
0x00164a94: 0x70004000 0x00147d24 0x00147d24 0x00000000
0x00164aa4: 0x00000000 0x00000000
0x00164a84: 0x00000041 0x00000000 0x00000000 0x00000002
0x00164a94: 0x70004000 0x00147d24 0x00147d24 0x00000000
0x00164aa4: 0x00000000 0x00000000
```
## `(:pm)`
### `(:pm)`
Print memory
```
```lisp
(:pm elt-size addr elt-count [:print-mode mode])
```
@ -119,7 +132,6 @@ The element size is the size of each word to print. It can be 1, 2, 4, 8 current
There are some useful macros inspired by the original PS2 TOOL debugger (`dsedb`) for the different sizes. They are `db`, `dh`, `dw`, and `dd` for 1, 2, 4, and 8 byte hex prints, which follow the naming convention of MIPS load/stores. There is also a `df` for printing floats. See the example below.
```lisp
OpenGOAL Compiler 0.1
@ -144,28 +156,28 @@ gc> (set! (-> x 2) 2.0)
;; attach the debugger (halts the target)
gc> (dbg)
[Debugger] PTRACE_ATTACHED! Waiting for process to stop...
rax: 0xfffffffffffffdfc rcx: 0x00007f6b94964361 rdx: 0x00007f6b8fffeca0 rbx: 0x0000000000147d24
rsp: 0x00007f6b8fffec40 rbp: 0x00007f6b8fffecc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000000000 r9: 0x000000000000000b r10: 0x00007f6b8fffeca0 r11: 0x0000000000000293
r12: 0x0000000000147d24 r13: 0x00007ffd16fb117f r14: 0x00007ffd16fb1180 r15: 0x00007f6b8fffefc0
rax: 0xfffffffffffffdfc rcx: 0x00007f6b94964361 rdx: 0x00007f6b8fffeca0 rbx: 0x0000000000147d24
rsp: 0x00007f6b8fffec40 rbp: 0x00007f6b8fffecc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000000000 r9: 0x000000000000000b r10: 0x00007f6b8fffeca0 r11: 0x0000000000000293
r12: 0x0000000000147d24 r13: 0x00007ffd16fb117f r14: 0x00007ffd16fb1180 r15: 0x00007f6b8fffefc0
rip: 0x00007f6b94964361
Debugger connected.
;; print memory as 10 bytes
gc> (db 1452224 10)
0x001628c0: 0x00 0x00 0x80 0x3f 0x00 0x00 0x00 0x00 0x00 0x00
0x001628c0: 0x00 0x00 0x80 0x3f 0x00 0x00 0x00 0x00 0x00 0x00
;; print memory as 10 words (32-bit words)
gc> (dw 1452224 10)
0x001628c0: 0x3f800000 0x00000000 0x40000000 0x00000000
0x001628d0: 0x00000000 0x00000000 0x00000000 0x00000000
0x001628e0: 0x00000000 0x00000000
0x001628c0: 0x3f800000 0x00000000 0x40000000 0x00000000
0x001628d0: 0x00000000 0x00000000 0x00000000 0x00000000
0x001628e0: 0x00000000 0x00000000
;; print memory as 10 floats
gc> (df 1452224 10)
0x001628c0: 1.0000 0.0000 2.0000 0.0000
0x001628d0: 0.0000 0.0000 0.0000 0.0000
0x001628e0: 0.0000 0.0000
0x001628c0: 1.0000 0.0000 2.0000 0.0000
0x001628d0: 0.0000 0.0000 0.0000 0.0000
0x001628e0: 0.0000 0.0000
;; set some more values, must unbreak first
gc> (:cont)
@ -174,31 +186,32 @@ gc> (set! (-> x 1) (the-as float -12))
;; break and print as decimal
gc> (:break)
rax: 0xfffffffffffffdfc rcx: 0x00007f6b94964361 rdx: 0x00007f6b8fffeca0 rbx: 0x0000000000147d24
rsp: 0x00007f6b8fffec40 rbp: 0x00007f6b8fffecc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000000000 r9: 0x0000000000000004 r10: 0x00007f6b8fffeca0 r11: 0x0000000000000293
r12: 0x0000000000147d24 r13: 0x00007ffd16fb117f r14: 0x00007ffd16fb1180 r15: 0x00007f6b8fffefc0
rax: 0xfffffffffffffdfc rcx: 0x00007f6b94964361 rdx: 0x00007f6b8fffeca0 rbx: 0x0000000000147d24
rsp: 0x00007f6b8fffec40 rbp: 0x00007f6b8fffecc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000000000 r9: 0x0000000000000004 r10: 0x00007f6b8fffeca0 r11: 0x0000000000000293
r12: 0x0000000000147d24 r13: 0x00007ffd16fb117f r14: 0x00007ffd16fb1180 r15: 0x00007f6b8fffefc0
rip: 0x00007f6b94964361
gc> (:pm 4 1452224 10 :print-mode unsigned-dec)
0x001628c0: 1065353216 4294967284 1073741824 0
0x001628d0: 0 0 0 0
0x001628e0: 0 0
0x001628c0: 1065353216 4294967284 1073741824 0
0x001628d0: 0 0 0 0
0x001628e0: 0 0
gc> (:pm 4 1452224 10 :print-mode signed-dec)
0x001628c0: 1065353216 -12 1073741824 0
0x001628d0: 0 0 0 0
0x001628e0: 0 0
0x001628c0: 1065353216 -12 1073741824 0
0x001628d0: 0 0 0 0
0x001628e0: 0 0
```
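The different views of the same memory in the session above can be reproduced by hand. The sketch below is not debugger code; `toy_read_u32_le`, `toy_as_float`, and `toy_as_signed` are hypothetical helpers that assume little-endian 32-bit words, which matches the byte order visible in the `(db ...)` output.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assemble a little-endian 32-bit word from 4 bytes starting at `offset`.
uint32_t toy_read_u32_le(const std::vector<uint8_t>& bytes, size_t offset) {
  return static_cast<uint32_t>(bytes.at(offset)) |
         (static_cast<uint32_t>(bytes.at(offset + 1)) << 8) |
         (static_cast<uint32_t>(bytes.at(offset + 2)) << 16) |
         (static_cast<uint32_t>(bytes.at(offset + 3)) << 24);
}

// Reinterpret the bits of a word as a 32-bit float (what `df` displays).
float toy_as_float(uint32_t word) {
  float f;
  std::memcpy(&f, &word, sizeof(f));
  return f;
}

// Reinterpret the bits of a word as a signed integer (the signed-dec mode).
int32_t toy_as_signed(uint32_t word) {
  int32_t s;
  std::memcpy(&s, &word, sizeof(s));
  return s;
}
```

The bytes `0x00 0x00 0x80 0x3f` form the word `0x3f800000`, which is `1.0` as a float and `1065353216` as an unsigned decimal. Likewise, `4294967284` in `unsigned-dec` mode and `-12` in `signed-dec` mode are the same 32 bits.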
### `(:disasm)`
## `(:disasm)`
Disassemble instructions in memory
```
```lisp
(:disasm addr len)
```
Example (after doing a `(lt)`, `(blg)`, `(dbg)`):
```asm
```nasm
gc> (:disasm (sym-val basic-type?) 80)
[0x2000162ae4] mov eax, [r15+rdi*1-0x04]
[0x2000162ae9] mov ecx, [r15+r14*1+0x38]
@ -216,14 +229,13 @@ gc> (:disasm (sym-val basic-type?) 80)
[0x2000162b24] jnz 0x0000002000162AF4
[0x2000162b2a] mov eax, [r15+r14*1]
[0x2000162b32] ret
```
For now, the disassembly is pretty basic, but it should eventually support GOAL symbols.
## Breakpoints
```
```lisp
OpenGOAL Compiler 0.1
;; first, connect to the target
@ -247,10 +259,10 @@ gc > (dbg)
[Debugger] PTRACE_ATTACHED! Waiting for process to stop...
Target has stopped. Run (:di) to get more information.
Read symbol table (146816 bytes, 124 reads, 123 symbols, 2.02 ms)
rax: 0x000000000000000a rcx: 0x0000000000000005 rdx: 0x0000000000000000 rbx: 0x0000002000000000
rsp: 0x00007fddcde75c58 rbp: 0x00007fddcde75cc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000147d24 r9: 0x0000002000000000 r10: 0x00007fddcde75ca0 r11: 0x0000000000000000
r12: 0x0000000000147d24 r13: 0x0000002007ffbf14 r14: 0x0000000000147d24 r15: 0x0000002000000000
rax: 0x000000000000000a rcx: 0x0000000000000005 rdx: 0x0000000000000000 rbx: 0x0000002000000000
rsp: 0x00007fddcde75c58 rbp: 0x00007fddcde75cc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000147d24 r9: 0x0000002000000000 r10: 0x00007fddcde75ca0 r11: 0x0000000000000000
r12: 0x0000000000147d24 r13: 0x0000002007ffbf14 r14: 0x0000000000147d24 r15: 0x0000002000000000
rip: 0x0000002007ffbf3b
[0x2007ffbf1b] add [rax], al
[0x2007ffbf1d] add [rcx+0x02], bh
@ -292,10 +304,10 @@ Target has stopped. Run (:di) to get more information.
;; get some info:
gcs> (:di)
Read symbol table (146816 bytes, 124 reads, 123 symbols, 1.46 ms)
rax: 0x0000000000000015 rcx: 0x0000000000000007 rdx: 0x0000000000000000 rbx: 0x0000002000000000
rsp: 0x00007fddcde75c58 rbp: 0x00007fddcde75cc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000147d24 r9: 0x0000002000000000 r10: 0x00007fddcde75ca0 r11: 0x0000000000000000
r12: 0x0000000000147d24 r13: 0x0000002007ffbf14 r14: 0x0000000000147d24 r15: 0x0000002000000000
rax: 0x0000000000000015 rcx: 0x0000000000000007 rdx: 0x0000000000000000 rbx: 0x0000002000000000
rsp: 0x00007fddcde75c58 rbp: 0x00007fddcde75cc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000147d24 r9: 0x0000002000000000 r10: 0x00007fddcde75ca0 r11: 0x0000000000000000
r12: 0x0000000000147d24 r13: 0x0000002007ffbf14 r14: 0x0000000000147d24 r15: 0x0000002000000000
rip: 0x0000002007ffbf4c
[0x2007ffbf2c] add eax, ecx
[0x2007ffbf2e] mov ecx, 0x04
@ -334,16 +346,16 @@ gcs> (:ubp #x2007ffbf4b)
;; continue, it stays running
gcs> (:cont)
gcr>
gcr>
;; break and check, the code is back to normal!
gcr> (:break)
Target has stopped. Run (:di) to get more information.
Read symbol table (146816 bytes, 124 reads, 123 symbols, 1.28 ms)
rax: 0x0000000000000015 rcx: 0x0000000000000007 rdx: 0x0000000000000000 rbx: 0x0000002000000000
rsp: 0x00007fddcde75c58 rbp: 0x00007fddcde75cc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000147d24 r9: 0x0000002000000000 r10: 0x00007fddcde75ca0 r11: 0x0000000000000000
r12: 0x0000000000147d24 r13: 0x0000002007ffbf14 r14: 0x0000000000147d24 r15: 0x0000002000000000
rax: 0x0000000000000015 rcx: 0x0000000000000007 rdx: 0x0000000000000000 rbx: 0x0000002000000000
rsp: 0x00007fddcde75c58 rbp: 0x00007fddcde75cc0 rsi: 0x0000000000000000 rdi: 0x0000000000000000
r8: 0x0000000000147d24 r9: 0x0000002000000000 r10: 0x00007fddcde75ca0 r11: 0x0000000000000000
r12: 0x0000000000147d24 r13: 0x0000002007ffbf14 r14: 0x0000000000147d24 r15: 0x0000002000000000
rip: 0x0000002007ffbf4b
[0x2007ffbf2b] add rax, rcx
[0x2007ffbf2e] mov ecx, 0x04
@ -375,11 +387,11 @@ rip: 0x0000002007ffbf4b
[0x2007ffbf86] add [rbx], al
[0x2007ffbf88] add [rax], al
gcs>
gcs>
;; we can still properly exit from the target, even in this state!
gcs> (e)
Tried to reset a halted target, detaching...
Error - target has timed out. If it is stuck in a loop, it must be manually killed.
[Listener] Closed connection to target
```
```

View File

@ -1,3 +1,10 @@
# Editor Configuration
## Emacs
The following Emacs config file should get you started and configures OpenGOAL's formatting style:
```lisp
;; make gc files use lisp-mode
(add-to-list 'auto-mode-alist '("\\.gc\\'" . lisp-mode))
;; run setup-goal when we enter lisp mode
@ -22,3 +29,5 @@
(setq-default indent-tabs-mode nil)
)
)
```

View File

@ -1,18 +0,0 @@
(defun factorial-iterative ((x integer))
(let ((result 1))
(while (!= x 1)
(set! result (* result x))
(set! x (- x 1))
)
result
)
)
;; until we load KERNEL.CGO automatically, we have to do this to
;; make format work correctly.
(define-extern _format function)
(define format _format)
(let ((x 10))
(format #t "The value of ~D factorial is ~D~%" x (factorial-iterative x))
)

View File

@ -1,10 +1,10 @@
Goos
-----
GOOS is a macro language for GOAL. It is a separate language. Files written in GOAL end in `.gc` and files written in GOOS end in `.gs`. The REPL will display a `goos>` prompt for GOOS and `goal>` for GOAL.
# GOOS
GOOS is a macro language for GOAL. It is a separate language. Files written in GOAL end in `.gc` and files written in GOOS end in `.gs`. The REPL will display a `goos>` prompt for GOOS and `gc >` for GOAL.
There is a special namespace shared between GOOS and GOAL containing the names of the macros (written in GOOS) which can be used in GOAL code.
To access a GOOS REPL, run `(goos)` from the `goal>` prompt (note, currently this happens by default as GOAL is not implemented).
To access a GOOS REPL, run `(goos)` from the `gc >` prompt.
This document assumes some familiarity with the Scheme programming language. It's recommended to read a bit about Scheme first.
@ -13,28 +13,34 @@ Note that most Scheme things will work in GOOS, with the following exceptions:
- The short form for defining functions is `(desfun function-name (arguments) body...)`
- GOOS does not have tail call optimization and prefers looping to recursion (there is a `while` form)
## Special Forms
Special Forms
---------------
Most forms in Scheme have a name, and list of arguments. Like:
```
```scheme
(my-operation first-argument second-argument ...)
```
Usually, each argument is evaluated, then passed to the operation, and the resulting value is returned. However, there are cases where not all arguments are evaluated. For example:
```
```scheme
(if (want-x?)
(do-x)
(do-y)
)
```
In this case, only one of `(do-x)` and `(do-y)` is executed. This doesn't follow the pattern of "evaluate all arguments...", so it is a *SPECIAL FORM*. It's not possible for a function call to be a special form - GOOS will automatically evaluate all arguments. It is possible to build macros which act like special forms. There are some special forms built into the GOOS interpreter, which are documented in this section.
### define
This is used to define a value in the current lexical environment.
For example:
```
```scheme
(define x 10)
```
will define `x` as a variable equal to `10` in the inner-most lexical environment. (Note, I'm not sure this is actually how Scheme works)
There is an optional keyword argument to pick the environment for definition, but this is used rarely. The only named environments are:
@ -42,15 +48,17 @@ There is an optional keyword argument to pick the environment for definition, bu
- `*global-env*`
Example:
```
```scheme
(define :env *global-env* x 10)
```
will define `x` in the global (outer-most) environment, regardless of where the `define` is written.
will define `x` in the global (outer-most) environment, regardless of where the `define` is written.
### quote
This form simply returns its argument without evaluating it. There's a reader shortcut for this:
```
```scheme
(quote x)
;; reader shortcut
@ -58,7 +66,8 @@ This form simply returns its argument without evaluating it. There's a reader s
```
It's often used to get a symbol, but you can quote complex things like lists, pairs, and arrays.
```
```scheme
goos> (cdr '(1 . 2))
2
goos> (cdr '(1 2))
@ -69,16 +78,19 @@ goos> '#(1 2 3)
### set!
Set is used to search for a variable in the enclosing environments, then set it to a value.
```
```scheme
(set! x (+ 1 2))
```
will set the lexically closest `x` to 3. It's an error if there's no variable named `x` in an enclosing scope.
### lambda
See any Lisp/Scheme tutorial for a good explanation of `lambda`.
Note that GOOS has some extensions for arguments. You can have a "rest" argument at the end, like this:
```
```scheme
(lambda (a b &rest c) ...) ;; c is the rest arg
(lambda (&rest a) ...) ;; a is the rest
```
@ -86,13 +98,15 @@ Note that GOOS has some extensions for arguments. You can have a "rest" argumen
The rest argument will contain a list of all extra arguments passed to the function. If there are no extra arguments, the rest argument will be the empty list.
There are also keyword arguments, which contain a `&key` before the argument name.
```
```scheme
(lambda (a b &key c) ...) ;; c is a keyword argument, a and b are not.
(lambda (&key a &key b) ...) ;; a and b are keyword arguments
```
These keyword arguments _must_ be specified by name. So to call the two functions above:
```
```scheme
(f 1 2 :c 3) ;; a = 1, b = 2, c = 3
(f :a 1 :b 2) ;; a = 1, b = 2
```
@ -100,7 +114,8 @@ These keyword arguments _must_ be specified by name. So to call the two function
Note that it is not required to put keyword arguments last, but it is a good idea to do it for clarity.
There are also keyword default arguments, which are like keyword arguments, but can be omitted from the call. In this case a default value is used instead.
```
```scheme
(lambda (&key (c 12)) ...)
(f :c 2) ;; c = 2
(f) ;; c = 12
@ -123,7 +138,7 @@ Short circuiting `or`. If nothing is truthy, `#f`. Otherwise returns first truth
Short circuiting `and`. If not all truthy, `#f`. Otherwise returns last truthy.
### macro
Kind of like `lambda`, but for a macro.
Kind of like `lambda`, but for a macro.
A lambda:
- Evaluate the arguments
@ -149,7 +164,8 @@ Special while loop form for GOOS.
`(while condition body...)`
To add together `[0, 100)`:
```
```scheme
(define count 0)
(define sum 0)
@ -161,16 +177,18 @@ To add together `[0, 100)`:
sum
```
Not Special Built-In Forms
---------------------------
## Not Special Built-In Forms
TODO - None at this time
Namespace Details
------------------
## Namespace Details
The GOOS `define` form accepts an environment for definition. For example:
```
```scheme
(define :env *goal-env* x 10)
```
will define `x` in the `*goal-env*`. Any macros defined in the `*goal-env*` can be used as macros from within GOAL code.
Things that aren't macros in the `*goal-env*` cannot be accessed.

1235
docs/markdown/lib.md Normal file

File diff suppressed because it is too large

View File

@ -0,0 +1,147 @@
# OpenGOAL's Method System
OpenGOAL has a virtual method system. This means that child types can override parent methods. The first argument to a method is always the object the method is being called on, except for `new`.
All types have methods. Objects have access to all of their parent's methods, and may override parent methods. All types have these 9 methods:
- `new` - like a constructor, returns a new object. It's not used in all cases, and on all types, and needs more documentation on when specifically it is used.
- `delete` - basically unused, but like a destructor. Often calls `kfree`, which does nothing.
- `print` - prints a short, one line representation of the object to the `PrintBuffer`
- `inspect` - prints a multi-line description of the object to the `PrintBuffer`. Usually auto-generated by the compiler and prints out the name and value of each field.
- `length` - Returns a length if the type has something like a length (number of characters in string, etc). Otherwise returns 0. Usually returns the number of filled slots, instead of the total number of allocated slots, when there is possibly a difference.
- `asize-of` - Gets the size in memory of the entire object. Usually this just looks this up from the appropriate `type`, unless it's dynamically sized.
- `copy` - Create a copy of this object on the given heap. Not used very much?
- `relocate` - Some GOAL objects will be moved in memory by the kernel as part of the compacting actor heap system. After being moved, the `relocate` method will be called with the offset of the move, and the object should fix up any internal pointers which may point to the old location. It's also called on v2 objects loaded by the linker when they are first loaded into memory.
- `mem-usage` - Not understood yet, but probably returns how much memory in bytes the object uses. Not supported by all objects.
Usually a method which overrides a parent method must have the same argument and return types. The only exception is `new` methods, which can have different argument/return types from the parent. (See the later section on `_type_` for another exception.)
The compiler's implementation for calling a method is:
- Is the type a basic?
- If so, look up the type using runtime type information
- Get the method from the vtable
- Is the type not a basic?
- Get the method from the vtable of the compile-time type
- Note that this process isn't very efficient - instead of directly linking to the slot in the vtable (one deref) it first looks up the `type` by symbol, then the slot (two derefs). I have no idea why it's done this way.
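The lookup rules above can be sketched as a small model. This is a Python illustration only, not the compiler's actual code: `GoalType`, `Basic` and `call_method` are hypothetical names, and the extra "look up the `type` by symbol" step is not modelled.

```python
class GoalType:
    """Hypothetical stand-in for a runtime type with a method table (vtable)."""

    def __init__(self, name, methods):
        self.name = name
        self.methods = methods  # list of callables, indexed by method id


class Basic:
    """Hypothetical stand-in for a basic: the object carries its runtime type."""

    def __init__(self, runtime_type):
        self.type = runtime_type


def call_method(obj, compile_time_type, method_id, *args):
    if isinstance(obj, Basic):
        # Basic: use the runtime type information stored with the object.
        table = obj.type.methods
    else:
        # Not a basic: only the compile-time type is available.
        table = compile_time_type.methods
    return table[method_id](obj, *args)


parent = GoalType("parent", [lambda obj: "parent-print"])
child = GoalType("child", [lambda obj: "child-print"])

# A basic whose runtime type is child, seen by the compiler as parent.
via_runtime_type = call_method(Basic(child), parent, 0)
# A non-basic value can only use the compile-time type's table.
via_compile_time_type = call_method(12, parent, 0)
```

In this model the basic dispatches to the child's method even though the compile-time type is the parent, which is what makes the method system virtual.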
In general, I suspect that the method system was modified after GOAL was first created. There is some evidence that types were once stored in the symbol table, but were removed because the symbol table became full. This could explain some of the weirdness around method calls/definition rules, and the disaster `method-set!` function.
All type definitions should also define all the methods, in the order they appear in the vtable. I suspect GOAL had this as well because the method ordering otherwise seems random, and in some cases impossible to get right unless (at least) the number of methods was specified in the type declaration.
## Special `_type_` Type
The first argument of a method always contains the object that the method is being called on. It also must have the type `_type_`, which will be substituted by the type system (at compile time) using the following rules:
- At method definition: replace with the type that the method is being defined for.
- At method call: replace with the compile-time type of the object the method is being called on.
The type system is flexible with allowing you to use `_type_` in the method declaration in `deftype`, but not using `_type_` in the actual `defmethod`.
A method can have other arguments or a return value that's of type `_type_`. This special "type" will be replaced __at compile time__ with the type which is defining or calling the method. No part of this exists at runtime. It may seem weird, but there are two uses for this.
The first is to allow children to specialize methods and have their own child type as an argument type. For example, say you have a method `is-same-shape`, which compares two objects and sees if they are the same shape. Suppose you first defined this for type `square` with
```lisp
(defmethod square is-same-shape ((obj1 square) (obj2 square))
(= (-> obj1 side-length) (-> obj2 side-length))
)
```
Then, if you created a child class of `square` called `rectangle` (this is a terrible way to use inheritance, but it's just an example), and overrode the `is-same-shape` method, you would have to have arguments that are `square`s, which blocks you from accessing `rectangle`-specific fields. The solution is to define the original method with type `_type_` for the first two arguments. Then, the method defined for `rectangle` also will have arguments of type `_type_`, which will expand to `rectangle`.
The second use is for a return value. For example, the `print` and `inspect` methods both return the object that is passed to them, which will always be the same type as the argument passed in. If `print` were defined as `(function object object)`, then `(print my-square)` would lose the information that the return object is a `square`. If `print` is a `(function _type_ _type_)`, the type system will know that `(print my-square)` will return a `square`.
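The substitution itself is simple enough to model in a few lines. The following is a Python sketch under the assumption that a signature is just a list of type names; `substitute_type` is a hypothetical helper, not a function in the OpenGOAL type system.

```python
def substitute_type(signature, concrete_type):
    """Replace every `_type_` in a list of type names with concrete_type."""
    return [concrete_type if name == "_type_" else name for name in signature]


# print is declared as (function _type_ _type_): one argument, one return value.
print_signature = ["_type_", "_type_"]

# At a call like (print my-square), the compile-time type of the object is square.
print_on_square = substitute_type(print_signature, "square")

# When is-same-shape is defined for rectangle, both arguments become rectangle.
is_same_shape_args = substitute_type(["_type_", "_type_"], "rectangle")
```

Nothing here survives to runtime: the result is only used by the type checker, which is why `(print my-square)` is known to return a `square`.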
## Details on the Order of Overrides
The order in which you `defmethod` and `deftype` matters.
When you `deftype`, you copy all methods from the parent. When you `defmethod`, you always set a method in that type. You may also override methods in a child if: the child hasn't modified that method already, and if you are in a certain mode. This is a somewhat slow process that involves iterating over the entire symbol table and every type in the runtime, so I believe it was disabled when loading level code, and you just had to make sure to `deftype` and `defmethod` in order.
Assume you have the type hierarchy where `a` is the parent of `b`, which is the parent of `c`.
If you first define the three types using `deftype`, then override a method from `a` on `c`, then override that same method on `b`, then `c` won't use the override from `b`.
If you first define the three types using `deftype`, then override a method on `b`, it will _sometimes_ do the override on `c`. This depends on the value of the global variable `*enable-method-set*`, and some other confusing options. It may also print a warning but still do the override in certain cases.
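The two cases above can be reproduced with a toy model. This is a Python sketch, not the real runtime: `TypeTable` is hypothetical, the `propagate` flag stands in for `*enable-method-set*` and the other options, and "the child hasn't modified that method" is assumed to mean "the child still holds the entry it inherited".

```python
class TypeTable:
    """Toy model of per-type method tables (hypothetical, not the real runtime)."""

    def __init__(self):
        self.parent = {}   # type name -> parent type name (None for a root)
        self.methods = {}  # type name -> {method name: implementation}

    def deftype(self, name, parent=None):
        # deftype copies all methods from the parent at definition time.
        self.parent[name] = parent
        self.methods[name] = dict(self.methods[parent]) if parent else {}

    def defmethod(self, type_name, method, impl, propagate=False):
        # defmethod always sets the method on the type itself.
        old = self.methods[type_name].get(method)
        self.methods[type_name][method] = impl
        if propagate:
            self._update_children(type_name, method, old, impl)

    def _update_children(self, type_name, method, old, impl):
        for child, parent in self.parent.items():
            # Only children that still hold the inherited entry are updated.
            if parent == type_name and self.methods[child].get(method) is old:
                self.methods[child][method] = impl
                self._update_children(child, method, old, impl)


def make_hierarchy():
    """a is the parent of b, which is the parent of c; a provides print."""
    table = TypeTable()
    table.deftype("a")
    table.defmethod("a", "print", "a-print")
    table.deftype("b", "a")
    table.deftype("c", "b")
    return table


# Case 1: override on c first, then on b. c keeps its own override.
first = make_hierarchy()
first.defmethod("c", "print", "c-print", propagate=True)
first.defmethod("b", "print", "b-print", propagate=True)

# Case 2: override on b only, with and without propagation to children.
enabled = make_hierarchy()
enabled.defmethod("b", "print", "b-print", propagate=True)
disabled = make_hierarchy()
disabled.defmethod("b", "print", "b-print", propagate=False)
```

In case 1, `c` ends up with `c-print`. In case 2, `c` gets `b-print` only when propagation is enabled; otherwise it keeps the `a-print` entry it copied at `deftype` time.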
## Built in Methods
All types have these 9 methods. They have reasonable defaults if you don't provide anything.
### `new`
The new method is a very special method used to construct a new object, like a constructor. Note that some usages of the `new` keyword __do not__ end up calling the new method. See the `new` section for more details. Unlike C++, fields of a type and elements in an array are not constructed either.
The first argument is an "allocation", indicating where the object should be constructed. It can be
- The symbol `'global` or `'debug`, indicating the global or debug heaps
- The symbols `'process-level-heap` or `'loading-level`, indicating whatever heaps are stored in those symbols.
- `'process`, indicating the allocation should occur on the current process heap.
- `'scratch`, for allocating on the scratchpad. This is unused.
- Otherwise it's treated as a 16-byte aligned address and used for in place construction (it zeros the memory first)
The second argument is the "type to make". It might seem stupid at first, but it allows child classes to use the same `new` method as the parent class.
The remaining arguments can be used for whatever you want.
When writing your own `new` methods, you should ignore the `allocation` argument and use the `object-new` macro to actually do the allocation. This takes care of all the details for getting the memory (and setting up runtime type information if it's a basic). See the section on `object-new` for more details.
### `delete`
This method isn't really used very much. Unlike a C++ destructor it's never called automatically. In some cases, it's repurposed as a "clean up" type function but it doesn't actually free any memory. It takes no arguments. The default implementations call `kfree` on the allocation, but there are two issues:
1. The implementation is sometimes wrong, likely confusing doing pointer math (steps by array stride) with address math (steps by one byte).
2. The `kfree` function does nothing.
The `kheap` system doesn't really support freeing objects unless you free in the opposite order you allocate, so it makes sense that `delete` doesn't really work.
### `print`
This method should print out a short description of the object (with no newlines) and return the object. The printing should be done with `(format #t ...)` (see the section on `format` for more information). If you call `print` by itself, it'll make this description show up in the REPL. (Note that there is some magic involved to add a newline here... there's actually a function named `print` that calls the `print` method and adds a newline.)
The default short description looks like this: `#<test-type @ #x173e54>` for printing an object of type `test-type`. Of course, you can override it with a better version. Built-in types like string, type, boxed integer, pair, have reasonable overrides.
This method is also used to print out the object with `format`'s `~A` format option.
### `inspect`
This method should print out a detailed, multi-line description. By default, `structure`s and `basic`s will have an auto-generated method that prints out the name and value of all fields. For example:
```lisp
gc > (inspect *kernel-context*)
[00164b44] kernel-context
prevent-from-run: 65
require-for-run: 0
allow-to-run: 0
next-pid: 2
fast-stack-top: 1879064576
current-process: #f
relocating-process: #f
relocating-min: 0
relocating-max: 0
relocating-offset: 0
low-memory-message: #t
```
In some cases this method is overridden to provide nicer formatting.
### `length`
This method should return a "length". The default method for this just returns 0, but for things like strings or buffers, it could be used to return the number of characters or elements in use. It's usually used to refer to how many are used, rather than the capacity.
### `asize-of`
This method should return the size of the object, including the 4 bytes of type info for a `basic`.
By default this grabs the value from the object's `type`, which is only correct for non-dynamic types. For types like `string` or other dynamic types, this method should be overridden. If you intend to store dynamically sized objects of a given type on a process heap, you __must__ implement this method accurately.
### `copy`
Creates a copy of the object. I don't think this is used very much. Just does a `memcpy` to duplicate by default.
### `relocate`
The exact details are still unknown, but it is used to update internal data structures after an object is moved in memory. This must be supported for objects allocated in process heaps of processes allocated on the actor heap or debug actor heap.
It's also called on objects loaded from a GOAL data object file.
### `mem-usage`
Not much is known yet, but used for computing memory usage statistics.

View File

@ -1,11 +1,15 @@
# CGO/DGO Files
The CGO/DGO file format is exactly the same - the only difference is the name of the file. The DGO name indicates that the file contains all the data for a level. The engine will load these files into a level heap, which can then be cleared and replaced with a different level.
# Object File Formats
## CGO/DGO Files
The CGO/DGO file format is exactly the same - the only difference is the name of the file. The DGO name indicates that the file contains all the data for a level. The engine will load these files into a level heap, which can then be cleared and replaced with a different level.
I suspect that the DGO file name came first, as a package containing all the data in the level which can be loaded very quickly. Names in the code all say `dgo`, and the `MakeFileName` system shows that both CGO and DGO files are stored in the `game/dgo` folder. Probably the engine and kernel were packed into a CGO file after the file format was created for loading levels.
Each CGO/DGO file contains a bunch of individual object files. Each file has a name. There are some duplicate names - sometimes the two files with the same names are very different (example, code for an enemy, art for an enemy), and other times they are very similar (tiny differences in code/data). The files come in two versions, v4 and v3, and both CGOs and DGOs contain both versions. If an object file has code in it, it is always a v3. It is possible to have a v3 file with just data, but usually the data is pretty small. The v4 files tend to have a lot of data in them. My theory is that the compiler creates v3 files out of GOAL source code files, and that other tools for creating things like textures/levels/art-groups generate v4 objects. There are a number of optimizations in the loading process for v4 objects that are better suited for larger files. To stay at 60 fps always, a v3 object must be smaller than around 750 kB. A v4 object does not have this limitation.
# The V3 format
Each CGO/DGO file contains a bunch of individual object files. Each file has a name. There are some duplicate names - sometimes the two files with the same names are very different (example, code for an enemy, art for an enemy), and other times they are very similar (tiny differences in code/data). The files come in two versions, v4 and v3, and both CGOs and DGOs contain both versions. If an object file has code in it, it is always a v3. It is possible to have a v3 file with just data, but usually the data is pretty small. The v4 files tend to have a lot of data in them. My theory is that the compiler creates v3 files out of GOAL source code files, and that other tools for creating things like textures/levels/art-groups generate v4 objects. There are a number of optimizations in the loading process for v4 objects that are better suited for larger files. To stay at 60 fps always, a v3 object must be smaller than around 750 kB. A v4 object does not have this limitation.
### The V3 format
The v3 format is divided into three segments:
1. Main: this contains all of the functions/data that will be used by the game.
2. Debug: this is only loaded in debug mode, and is always stored on a separate `kdebugheap`.
@ -17,30 +21,33 @@ This format will be different between the PC and PS2 versions, as linking data f
Each segment can contain functions and data. The top-level segment must start with a function which will be run to initialize the object. All the data here goes through the GOAL compiler and type system.
# The V4 format
### The V4 format
The V4 format contains just data. Like v3, the data is GOAL objects, but was probably generated by a tool that wasn't the compiler. A V4 object has no segments, but must start with a `basic` object. After being linked, the `relocate` method of this `basic` will be called, which should do any additional linking required for the specific object.
Because this is just data, there's no reason for the PC version to change this format. This means we can also check the
Because this is just data, there's no reason for the PC version to change this format. This means we can also check the
Note: you may see references to v2 format in the code. I believe v4 format is identical to v2, except the linking data is stored at the end, to enable a "don't copy the last object" optimization. The game's code uses the `work_v2` function on v4 objects as a result, and some of my comments may refer to v2, when I really mean v4. I believe there are no v2 objects in any games.
# Plan
### The Plan
- Create a library for generating obj files in V3/V4 format
- V4 should match game exactly. Doesn't support code.
- V3 is our own thing. Must support code.
We'll eventually create tools which use the library in V4 mode to generate object files for rebuilding levels and textures. We may need to wait until more about these formats is understood before trying this.
The compiler will use the library in V3 mode to generate object files for each `gc` (GOAL source code) file.
# CGO files
The only CGO files read are `KERNEL.CGO` and `GAME.CGO`.
The `KERNEL.CGO` file contains the GOAL kernel and some very basic libraries (`gcommon`, `gstring`, `gstate`, ...). I believe that `KERNEL` was always loaded on boot during development, as it's required for the Listener to function.
### CGO files
The `GAME.CGO` file combines the contents of the `ENGINE`, `COMMON` and `ART` CGO files. `ENGINE` contains the game engine code, `COMMON` contains level-specific code (outside of the game engine) that is always loaded. If code is used in basically all the levels, it makes sense to put it in `COMMON`, so it doesn't have to be loaded for each currently active level. The `ART` CGO contains common art/textures/models, like Jak and his animations.
The only CGO files read are `KERNEL.CGO` and `GAME.CGO`.
The `JUNGLE.CGO`, `MAINCAVE.CGO`, and `SUNKEN.CGO` files contain some copies of files used in the jungle, cave, LPC levels. Some are a tiny bit different. I believe they are unused.
The `KERNEL.CGO` file contains the GOAL kernel and some very basic libraries (`gcommon`, `gstring`, `gstate`, ...). I believe that `KERNEL` was always loaded on boot during development, as it's required for the Listener to function.
The `GAME.CGO` file combines the contents of the `ENGINE`, `COMMON` and `ART` CGO files. `ENGINE` contains the game engine code, `COMMON` contains level-specific code (outside of the game engine) that is always loaded. If code is used in basically all the levels, it makes sense to put it in `COMMON`, so it doesn't have to be loaded for each currently active level. The `ART` CGO contains common art/textures/models, like Jak and his animations.
The `JUNGLE.CGO`, `MAINCAVE.CGO`, and `SUNKEN.CGO` files contain some copies of files used in the jungle, cave, LPC levels. Some are a tiny bit different. I believe they are unused.
The `L1.CGO` file contains basically all the level-specific code/Jak animations and some textures. It doesn't seem to contain any 3D models. It's unused, but I'm still interested in understanding its format, as the Jak 1 demos have this file.
@ -50,10 +57,11 @@ The `VILLAGEP.CGO` file contains common code shared in village levels, which isn
The `WATER-AN.CGO` file contains some small code/data for water animations. Unused. The same data appears in the levels as needed.
# CGO/DGO Loading Process
### CGO/DGO Loading Process
A CGO/DGO file is loaded onto a "heap", which is just a chunk of contiguous memory. The loading process is designed to be fast, and also able to fill the entire heap, and allow each object to allocate memory after it is loaded. The process works like this:
1. Two temporary buffers are allocated at the end of the heap. These are sized so that they can fit the largest object file, not including the last object file.
1. Two temporary buffers are allocated at the end of the heap. These are sized so that they can fit the largest object file, not including the last object file.
2. The IOP begins loading, and is permitted to load the first two object files to the two temporary buffers
3. The main CPU waits for the first object file to be loaded.
4. While the second object file is being loaded, the first object is "linked". The first step to linking is to copy the object file data from the temporary buffer to the bottom of the heap, kicking out all the other data in the process. The linking data is checked to see if it is in the top of the heap, and is moved there if it isn't already. The run-once initialization code is copied to another temporary allocation on top of the heap and the debug data is copied to the debug heap.
@ -66,13 +74,12 @@ A CGO/DGO file is loaded onto a "heap", which is just a chunk of contiguous memo
11. The last object will be loaded directly onto the bottom of the heap, as there may not be enough memory to use the temporary buffers and load the last object. The temporary buffers are freed.
12. If the last object is a v3, its linking data will be moved to the top-level, and the object data will be moved to fill in the gap left behind. If the last object is a v2, the main data will be at the beginning of the object data, so there is an optimization that will avoid copying the object data to save time, if the data is already close to being in the right place.
Generally the last file in a level DGO will be the largest v4 object. You can only have one file larger than a temporary buffer, and it must come last. The last file also doesn't have to be copied after being loaded into memory if it is a v4.
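The sizing rule for the temporary buffers can be written down directly. This is a Python sketch with made-up object sizes; `temp_buffer_size` and `fits_loading_rules` are hypothetical helpers, and returning 0 for a single-object file is an assumption, not something the loader is known to do.

```python
def temp_buffer_size(object_sizes):
    """Size each temporary buffer needs: the largest object, ignoring the last."""
    if len(object_sizes) < 2:
        return 0  # assumption: a lone object is loaded straight onto the heap
    return max(object_sizes[:-1])


def fits_loading_rules(object_sizes, buffer_size):
    """Only the last object is allowed to be larger than a temporary buffer."""
    return all(size <= buffer_size for size in object_sizes[:-1])


# Hypothetical level DGO: three small objects, then one large final object.
sizes = [120_000, 300_000, 80_000, 2_000_000]
buffer_size = temp_buffer_size(sizes)

# Moving the large object to the front breaks the rule for the same buffers.
reordered = [2_000_000, 120_000, 300_000, 80_000]
```

With these numbers the buffers are 300,000 bytes each, and the 2,000,000 byte object is only loadable because it comes last.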
V3 max size:
A V3 object is copied all at once with a single `ultimate-memcpy`. Usually linking gets to run for around 3 to 5% of a total frame. The `ultimate-memcpy` routine does a to/from scratchpad transfer. In practice, mem/spr transfers are around 1800 MB/sec, and the data has to be copied twice, so the effective bandwidth is 900 MB/sec.
`900 MB / second * (0.04 * 0.0167 seconds) = 601 kilobytes`
This estimate is backed up by the chunk size of the v4 copy routine, which copies one chunk per frame. It picks 524 kB as the maximum amount that's safe to copy per frame.
#### V3 max size
A V3 object is copied all at once with a single `ultimate-memcpy`. Usually linking gets to run for around 3 to 5% of a total frame. The `ultimate-memcpy` routine does a to/from scratchpad transfer. In practice, mem/spr transfers are around 1800 MB/sec, and the data has to be copied twice, so the effective bandwidth is 900 MB/sec.
`900 MB / second * (0.04 * 0.0167 seconds) = 601 kilobytes`
This estimate is backed up by the chunk size of the v4 copy routine, which copies one chunk per frame. It picks 524 kB as the maximum amount that's safe to copy per frame.
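The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and the 4% figure is simply the midpoint of the 3 to 5% range quoted above.

```python
# Back-of-the-envelope budget for a single-frame V3 copy.
# All names here are illustrative; the numbers come from the text above.
RAW_BANDWIDTH_MB_S = 1800.0  # measured mem/spr transfer rate
COPIES = 2                   # the data is copied twice (to and from scratchpad)
FRAME_SECONDS = 0.0167       # one frame at roughly 60 fps
LINK_FRACTION = 0.04         # midpoint of the 3 to 5% of a frame given to linking


def v3_copy_budget_bytes(link_fraction=LINK_FRACTION):
    """Bytes that can be copied in the linking share of one frame."""
    effective_bytes_per_s = RAW_BANDWIDTH_MB_S / COPIES * 1e6
    return effective_bytes_per_s * link_fraction * FRAME_SECONDS


print(f"{v3_copy_budget_bytes() / 1000:.0f} kB")
```

Running the same function with 3% and 5% gives roughly 451 kB and 752 kB, so a 524 kB chunk (524,288 bytes is 512 KiB) sits inside that range.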


@ -1,12 +1,14 @@
# Porting to x86
This document will keep track of stuff that needs to be ported or modified significantly for x86. Anything that uses PS2-specific hardware or relies on stuff in the C Kernel will need to be ported.
## Basic Info
Most of the game is written in GOAL. All this source lives in `goal_src`.
The "runtime" is all the support code written in C++. It's located in `game/`. Sometimes the "runtime" + GOAL code together is called the "target".
Most of the code in "runtime" is reverse engineered code from the real game, with small tweaks to make it work on x86 and with OpenGOAL.
The code in `game/system` is **not** from the game and is an implementation of system functions that are implemented by Sony in the PS2 game. It's stuff like threading, I/O, etc.
@ -23,6 +25,7 @@ To give an idea of the size of these (counted by `wc -l`):
- System is 1294 lines, and still has some to implement
## Math Libraries
I think most of the math libraries can be decompiled, but there are a few that will need manual work. These are also a great place to do tests as the math functions have very few dependencies and we know what the right answer should be for many of them.
- `bounding-box` (only some stuff)
@ -34,11 +37,12 @@ I think most of the math libraries can be decompiled, but there are a few that w
At some point we may want to re-implement some of these to be more efficient.
## The IOP (I/O Processor) Framework
This is already implemented.
The IOP was a separate I/O Processor on the PS2. It runs a cooperative multi-tasking kernel developed by Sony. In OpenGOAL it is implemented in `game/system/IOP_Kernel.h`. The IOP Kernel is managed by the IOP runtime thread (`game/system/iop_thread.h`).
The library in `game/sce/iop.h` wraps the `IOP_Kernel` in an interface that looks like the Sony libraries used by the game so the IOP code can be ported directly.
There are a few stub functions that are hardcoded to return the correct values for stuff like CD drive initialization. The main features currently supported are:
- Threads (create, wakeup, start, sleep, delay, get ID)
@ -49,6 +53,7 @@ There are a few stub functions that are hardcoded to return the correct values f
All this stuff is currently used for loading DGOs, which is tested and working.
## OVERLORD Framework
This is already implemented.
The OVERLORD is the code written by Naughty Dog that runs on the IOP. It is responsible for sound and loading data. It's surprisingly complicated and some parts of it are extremely poorly written, especially the thread synchronization stuff. My implementation of OVERLORD is in `game/overlord`. It's not complete yet, but the basics are there and it does enough to load DGOs.
@ -58,6 +63,7 @@ The framework for OVERLORD is already implemented. The C Kernel calls a Sony lib
Once `start_overlord` returns, the initial call to `sceSifLoadModule` returns and the runtime keeps initializing.
## OVERLORD ISO Thread
This is partially implemented.
This thread is responsible for controlling the DVD drive and the small DVD data buffers used in the IOP. It has a big loop in `ISOThread()` in `iso.cpp` that looks for pending reads, executes them, waits for data to be read, then calls a callback. This code is unbelievably confusing.
@ -68,11 +74,12 @@ To interact with the DVD drive, it uses an `IsoFS` abstraction, which is a struc
It also has some sound stuff in it for managing VAG audio streams, but this isn't implemented yet.
The other threads in OVERLORD are "RPC" threads. They sit in a loop waiting for the main runtime thread (EE thread) to send a remote procedure call (RPC). Then they do something (like maybe sending a message to the ISO thread), maybe wait for something to happen, and then return.
From the GOAL/EE side of things, RPC calls can be blocking or non-blocking. They can be issued from GOAL (with `rpc-call`) or from the C Kernel (`RpcCall`). Per "channel" (corresponds to an IOP thread), there can only be one RPC call happening at a time. The `rpc-busy?` command can be used to check if an RPC is complete.
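The "one RPC per channel at a time" rule can be pictured with a toy model. This is not the `rpc-call` / `RpcCall` API: the class and method names are hypothetical, and raising an error on a second call is an assumption made for the sketch (the real code may block or behave differently).

```python
class RpcChannel:
    """Toy model of one RPC channel: at most one call in flight."""

    def __init__(self, name):
        self.name = name
        self._pending = None  # the request currently being handled, if any

    def busy(self):
        """Plays the role of `rpc-busy?`: True while a call is outstanding."""
        return self._pending is not None

    def call(self, request):
        """Issue a non-blocking call; refuses if the channel is in use."""
        if self.busy():
            raise RuntimeError(f"channel {self.name} already has an RPC in flight")
        self._pending = request

    def complete(self):
        """Called when the IOP thread finishes; frees the channel."""
        done, self._pending = self._pending, None
        return done
```

A caller would issue `call(...)`, keep doing other work, and poll `busy()` until it returns `False` before sending the next request on the same channel.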
## IOP PLAY (6)
This is unimplemented.
The `PLAY` RPC appears to be relatively simple and plays/stops/pauses/queues a VAG audio stream. It can either use the "AnimationName" system or another system to get the name of the audio stream. I don't know what sound effects in the game are streamed, but I believe there are some.
@ -80,22 +87,25 @@ The `PLAY` RPC appears to be relatively simple and plays/stops/pauses/queues a V
I suspect the GOAL side code for this is in `gsound` and `gsound-h`.
## IOP STR (5)
This is unimplemented.
This is an RPC for streaming data back to the EE. I think this is used to control animation streaming.
## IOP DGO (4)
This is implemented.
This is the RPC for loading DGO files. The DGO loading is super complicated, but the basic idea is that loading / linking are double buffered. In order to allow linking files to allocate memory, the currently loading file goes in a temporary buffer on the top of the heap. (There are actually two temp buffers that rotate, one for loading out of and one for linking, as the "copy to heap" step is done as part of linking, not loading.)
The final chunk is not double buffered. This is so it can be loaded directly into its final location in the heap. This has three advantages: you don't need to copy it out of a temporary buffer, you can have a file larger than the temp buffer and you can also entirely fill the heap this way (the temp buffers are freed so you don't have to worry about that).
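A simplified picture of the buffer rotation, as a sketch only. The function name and the `temp0` / `temp1` / `heap` labels are invented here, and whether the final load really overlaps with linking the previous object is an assumption of this model, not something the text confirms.

```python
def dgo_load_plan(object_names):
    """Return a list of (loading, linking) steps for a simplified DGO load.

    Each entry is ((name, destination), (name, source)) with None where
    nothing is loading or linking during that step.
    """
    if not object_names:
        return []
    plan = []
    last = len(object_names) - 1
    for i, name in enumerate(object_names):
        # Every object but the last goes through one of two rotating temp buffers.
        dest = "heap" if i == last else f"temp{i % 2}"
        linking = None
        if i > 0:
            # The previous object is linked out of the other temp buffer.
            linking = (object_names[i - 1], f"temp{(i - 1) % 2}")
        plan.append(((name, dest), linking))
    # The final object is linked in place once its load has finished.
    plan.append((None, (object_names[last], "heap")))
    return plan


for step in dgo_load_plan(["a", "b", "c"]):
    print(step)
```

The point of the sketch is only the shape of the schedule: loads alternate between two buffers while the previous object is linked, and the last object skips the temporary buffers entirely.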
The IOP side state machine for this is in `iso.cpp`, implemented inside of the DGO load buffer complete callback and is somewhat complicated because DGO info may be split between multiple buffers, and you have to deal with getting partial info. The EE side is in `kdgo.cpp`.
The DGO synchronization is pretty confusing but I believe I have it working. It may be worth documenting it more (I thought I did already, but where did I put it?).
## IOP Server/Ramdisk (3)
This is implemented, but so far unused and untested.
This RPC is used to store files in RAM on the IOP. There's a buffer of around 800 kB. I believe it's used for a few different things, in particular the level visibility data. The EE requests data to be loaded from a file on DVD into the "ramdisk" (just a buffer on the IOP), then can request chunks of this file. Of course it is not as fast as storing the file in the EE RAM, but it is much faster than reading from the DVD again.
@ -104,16 +114,19 @@ This is what Andy Gavin refers to when they said they did "things they weren't s
## IOP Loader (2)
This is unimplemented.
This is used to control the loading of music and soundbanks. I haven't touched it yet. Music and soundbanks are loaded into IOP memory when you switch levels.
## IOP Player (1)
This is unimplemented.
This is used to control the playing of sound, and goes with Loader. Like PLAY it can play VAG audio streams. I'm not sure which one is actually used for streaming audio, maybe both?
## IOP VBlank Handler
This is unimplemented.
The IOP's kernel will call `VBlank_Handler` on each vblank. This is once per frame, and I don't know where it is, whether it's tied to the actual HW vblank or the framebuffer swap, or whether it happens at 30 or 60 fps (or on even/odd frames if 30 fps). I suspect it's the real vblank at 60 fps but I don't know.
@ -124,8 +137,8 @@ The EE first has to do some set up to tell the IOP where to copy the data, which
We'll also need to add some stuff to `system` and `sce/iop` to set this up, which will have to work with frame timing stuff so it happens at the right part of the frame.
## Sound Library
This is a pretty big one. To actually make sounds, OVERLORD code uses a third-party sound library called 989SND. Internally 989SND uses the SPU2 (Sound Processor) to actually do the "sound math" to decode ADPCM, do ADSR for the sampled sounds, and do reverb/mixing.
I think the lowest effort sound implementation is to try to reimplement 989SND + the SPU as a single library. This could be tested and developed in isolation from everything else.
@ -133,6 +146,7 @@ I think the lowest effort sound implementation is to try to reimplement 989SND +
We'll also need to pick a library for doing audio on PC and a design for how to keep the audio in sync. My gut feeling is to let the IOP side audio stuff just run totally independent from everything else, like the real game does. Let the audio sampling be driven by the sound device so you never have any crackling/interpolation artifacts. This is why the audio keeps going even after the game crashes on PS2.
## GOAL Kernel
The GOAL kernel needs some modification to work on x86. It implements userspace threading and needs to know the details of how to back up the current CPU state and restore it. It also needs to work with the compiler to make sure that the kernel and compiler agree on what registers may not be preserved across a thread suspend. There are also some CPU-specific details on how to do dynamic throw/catches, unwinding stack frames, and passing initial arguments to a thread.
In OpenGOAL, the `rsp` is a "real" pointer and all other pointers are "GOAL pointer"s (offset from base of GOAL memory), so there are some details needed to correctly save/restore stacks.
@ -140,9 +154,10 @@ In OpenGOAL, the `rsp` is a "real" pointer and all other pointers are "GOAL poin
A final detail is we will probably want/need the ability to increase the default size of stack that can be backed up on suspend. The default is 256 bytes so if our compiler does worse than the original and we use more stack space, we could run out. There's a check for this so it shouldn't be hard to detect.
## Jak Graphics Basics
The PS2 has some special hardware that's used for graphics. These are the DMAC, the VU1, and the GS.
The DMAC is a sophisticated DMA controller. It runs separately from the EE and can copy data from one place to another at pretty high speed. If it is not stalled for any reason it can reach 2.4 GB/sec. The main RAM is only good for around 1.2 GB/sec so in practice "big" things don't move around any faster than 1.2 GB/sec on average. It's used to send graphics data from main memory to the other components. It can be configured, but it's not programmable. It can do simple transfers, like "copy this block of data from here to there", and more complicated things, like following linked lists.
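To put those bandwidth figures in per-frame terms, here is a small worked calculation. The 60 fps frame rate is an assumption for the sake of the example, and the function name is made up.

```python
def bytes_per_frame(bandwidth_gb_s, fps=60):
    """Upper bound on bytes moved in one frame at a given sustained bandwidth."""
    return bandwidth_gb_s * 1e9 / fps


# Peak DMAC rate versus what main RAM can sustain, per frame at an assumed 60 fps.
print(f"DMAC peak: {bytes_per_frame(2.4) / 1e6:.0f} MB per frame")
print(f"Main RAM:  {bytes_per_frame(1.2) / 1e6:.0f} MB per frame")
```

Under these assumptions the main RAM limit works out to about 20 MB per frame, which is the practical ceiling for "big" transfers even though the DMAC itself could move twice that.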
The VU1 takes the role of vertex shaders. It can be programmed, but only in assembly, and it is extremely challenging and confusing. It has an extremely small memory (16 kB), but this memory is extremely fast. Its role is usually to do vertex transformations and lighting, then generate a list of commands to send to the GS. The `XGKICK` instruction on VU1 is used to send data from the VU1 memory to the GS.
@ -153,7 +168,7 @@ The GS is the actual GPU. It has VRAM and receives commands from a few different
The GS is like pixel shaders but it's very simple - it's not programmable and can only do a few fixed things. The GS also has the VRAM, which can contain frame buffers, z buffers, textures, and scratch area for effects.
My understanding is that during a frame, the EE generates a long list of things to draw. These are a DMA "chain" - basically a complicated linked-list like data structure that the PS2's DMA knows how to handle. I believe some graphics calculations are done on the EE - particularly the environment mapping.
## DMA
@ -209,4 +224,4 @@ My understanding is that during a frame, the EE generates a long list of things
## Ocean
## Navigate


@ -0,0 +1,141 @@
# All Forms
Documented forms are crossed out.
- ~~asm-file~~
- ~~m~~
- ~~ml~~
- ~~md~~
- ~~build-game~~
- ~~build-data~~
- ~~blg~~
- tc
- ~~e~~
- db (debug mode)
- #when
- #unless
- ~~lt~~
- ~~r~~
- ~~shutdown-target~~
- db (disassemble byte)
- dh
- dw
- dd
- df
- segfault
- fpe
- let
- let*
- ~~defun~~
- while
- until
- dotimes
- protect
- +!
- ~~if~~
- ~~when~~
- ~~unless~~
- and
- or
- +1
- +!
- -!
- *!
- 1-
- zero?
- &+!
- &-
- &->
- basic?
- pair?
- binteger?
- rtype-of
- cons
- list
- null?
- caar
- object-new
- expect-eq
- expect-true
- expect-false
- start-test
- finish-test
- top-level
- ~~begin~~
- ~~block~~
- ~~return-from~~
- ~~label~~
- ~~goto~~
- gs
- :exit
- ~~asm-file~~
- ~~asm-data-file~~
- listen-to-target
- reset-target
- :status
- ~~in-package~~
- ~~#cond~~
- ~~defglobalconstant~~
- ~~seval~~
- ~~cond~~
- ~~when-goto~~
- ~~define~~
- ~~define-extern~~
- ~~set!~~
- dbs
- dbg
- :cont
- :break
- :dump-all-mem
- :pm
- :di
- :disasm
- :bp
- :ubp
- deftype
- ~~defmethod~~
- ->
- &
- the-as
- the
- print-type
- new
- car
- cdr
- method
- declare-type
- none
- ~~lambda~~
- ~~declare~~
- ~~inline~~
- ~~quote~~
- ~~mlet~~
- defconstant
- ~~+~~
- ~~-~~
- ~~*~~
- ~~/~~
- ~~shlv~~
- ~~shrv~~
- ~~sarv~~
- ~~mod~~
- ~~logior~~
- ~~logxor~~
- ~~logand~~
- ~~lognot~~
- ~~=~~
- ~~!=~~
- ~~eq?~~
- ~~neq?~~
- ~~not~~
- ~~<=~~
- ~~>=~~
- ~~<~~
- ~~>~~
- &+
- ~~build-dgos~~
- ~~set-config!~~
- rlet
- .ret
- .sub
- .push
- .pop
- set-config!


@ -1,13 +1,17 @@
# Reader
GOOS and GOAL both use the same reader, which converts text files to S-Expressions and allows these s-expressions to be mapped back to a line in a source file for error messages. This document explains the syntax of the reader. Note that these rules do not explain the syntax of the language (for instance, GOAL has a much more complicated system of integers and many more restrictions), but rather the rules of how your program source must look.
## Integer Input
Integers handled by the reader are 64-bits. Any overflow is considered an error. An integer can be specified as a decimal, like `0` or `-12345`; in hex, like `#xbeef`; or in binary, like `#b101001`. All three representations can be used anywhere an integer is used. Hex numbers do not care about the case of the characters. Decimal numbers are signed, and wrapping from a large positive number to a negative number will generate an error. The valid input range for decimals is `INT64_MIN` to `INT64_MAX`. Hex and binary are unsigned and do not support negative signs, but allow large positive numbers to wrap to negative. Their input range is `0` to `UINT64_MAX`. For example, `-1` can be entered as `-1` or `#xffffffffffffffff`, but not as `UINT64_MAX` in decimal.
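The rules above can be sketched as a small parser. This is an illustration in Python, not the reader's actual implementation: the function name is invented, and details the text doesn't state (for example, rejecting a leading `+` or an uppercase `#X` prefix) are assumptions.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def read_integer(token):
    """Parse an integer token and return it as a signed 64-bit value."""
    if token[:2] in ("#x", "#b"):
        base, allowed = (16, "0123456789abcdef") if token[1] == "x" else (2, "01")
        digits = token[2:].lower()  # hex digits are case-insensitive
        if not digits or any(c not in allowed for c in digits):
            raise ValueError(f"bad digits in {token!r}")  # also rejects '-'
        value = int(digits, base)
        if value > UINT64_MAX:
            raise ValueError(f"{token!r} overflows 64 bits")
        # Hex and binary are unsigned but may wrap to a negative value.
        return value - 2**64 if value > INT64_MAX else value
    body = token[1:] if token.startswith("-") else token
    if not body or any(c not in "0123456789" for c in body):
        raise ValueError(f"{token!r} is not a decimal integer")
    value = int(token)
    if not INT64_MIN <= value <= INT64_MAX:
        raise ValueError(f"{token!r} overflows a signed 64-bit integer")
    return value


print(read_integer("#xffffffffffffffff"))  # same value as -1
```

Note how the decimal spelling of `UINT64_MAX` is rejected while the hex spelling is accepted and wraps, matching the example in the paragraph above.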
## Floating Point Input
Floating point values handled by the reader are implemented with `double`. Weird numbers (denormals, NaN, infinity) are invalid and not handled by the reader directly. A number _must_ have a decimal point to be interpreted as floating point. Otherwise, it will be an integer. Leading/trailing zeros are optional.
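A minimal sketch of the "decimal point decides" rule, assuming plain decimal tokens only. The function name is hypothetical, and exponent notation is not handled because the text doesn't say whether the reader supports it.

```python
def classify_number(token):
    """Return 'float', 'integer', or None for a plain decimal token."""
    body = token[1:] if token.startswith("-") else token
    digits = body.replace(".", "", 1)  # allow at most one decimal point
    if not digits or not (digits.isascii() and digits.isdigit()):
        return None
    # A decimal point is what makes a number floating point.
    return "float" if "." in body else "integer"


for tok in ("1", "1.0", ".5", "5.", "-2.25"):
    print(tok, classify_number(tok))
```

Because leading and trailing zeros are optional, `.5` and `5.` both count as floats here, while `1` stays an integer.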
## Character Input
Characters are used to represent characters that are part of text. The character `c` is represented by `#\c`. This representation is used for all ASCII characters between `!` and `~`. There are three special characters which have a non-standard representation:
- Space : `#\\s`
- New Line: `#\\n`
@ -15,18 +19,19 @@ Characters are used to represent characters that are part of text. The characte
All other characters are invalid.
## Strings
A string is a sequence of characters, surrounded by double quotes. The ASCII characters from ` ` to `~` excluding `"` can be entered directly. Strings have the following escape codes:
- `\\` : insert a backslash
- `\n` : insert a new line
- `\t` : insert a tab
- `\"` : insert a double quote
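The escape codes listed above can be sketched as a decoder for the text between the quotes. This is an illustrative Python version with a made-up function name; treating any other backslash sequence as an error is an assumption.

```python
ESCAPES = {"\\": "\\", "n": "\n", "t": "\t", '"': '"'}


def read_string_body(body):
    """Decode the characters between the double quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\":
            # A backslash must be followed by one of the known escape codes.
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError(f"bad escape at position {i}")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        elif c == '"' or not (" " <= c <= "~"):
            raise ValueError(f"character {c!r} cannot be entered directly")
        else:
            out.append(c)
            i += 1
    return "".join(out)


print(read_string_body(r"tab\there"))
```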
## Comments
The reader supports line comments with `;` and multi-line comments with `#| |#`. For example
```lisp
(print "hi") ; prints hi
#|
@ -36,6 +41,7 @@ this is a multi-line comment!
```
## Array
The reader supports arrays with the following syntax:
```
; array of 1, 2, 3, 4
@ -45,17 +51,21 @@ The reader supports arrays with the following syntax:
Arrays can be nested with lists, pairs, and other arrays.
## Pair
The reader supports pairs with the following syntax:
```lisp
; pair of a, b
(a . b)
```
Pairs can be nested with lists, pairs, and arrays.
## List
The reader supports lists. Lists are just an easier way of constructing a linked list of pairs, terminated with the empty list. The empty list is a special list written like `()`.
```lisp
; list of 1, 2, 3
(1 2 3)
; actually the same as
@ -63,8 +73,10 @@ The reader supports lists. Lists are just an easier way of constructing a linked
```
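The relationship between lists and pairs can be modelled in a few lines. This sketch uses Python tuples as stand-ins: a 2-tuple for a pair and the empty tuple for the empty list. The helper names are invented.

```python
EMPTY = ()  # stand-in for the empty list


def make_list(*items):
    """Build a list as a chain of pairs terminated by the empty list."""
    result = EMPTY
    for item in reversed(items):
        result = (item, result)  # (car . cdr)
    return result


def show(form):
    """Print a form using explicit dotted-pair notation."""
    if form == EMPTY:
        return "()"
    if isinstance(form, tuple):
        return f"({show(form[0])} . {show(form[1])})"
    return str(form)


print(show(make_list(1, 2, 3)))
```

Each element becomes the first half of a pair whose second half is the rest of the list, so the three-element list is three nested pairs ending in `()`.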
## Symbol
A symbol is a sequence of characters containing no whitespace, and not matching any other data type. (Note: this is not a very good definition). Typically symbols are lower case, and words are separated by a `-`. Examples:
```lisp
this-is-a-symbol
; you can have weird symbols too:
#f
@ -76,9 +88,9 @@ __WEIRDLY-NamedSymbol ; this is weird, but OK.
```
## Reader Macros
The reader has some default macros which are common in Scheme/LISP:
- `'x` will be replaced with `(quote x)`
- `` `x`` will be replaced with `(quasiquote x)`
- `,x` will be replaced with `(unquote x)`
- `,@x` will be replaced with `(unquote-splicing x)`
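As a rough sketch of these expansions, the following Python function rewrites a prefixed symbol token into a nested list. It only handles bare symbol tokens, whereas a real reader would apply the macro to whatever form follows; the names are hypothetical.

```python
# Longest prefix first, so ",@" is not mistaken for ",".
READER_MACROS = [
    (",@", "unquote-splicing"),
    ("'", "quote"),
    ("`", "quasiquote"),
    (",", "unquote"),
]


def expand_prefix(token):
    """Expand leading reader macros on a symbol token into nested lists."""
    for prefix, name in READER_MACROS:
        if token.startswith(prefix) and len(token) > len(prefix):
            return [name, expand_prefix(token[len(prefix):])]
    return token


print(expand_prefix("'x"))
print(expand_prefix("`,x"))
```

The only subtlety is ordering: `,@` has to be tried before `,`, otherwise `,@x` would be read as `(unquote @x)`.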


@ -1,8 +1,9 @@
# Registers
Although modern computers are much faster than the PS2, and we could probably get away with a really inefficient register allocation scheme, I think it's worth it to get this right.
## Register differences between MIPS and x86-64
The PS2's MIPS processor has these categories of register:
- General Purpose. 32 registers, each 128 bits wide, though usually only the lower 64 bits are used.
- Floating point registers. 32 registers, each for a 32-bit float.
@ -75,8 +76,8 @@ And the x86-64 GPR map
- `r14`: symbol table
- `r15`: offset pointer
### Plan for Memory Access
The PS2 uses 32-bit pointers, and changing the pointer size is likely to introduce bugs, so we will keep using 32-bit pointers. Also, GOAL has some hardcoded checks on the value for pointers, so we need to make sure the memory appears to the program at the correct address.
To do this, we have separate "GOAL Pointers" and "real pointers". The "real pointers" are just normal x86-64 pointers, and the "GOAL Pointer" is an offset into a main memory array. A "real pointer" to the main memory array is stored in `r15` (offset pointer) when GOAL code is executing, and the GOAL compiler will automatically add this to all memory accesses.
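The split between the two pointer kinds amounts to adding or subtracting a base address. Here is a minimal sketch with invented function names and arbitrary example values for the base and the offset.

```python
def real_address(offset_base, goal_ptr):
    """Convert a 32-bit GOAL pointer into a real 64-bit address."""
    if not 0 <= goal_ptr < 2**32:
        raise ValueError("GOAL pointers must fit in 32 bits")
    # This addition is what the compiler inserts on every memory access.
    return offset_base + goal_ptr


def goal_pointer(offset_base, real_ptr):
    """Convert a real address inside GOAL memory back into a GOAL pointer."""
    goal_ptr = real_ptr - offset_base
    if not 0 <= goal_ptr < 2**32:
        raise ValueError("address is outside GOAL memory")
    return goal_ptr


BASE = 0x2000000000  # arbitrary example value for the offset register
print(hex(real_address(BASE, 0x147D24)))
```

Because GOAL code only ever stores the 32-bit offset, the values it sees look the same as on the original hardware no matter where the memory array actually lives.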
@ -96,6 +97,7 @@ The other registers are less clear. The process pointer can probably be a real
Right now I'm leaning toward 2, but it shouldn't be a huge amount of work to change if I'm wrong.
### Plan for Function Call and Arguments
In GOAL for MIPS, function calls are weird. Functions are always called by register using `t9`. There seems to be a different register allocator for function pointers, as nested function calls have really wacky register allocation. In GOAL-x86-64, this restriction will be removed, and a function can be called from any register. (see next section for why we can do this)
Unfortunately, GOAL's 128-bit function arguments present a big challenge. When calling a function, we can't know if the function we're calling is expecting an integer, float, or 128-bit integer. In fact, the caller may not even know if it has an integer, float, or 128-bit integer. The easy and foolproof way to get this right is to use 128-bit `xmm` registers for all arguments and return values, but this will cause a massive performance hit and increase code size, as we'll have to move values between register types constantly. The current plan is this:
@ -104,8 +106,10 @@ Unfortunately, GOAL's 128-bit function arguments present a big challenge. When
- We'll compromise for 128-bit function calls. When the compiler can figure out that the function being called expects or returns a 128-bit value, it will use the 128-bit calling convention. In all other cases, it will use 64-bit. There aren't many places where 128-bit integers are used outside of inline assembly, so I suspect this will just work. If there are more complicated instances (call a function pointer and get either a 64 or 128-bit result), we will need to special case them.
### Plan for Static Data
The original GOAL implementation always called functions by using the `t9` register. So, on entry to a function, the `t9` register contains the address of the function. If the function needs to access static data, it will move this to `fp`, then do `fp` relative addressing to load data. Example:
```nasm
function-start:
daddiu sp, sp, -16 ;; allocate space on stack
sd fp, 8(sp) ;; back up old fp on stack
@ -116,8 +120,9 @@ function-start:
To copy this exactly on x86 would require reserving two registers equivalent to `t9` and `gp`. A better approach for x86-64 is to use "RIP relative addressing". This can be used to load memory relative to the current instruction pointer. This addressing mode can be used with "load effective address" (`lea`) to create pointers to static data as well.
### Plan for Memory
Access memory by GOAL pointer in `rx` with constant offset (optionally zero):
```nasm
mov rdest, [roff + rx + offset]
```

304
docs/markdown/repl.md Normal file

@ -0,0 +1,304 @@
# Compiler REPL
When you start the OpenGOAL compiler, you'll see a prompt like this:
```lisp
OpenGOAL Compiler 0.2
g >
```
The `g` indicates that you can input OpenGOAL compiler commands. For a listing of common commands run:
```lisp
(repl-help)
```
## Connecting To Target Example
In order to execute OpenGOAL code, you must connect to the listener.
```lisp
;; we cannot execute OpenGOAL code unless we connect the listener
g > (+ 1 2 3)
REPL Error: Compilation generated code, but wasn't supposed to
;; connect to the target
g > (lt)
[Listener] Socket connected established! (took 0 tries). Waiting for version...
Got version 0.2 OK!
[Debugger] Context: valid = true, s7 = 0x147d24, base = 0x2000000000, tid = 1297692
;; execute OpenGOAL code
gc > (+ 1 2 3)
6
;; quit the compiler and reset the target for next time.
gc > (e)
[Listener] Closed connection to target
```
Once we are connected, we see that there is a `gc` prompt. This indicates that the listener has an open socket connection. Now the REPL will accept both compiler commands and OpenGOAL source code. All `(format #t ...` debugging prints (like `printf`) will show up in this REPL. Each time you run something in the REPL, the result is printed as a decimal number. If the result doesn't make sense to print as a decimal, or there is no result, you will get some random number.
In the future there will be a fancier printer here.
## General Command Listing
### `(e)`
```lisp
(e)
```
Exit the compiler once the current REPL command is finished. Takes no arguments. If we are connected to a target through the listener, attempts to reset the target.
### `(:exit)`
Exit Compiler
```lisp
(:exit)
```
Same as `(e)`, just requires more typing. `(e)` is actually a macro for `(:exit)`. Takes no arguments.
### `(lt)`
Listen to Target
```lisp
(lt ["ip address"] [port-number])
```
Connect the listener to a running target. The IP address defaults to `"127.0.0.1"` and the port to `8112` (`DECI2_PORT` in `listener_common.h`). These defaults are usually what you want, so you can just run `(lt)` to connect.
Example:
```lisp
g > (lt)
[Listener] Socket connected established! (took 0 tries). Waiting for version...
Got version 0.2 OK!
[Debugger] Context: valid = true, s7 = 0x147d24, base = 0x2000000000, tid = 1296302
gc >
```
### `r`
Reset the target.
```lisp
(r ["ip address"] [port-number])
```
Regardless of the current state, attempt to reset the target and reconnect. After this, the target will have nothing loaded. Like with `(lt)`, the default IP and port are probably what you want.
Note: `r` is actually a macro.
### `shutdown-target`
If the target is connected, make it exit.
```lisp
(shutdown-target)
```
The target will print
```
GOAL Runtime Shutdown (code 2)
```
before it exits.
### `:status`
Ping the target.
```lisp
(:status)
```
Send a ping-like message to the target. Requires the target to be connected. If successful, prints nothing. Will time out and display an error message if the GOAL kernel or code dispatched by the kernel is stuck in an infinite loop. Unlikely to be used often.
## Compiler Forms - Compiler Commands
These forms are used to control the GOAL compiler, and are usually entered at the GOAL REPL, or as part of a macro that's executed at the GOAL REPL. These shouldn't be used in GOAL source code.
### `reload`
Reload the GOAL compiler
```lisp
(reload)
```
Disconnect from the target and reset all compiler state. This is equivalent to exiting the compiler and opening it again.
### `get-info`
Get information about something.
```lisp
(get-info <something>)
```
Use `get-info` to see what something is and where it is defined.
For example:
```lisp
;; get info about a global variable:
g > (get-info *kernel-context*)
[Global Variable] Type: kernel-context Defined: text from goal_src/kernel/gkernel.gc, line: 88
(define *kernel-context* (new 'static 'kernel-context
;; get info about a function. This particular function is forward declared, so there's an entry for that too.
;; global functions are also global variables, so there's a global variable entry as well.
g > (get-info fact)
[Forward-Declared] Name: fact Defined: text from goal_src/kernel/gcommon.gc, line: 1098
(define-extern fact (function int int))
[Function] Name: fact Defined: text from kernel/gcommon.gc, line: 1099
(defun fact ((x int))
[Global Variable] Type: (function int int) Defined: text from goal_src/kernel/gcommon.gc, line: 1099
(defun fact ((x int))
;; get info about a type
g > (get-info kernel-context)
[Type] Name: kernel-context Defined: text from goal_src/kernel/gkernel-h.gc, line: 114
(deftype kernel-context (basic)
;; get info about a method
g > (get-info reset!)
[Method] Type: collide-sticky-rider-group Method Name: reset! Defined: text from goal_src/engine/collide/collide-shape-h.gc, line: 48
(defmethod reset! collide-sticky-rider-group ((obj collide-sticky-rider-group))
[Method] Type: collide-overlap-result Method Name: reset! Defined: text from goal_src/engine/collide/collide-shape-h.gc, line: 94
(defmethod reset! collide-overlap-result ((obj collide-overlap-result))
[Method] Type: load-state Method Name: reset! Defined: text from goal_src/engine/level/load-boundary.gc, line: 9
(defmethod reset! load-state ((obj load-state))
;; get info about a constant
g > (get-info TWO_PI)
[Constant] Name: TWO_PI Value: (the-as float #x40c90fda) Defined: text from goal_src/engine/math/trigonometry.gc, line: 34
(defconstant TWO_PI (the-as float #x40c90fda))
;; get info about a built-in form
g > (get-info asm-file)
[Built-in Form] asm-file
```
### `autocomplete`
Preview the results of the REPL autocomplete:
```lisp
(autocomplete <sym>)
```
For example:
```lisp
g > (autocomplete *)
*
*16k-dead-pool*
*4k-dead-pool*
...
Autocomplete: 326/1474 symbols matched, took 1.29 ms
```
### `seval`
Execute GOOS code.
```lisp
(seval form...)
```
Evaluates the forms in the GOOS macro language. The result is not returned in any way, so it's only useful for getting side effects. It's not really used other than to bootstrap some GOAL macros for creating macros.
### `asm-file`
Compile a file.
```lisp
(asm-file "file-name" [:color] [:write] [:load] [:disassemble] [:no-code])
```
This runs the compiler on a given file. The file path is relative to the `jak-project` folder. These are the options:
- `:color`: run register allocation and code generation. Can be omitted if you don't want to actually generate code. Usually you want this option.
- `:write`: write the object file to the `out/obj` folder. You must also have `:color` on. You must do this to include this file in a DGO.
- `:load`: send the object file to the target with the listener. Requires `:color` but not `:write`. There may be issues with `:load`ing very large object files (believed fixed).
- `:disassemble`: prints a disassembly of the code by function. Currently data is not disassembled. This code is not linked, so references to symbols will have placeholder values like `0xDEADBEEF`. The IR is printed next to each instruction so you can see what symbol is supposed to be linked. Requires `:color`.
- `:no-code`: checks that the result of processing the file generates no code or data. This will be true if your file contains only macros / constant definition. The `goal-lib.gc` file that is loaded by the compiler automatically when it starts must generate no code. You can use `(asm-file "goal_src/goal-lib.gc" :no-code)` to reload this file and double check that it doesn't generate code.
To reduce typing, there are some useful macros:
- `(m "filename")` is "make" and does a `:color` and `:write`.
- `(ml "filename")` is "make and load" and does a `:color` and `:write` and `:load`. This effectively replaces the previous version of the file in the currently running game with the one you just compiled, and is a super useful tool for quick debugging/iterating.
- `(md "filename")` is "make debug" and does a `:color`, `:write`, and `:disassemble`. It is quite useful for working on the compiler and seeing what code is output.
- `(build-game)` does `m` on all game files and rebuilds DGOs
- `(blg)` (build and load game) does `build-game` then sends commands to load KERNEL and GAME CGOs. The load is done through DGO loading, not `:load`ing individual object files.
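The relationship between the shorthand macros and the `asm-file` options can be tabulated as follows. This Python sketch only mirrors the descriptions above; it is not how the macros are actually defined, and the helper names are invented.

```python
# Options each shorthand turns on, as described in the list above.
MACROS = {
    "m": [":color", ":write"],
    "ml": [":color", ":write", ":load"],
    "md": [":color", ":write", ":disassemble"],
}
# Options that are documented as requiring :color.
REQUIRES_COLOR = (":write", ":load", ":disassemble")


def expand_macro(name, filename):
    """Show the asm-file form a shorthand corresponds to."""
    flags = " ".join(MACROS[name])
    return f'(asm-file "{filename}" {flags})'


def flags_missing_color(flags):
    """Return the options that need :color when :color is absent."""
    if ":color" in flags:
        return []
    return [flag for flag in flags if flag in REQUIRES_COLOR]


print(expand_macro("ml", "goal_src/kernel/gcommon.gc"))
```

For example, asking for `:write` on its own would be flagged by `flags_missing_color`, which matches the note that `:write` needs `:color` to be on.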
### `asm-data-file`
Build a data file.
```lisp
(asm-data-file tool-name "file-name")
```
The `tool-name` refers to which data building tool should be used. For example, this should be `game-text` when building the game text data files.
There's a macro `(build-data)` which rebuilds everything.
### `gs`
Enter a GOOS REPL.
```lisp
(gs)
```
Example:
```scheme
g> (gs)
goos> (+ 1 2 3)
6
goos> (exit)
()
```
Mainly useful for debugging/trying things out in GOOS. The GOOS REPL shares its environment with the GOOS interpreter used by the compiler, so you can inspect/modify things for debugging with this. Likely not used much outside of initial debugging.
### `set-config!`
```lisp
(set-config! config-name config-value)
```
Used to set compiler configuration. This is mainly for debugging the compiler and enabling print statements. There is a `(db)` macro which sets all the configuration options for the compiler to print as much debugging info as possible. Not used often.
### `in-package`
```lisp
(in-package stuff...)
```
The compiler ignores this. GOAL files evidently start with this for some reason related to emacs.
### `build-dgos`
```lisp
(build-dgos "path to dgos description file")
```
Builds all the DGO files described in the DGO description file. See `goal_src/builds/dgos.txt` for an example. This just packs existing things into DGOs - you must have already built all the dependencies.
In the future, this may become part of `asm-data-file`.
### `add-macro-to-autocomplete`
```lisp
(add-macro-to-autocomplete macro-name)
```
Makes the given name show up as a macro in the GOAL REPL. Generating macros may be done programmatically using GOOS and this form can be used to make these show up in the autocomplete. This also makes the macro known to `get-info` which will report that the macro was defined at the location where the macro which expanded to `add-macro-to-autocomplete` is located in GOAL code. This is used internally by `defmacro`.

239
docs/markdown/syntax.md Normal file

@ -0,0 +1,239 @@
# OpenGOAL Syntax & Examples
## The Basics
### Atoms
An "atom" in Lisp is a form that can't be broken down into smaller forms. For example `1234` is an atom, but `(1234 5678)` is not. OpenGOAL supports the following atoms:
### Integers
All integers are by default `int`, a signed 64-bit integer. You can use:
- decimal: Like `123` or `-232`. The allowable range is `INT64_MIN` to `INT64_MAX`.
- hex: Like `#x123`. The allowable range is `0` to `UINT64_MAX`. Values over `INT64_MAX` will wrap around.
- binary: Like `#b10101010`. The range is the same as hex.
- character:
  - Most can be written like `#\c` for the character `c`.
  - Space is `#\\s`
  - New Line is `#\\n`
  - Tab is `#\\t`
GOAL has some weird behavior when it comes to integers. It may seem complicated to describe, but it really makes the implementation simpler - the integer types are designed around the available MIPS instructions.
Integers that are used as local variables (defined with `let`), function arguments, function return values, and intermediate values when combining these are called "register integers", as the values will be stored in CPU registers.
Integers that are stored in memory as a field of a `structure`/`basic`, an element in an array, or accessed through a `pointer` are "memory integers", as the values will need to be loaded/stored from memory to access them.
The "register integer" types are `int` and `uint`. They are 64-bit and mostly work exactly like you'd expect. Multiplication, division, and mod, are a little weird and are documented separately.
The "memory integer" types are `int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32`, and `uint64`.
Conversions between these types are completely automatic - as soon as you access a "memory integer", it will be converted to a "register integer", and trying to store a "register integer" will automatically convert it to the appropriate "memory integer". It (should be) impossible to accidentally get this wrong.
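The automatic conversion can be pictured with a small Python model. This is only a sketch: the helper names `load_integer` and `store_integer` are made up for illustration, and it assumes little-endian memory, sign or zero extension on load, and truncation on store, none of which is spelled out above.

```python
# (size in bytes, signed?) for each "memory integer" type named in the text.
MEMORY_INTEGERS = {
    "int8": (1, True), "int16": (2, True), "int32": (4, True), "int64": (8, True),
    "uint8": (1, False), "uint16": (2, False), "uint32": (4, False), "uint64": (8, False),
}


def load_integer(memory: bytearray, offset: int, type_name: str) -> int:
    """Memory integer -> register integer (assumed: sign/zero extend)."""
    size, signed = MEMORY_INTEGERS[type_name]
    return int.from_bytes(memory[offset:offset + size], "little", signed=signed)


def store_integer(memory: bytearray, offset: int, type_name: str, value: int) -> None:
    """Register integer -> memory integer (assumed: keep only the low bytes)."""
    size, _ = MEMORY_INTEGERS[type_name]
    truncated = value & ((1 << (8 * size)) - 1)
    memory[offset:offset + size] = truncated.to_bytes(size, "little")
```

Under these assumptions, storing `-1` as an `int8` and reading the same byte back as a `uint8` gives `255`, while reading it as an `int8` gives `-1` again.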
#### Side Note
- It's not clear what types `(new 'static 'integer)` or `(new 'stack 'integer)` are, though I would assume both are memory.
- If there aren't enough hardware registers, "register integers" can be spilled to stack, but keep their "register integer" types. This process should be impossible to notice, so you don't have to worry about it.
### String
A string generates a static string constant. Currently the "const" of this string "constant" isn't enforced. Creating two identical string constants creates two different string objects, which is different from GOAL and should be fixed at some point.
The string data is in quotes, like in C. The following escapes are supported:
- Newline: `\n`
- Tab: `\t`
- The `\` character: `\\`
- The `"` character: `\"`
- Any character: `\cXX` where `XX` is the hex number for the character.
### Float
Any number constant with a decimal point in it. The leading and trailing zeros and the negative sign are flexible, so you can do any of these:
- `1.`, `1.0`, `01.`, `01.0`
- `.1`, `0.1`, `.10`, `0.10`
- `-.1`, `-0.1`, `-.10`, `-0.10`
Like string, it creates a static floating point constant. In later games the float was inlined instead of being a static constant.
### Symbol
Use `symbol-name` to get the value of a symbol and `'symbol-name` to get the symbol object.
### Comments
Use `;` for line comments and `#|` and `|#` for block comments.
## Compiling a list
When the compiler encounters a list like `(a b c)` it attempts to parse in multiple ways in this order:
1. A compiler form
2. A GOOS macro
3. An enum (not yet implemented)
4. A function or method call
## Compiling an integer
Integers can be specified as
- decimal: `1` or `-1234` (range of `INT64_MIN` to `INT64_MAX`)
- hex: `#x123`, `#xbeef` (range of `0` to `UINT64_MAX`)
- binary: `#b101010` (range of `0` to `UINT64_MAX`)
All integers are converted to the signed "integer in variable" type called `int`, regardless of how they are specified.
Integer "constant"s are not stored in memory but instead are generated by code, so there's no way to modify them.
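As a rough illustration of the ranges above, here is a Python sketch of how the three literal forms could map onto a signed 64-bit `int`. The function name `parse_goal_integer` is hypothetical, and the wrap-around of large hex/binary values follows the description in the Integers section rather than the actual reader code.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1
UINT64_MAX = (1 << 64) - 1


def parse_goal_integer(text: str) -> int:
    """Hypothetical helper: parse a decimal, #x, or #b literal into a signed 64-bit value."""
    if text.startswith("#x"):
        value = int(text[2:], 16)
    elif text.startswith("#b"):
        value = int(text[2:], 2)
    else:
        value = int(text, 10)
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"decimal literal out of range: {text}")
        return value

    if not 0 <= value <= UINT64_MAX:
        raise ValueError(f"literal out of range: {text}")
    # Values above INT64_MAX wrap around into the negative half of `int`.
    return value - (1 << 64) if value > INT64_MAX else value
```

For example, `#xffffffffffffffff` is within the hex range but above `INT64_MAX`, so in this model it becomes `-1`.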
## Compiling a string
A string constant can be specified by just putting it in quotes. Like `"this is a string constant"`.
There is an escape code `\` for string:
- `\n` newline
- `\t` tab character
- `\\` the `\` character
- `\"` the `"` character
- `\cXX` where `XX` is a two character hex number: insert this character.
- Any other character following a `\` is an error.
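The escape rules listed above are small enough to model directly. The following Python sketch decodes the text between the quotes; `decode_goal_string` is a made-up name and this is not the reader's actual implementation, just the listed rules written out.

```python
import string

SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}


def decode_goal_string(body: str) -> str:
    """Hypothetical helper: expand escapes in the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        code = body[i + 1]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 2
        elif code == "c":
            # \cXX: exactly two hex digits naming the character.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("\\c must be followed by two hex digits")
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            # Any other character after a backslash is an error.
            raise ValueError(f"unknown escape: \\{code}")
    return "".join(out)
```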
OpenGOAL stores strings in the same segment of the function which uses the string. I believe GOAL does the same.
In GOAL, string constants are pooled per object file (or perhaps per segment) - if the same string appears twice, it is only included once. OpenGOAL currently does not pool strings. If any code is found that modifies a string "constant", or if repeated strings take up too much memory, string pooling will be added.
For now I will assume that string constants are never modified.
## Compiling a float
A floating point constant is distinguished from an integer by a decimal point. Leading/trailing zeros are optional. Examples of floats: `1.0, 1., .1, -.1, -0.2`. Floats are stored in memory, so it may be possible to modify a float constant. For now I will assume that float constants are never modified. It is unknown if they are pooled like strings.
Trivia: Jak 2 realized that it's faster to store floats inline in the code.
## Compiling a symbol
A `symbol` appearing in code is compiled by trying each of these in the following order
1. Is it `none`? (see section on `none`)
2. Try `mlet` symbols
3. Try "lexical" variables (defined in `let`)
4. Try global constants
5. Try global variables (includes named functions and all types)
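The lookup order above amounts to searching a fixed chain of scopes. A minimal Python sketch, assuming each scope can be modelled as a plain dictionary (the function name and the dictionary arguments are illustrative, not compiler internals):

```python
def resolve_symbol(name, mlet_symbols, lexical_vars, global_constants, global_vars):
    """Return (kind, value) for the first scope that knows `name`."""
    if name == "none":
        return ("none", None)
    scopes = (
        ("mlet", mlet_symbols),
        ("lexical", lexical_vars),
        ("constant", global_constants),
        ("global", global_vars),
    )
    for kind, scope in scopes:
        if name in scope:
            return (kind, scope[name])
    raise KeyError(f"unknown symbol: {name}")
```

One consequence of this ordering is shadowing: a `let` variable named `x` hides a global constant or global variable with the same name.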
## The Special `none` type
Anything which doesn't return anything has a return type of `none`, indicating the return value can't be used. This is similar to C's `void`.
## GOAL Structs vs. C Structs
There is one significant difference between C and GOAL when it comes to structs/classes - GOAL variables can only be references to structs.
As an example, consider a GOAL type `my-type` and a C type `my_type`. In C/C++, a variable of type `my_type` represents an entire copy of a `my_type` object, and a `my_type*` is like a reference to an existing `my_type` object. In GOAL, an object of `my-type` is a reference to an existing `my-type` object, like a C `my_type*`. There is no equivalent to a C/C++ `my_type`.
As a result you cannot pass or return a structure by value.
Another way to explain this is that GOAL structures (including `pair`) always have reference semantics. All other GOAL types have value semantics.
## Pointers
GOAL pointers work a lot like C/C++ pointers, but have some slight differences:
- A C `int32_t*` is a GOAL `(pointer int32)`
- A C `void*` is a GOAL `pointer`
- In C, if `x` is a `int32_t*`, `x + 1` is equivalent to `uintptr_t(x) + sizeof(int32_t)`. In GOAL, all pointer math is done in units of bytes.
- In C, you can't do pointer math on a `void*`. In GOAL you can, and all math is done in units of bytes.
In both C and GOAL, there is a connection between arrays and pointers. A GOAL array field will have a pointer-to-element type, and a pointer can be accessed as an array.
One confusing thing is that a `(pointer int32)` is a C `int32_t*`, but a `(pointer my-structure-type)` is a C `my_structure_type**`, because a GOAL `my-structure-type` is like a C `my_structure_type*`.
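The difference in pointer math can be shown with plain address arithmetic. In this Python sketch the two functions are illustrative stand-ins for what the C and GOAL compilers do with `x + 1`; addresses are just integers.

```python
INT32_SIZE = 4  # sizeof(int32_t)


def c_pointer_add(address: int, count: int, element_size: int) -> int:
    """C: `ptr + count` advances by whole elements."""
    return address + count * element_size


def goal_pointer_add(address: int, byte_count: int) -> int:
    """GOAL: pointer math is always in bytes, whatever the pointee type is."""
    return address + byte_count
```

So to step a `(pointer int32)` to the next element using raw pointer math, the GOAL side has to add `4`, where C would add `1`.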
## Inline Arrays
One limitation of the system above is that an array of `my_structure_type` is actually an array of references to structures (C `object*[]`). It would be more efficient if instead we had an array of structures, laid out together in memory (C `object[]`).
GOAL has an "inline array" to represent this. A GOAL `(inline-array thing)` is like a C `thing[]`. The inline-array can only be used on structure types, as these are the only reference types.
## Fields in Structs
For a field with a reference type (structure/basic)
- `(data thing)` is like C `Thing* data;`
- `(data thing :inline #t)` is like C `Thing data;`
- `(data thing 12)` is like C `Thing* data[12];`. The field has `(pointer thing)` type.
- `(data thing 12 :inline #t)` is like `Thing data[12];`. The field has `(inline-array thing)` type
For a field with a value type (integer, etc)
- `(data int32)` is like C `int32_t data;`
- `(data int32 12)` is like `int32_t data[12];`. The field has `(array int32)` type.
Using the `:inline #t` option on a value type is not allowed.
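The C equivalents listed above can be checked with Python's `ctypes`, which lays out structures the way the host C compiler would. This is only a sketch of the C side of the analogy: `Thing` and its fields are made up, and the sizes are the host's, not GOAL's (GOAL has its own pointer size and alignment rules).

```python
import ctypes


class Thing(ctypes.Structure):
    # Stand-in for a GOAL structure type called `thing` (fields are made up).
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]


class RefField(ctypes.Structure):          # (data thing)
    _fields_ = [("data", ctypes.POINTER(Thing))]


class InlineField(ctypes.Structure):       # (data thing :inline #t)
    _fields_ = [("data", Thing)]


class RefArrayField(ctypes.Structure):     # (data thing 12)
    _fields_ = [("data", ctypes.POINTER(Thing) * 12)]


class InlineArrayField(ctypes.Structure):  # (data thing 12 :inline #t)
    _fields_ = [("data", Thing * 12)]


class ValueArrayField(ctypes.Structure):   # (data int32 12)
    _fields_ = [("data", ctypes.c_int32 * 12)]
```

Comparing `ctypes.sizeof` on these classes shows the point of `:inline #t`: the reference versions hold only pointers, while the inline versions hold the full `Thing` data.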
## Dynamic Structs
GOAL structures can be dynamically sized, which means their size isn't determined at compile time. Instead the user should implement `asize-of` to return the actual size.
This works by having the structure end in an array of unknown size at compile time. In a dynamic structure definition, the last field of the struct should be an array with an unspecified size. To create this, add a `:dynamic #t` option to the field and do not specify an array size. This can be an array of value types, an array of reference types, or an inline-array of reference types.
### Unknown
Is the `size` of a dynamic struct:
- size assuming the dynamic array has 0 elements (I think it's this)
- size assuming the dynamic array doesn't exist
These can differ by padding for alignment.
## How To Create GOAL Objects - `new`
GOAL has several different ways to create objects, all using the `new` form.
### Heap Allocated Objects
A new object can be allocated on a heap with `(new 'global 'obj-type [new-method-arguments])`.
This simply calls the `new` method of the given type. You can also replace `'global` with `'debug` to allocate on the debug heap.
Currently these are the only two heaps supported. In the future you will be able to call the new method with other arguments
to allow you to do an "in place new" or allocate on a different heap.
This will only work on structures and basics. If you want a heap allocated float/integer/pointer, create an array of size 1.
This will work on dynamically sized items.
### Heap Allocated Arrays
You can construct a heap array with `(new 'global 'inline-array 'obj-type count)` or `(new 'global 'array 'obj-type count)`.
These objects are not initialized. Note that the `array` version creates a `(pointer obj-type)` plain array,
__not__ a GOAL `array` type fancy array. In the future this may change because it is confusing.
Because these objects are uninitialized, you cannot provide constructor arguments.
You cannot use this on dynamically sized member types. However, the array size can be determined at runtime.
### Static Objects
You can create a static object with `(new 'static 'obj-type [field-def]...)`. It can be a structure, basic, bitfield, array, boxed array, or inline array.
Each field def looks like `:field-name field-value`. The `field-value` is evaluated at compile time. Fields
can be integers, floats, symbols, pairs, strings, or other statics. These field values may come from macros or GOAL constants.
For bitfields, there is an exception, and fields can be set to expressions that are not known at compile time. The compiler will generate the appropriate code to combine the values known at compile time and run time. This exception does not apply to a bitfield inside of another `(new 'static ...)`.
Fields which aren't explicitly initialized are zeroed, except for the type field of basics, which is properly initialized to the correct type.
This does not work on dynamically sized structures.
### Stack Allocated Arrays
Currently only arrays of integers, floats, or pointers can be stack allocated.
For example, use `(new 'stack 'array 'int32 1)` to get a `(pointer int32)`. Unlike heap allocated arrays, these stack arrays
must have a size that can be determined at compile time. The objects are uninitialized.
### Stack Allocated Structures
Works like heap allocation: the objects are initialized with the constructor. The constructor must support "stack mode". Using `object-new` supports stack mode so usually you don't have to worry about this. The structure's memory will be memset to 0 with `object-new` automatically.
### Defining a `new` Method
TODO
## Array Spacing
In general, all GOAL objects are 16-byte aligned and the boxing system requires this. All heap memory allocations are 16-byte aligned too, so this is usually not an issue.
## Truth
Everything is true except for `#f`. This means `0` is true, and `'()` is true.
The value of `#f` can be used like `nullptr`, at least for any `basic` object. It's unclear if `#f` can/should be used as a null for other types, including `structure`s or numbers or pointers.
Technical note: the hex number `0x147d24` is considered false in Jak 1 NTSC due to where the symbol table happened to be allocated. However, checking numbers for true/false shouldn't be done, you should use `(zero? x)` instead.
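Since this rule differs from C and from most scripting languages, a tiny Python model may help. It is only a sketch: `FALSE` stands in for `#f`, an empty tuple stands in for `'()`, and `goal_truthy` is a made-up name.

```python
class _GoalFalse:
    """Stand-in for the GOAL symbol #f."""

    def __repr__(self):
        return "#f"


FALSE = _GoalFalse()
EMPTY_PAIR = ()  # stand-in for '()


def goal_truthy(value) -> bool:
    """Everything is true except #f itself."""
    return value is not FALSE
```

Note how this differs from Python's own rules, where `0` and `()` are falsy: here both count as true.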
## Empty Pair
TODO


@ -1,63 +1,104 @@
Type System
--------------
This document explains the GOAL type system. The GOAL type system supports runtime typing, single inheritance, virtual methods, and dynamically sized structures.
Everything in GOAL has a type at compile time. A subset of compile-time types are also available in the runtime as objects with the same name as the type. For example, there is a `string` type, and at runtime there is a global object named `string` which is an object of type `type` containing information about the `string` type.
Some objects have runtime type information, and others don't. Objects which have runtime type information can have their type identified at runtime, and are called "boxed objects". Objects without runtime type information are called "unboxed objects". An unboxed object cannot reliably be detected as an unboxed object - you can't write a function that takes an arbitrary object and tells you if it's boxed or not. However, boxed objects can always be recognized as boxed.
All types have a parent type, and all types descend from the parent type `object`, except for the special type `none` (and maybe `_type_`, but more on this later). The `none` type doesn't exist in the runtime and is used to represent an invalid value that the compiler should not use. For example, the return type of a function which doesn't return anything is `none`, and attempting to use this value should cause an error.
Here are some important special types:
- `object` - the parent of all types
- `structure` - parent type of any type with fields
- `basic` - parent type of any `structure` with runtime type information.
All types have methods. Objects have access to all of their parent's methods, and may override parent methods. All types have these 9 methods:
- `new` - like a constructor, returns a new object. It's not used in all cases, and on all types, and needs more documentation on when specifically it is used.
- `delete` - basically unused, but like a destructor. Often calls `kfree`, which does nothing.
- `print` - prints a short, one line representation of the object to the `PrintBuffer`
- `inspect` - prints a multi-line description of the object to the `PrintBuffer`. Usually auto-generated by the compiler and prints out the name and value of each field.
- `length` - Returns a length if the type has something like a length (number of characters in string, etc). Otherwise returns 0. Usually returns the number of filled slots, instead of the total number of allocated slots, when there is possibly a difference.
- `asize-of` - Gets the size in memory of the entire object. Usually this just looks this up from the appropriate `type`, unless it's dynamically sized.
- `copy` - Create a copy of this object on the given heap. Not used very much?
- `relocate` - Some GOAL objects will be moved in memory by the kernel as part of the compacting actor heap system. After being moved, the `relocate` method will be called with the offset of the move, and the object should fix up any internal pointers which may point to the old location. It's also called on v2 objects loaded by the linker when they are first loaded into memory.
- `memusage` - Not understood yet, but probably returns how much memory in bytes the object uses. Not supported by all objects.
Usually a method which overrides a parent method must have the same argument and return types. The only exception is `new` methods, which can have different argument/return types from the parent. (See the later section on `_type_` for another exception.)
The compiler's implementation for calling a method is:
- Is the type a basic?
- If so, look up the type using runtime type information
- Get the method from the vtable
- Is the type not a basic?
- Get the method from the vtable of the compile-time type
- Note that this process isn't very efficient - instead of directly linking to the slot in the vtable (one deref) it first looks up the `type` by symbol, then the slot (two derefs). I have no idea why it's done this way.
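The dispatch rule above can be modelled in a few lines of Python. This is a simplified sketch: `TypeInfo`, `BoxedObject`, and `call_method` are made-up names, the vtable is a dictionary rather than a slot array, and the extra lookup of the `type` by symbol is not modelled.

```python
class TypeInfo:
    def __init__(self, name, parent=None, methods=None):
        self.name = name
        self.parent = parent
        # Method table: child entries override the parent's entries.
        self.vtable = dict(parent.vtable) if parent else {}
        self.vtable.update(methods or {})

    def is_basic(self):
        t = self
        while t is not None:
            if t.name == "basic":
                return True
            t = t.parent
        return False


class BoxedObject:
    """An object that carries runtime type information."""

    def __init__(self, runtime_type):
        self.runtime_type = runtime_type


def call_method(obj, compile_time_type, method_name):
    if compile_time_type.is_basic():
        # Basic: use the runtime type stored with the object.
        table = obj.runtime_type.vtable
    else:
        # Not a basic: only the compile-time type is available.
        table = compile_time_type.vtable
    return table[method_name](obj)
```

The practical effect is that an override is only picked up automatically for basics; for a plain structure, the method of the compile-time type is what runs.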
In general, I suspect that the method system was modified after GOAL was first created. There is some evidence that types were once stored in the symbol table, but were removed because the symbol table became full. This could explain some of the weirdness around method calls/definition rules, and the disaster `method-set!` function.
GOAL Value Types
--------------------
# OpenGOAL's Type System
This document explains the GOAL type system. The GOAL type system supports runtime typing, single inheritance, virtual methods, and dynamically sized structures.
There's a single type system library, located in `common/type_system`. It will be used in both the decompiler and compiler. The plan is to have a single `all_types.gc` file which contains all type information (type definitions and types of globals). The decompiler will help generate this, but some small details may need to be filled in manually for some types. Later versions of the decompiler can use this information to figure out what fields of types are being accessed. We can also add a test to make sure that types defined in the decompiled game match `all_types.gc`.
The main features are:
- `TypeSystem` stores all type information and provides a convenient way to add new types or request information about existing types.
- `Type` information about a GOAL Type. A "base GOAL type" is identified by a single unique string. Examples: `function`, `string`, `vector3h`.
- `TypeSpec` a way to specify either `Type` or a "compound type". Compound types are used to create types which represent specific function types (function which takes two integer arguments and returns a string), or specific pointer/array types (pointer to an integer). These can be represented as (possibly nested) lists, like `(pointer integer)` or `(function (integer integer) string)`.
- Type Checking for compiler
- Parsing of type definitions for compiler
- Lowest common ancestor implementation for compiler to figure out return types for branching forms.
- Logic to catch multiple incompatible type definitions for both compiler warnings and decompiler sanity checks
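To make the "lowest common ancestor" item concrete, here is a Python sketch over a single-inheritance parent table. The `PARENT` table is illustrative: `object`, `structure`, and `basic` come from this document, while the `my-...` types are hypothetical, and this is not the `TypeSystem` implementation.

```python
PARENT = {
    "object": None,
    "structure": "object",
    "basic": "structure",
    "my-struct": "structure",   # hypothetical user types
    "my-basic-a": "basic",
    "my-basic-b": "basic",
}


def ancestors(name):
    """The type itself, then its parent, grandparent, ... up to `object`."""
    chain = []
    while name is not None:
        chain.append(name)
        name = PARENT[name]
    return chain


def lowest_common_ancestor(a, b):
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError(f"no common ancestor for {a} and {b}")
```

For a branching form whose two branches produce `my-basic-a` and `my-basic-b`, this would give `basic` as the result type.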
The type system will store:
- The types of all global variables (this includes functions)
- Information about all types:
- Fields/specific details on how to load from memory, alignment, sign extension, size in arrays, etc...
- Parent type
- Methods not defined for the parent.
It's important that all of the type-related info is stored/calculated in a single location. The proof of concept compiler did not have the equivalent of `TypeSystem` and scattered field/array access logic all over the place. This was extremely confusing to get right.
If type information is specified multiple times, and is also inconsistent, the TypeSystem can be configured to either throw an exception or print a warning.
This will be a big improvement over the "proof of concept" compiler which did not handle this situation well. When debugging GOAL you will often put the same file through the compiler again and again, changing functions, but not types. In this case, there should be no warnings. If the type does change, it should warn (as old code may exist that uses the old type layout), but shouldn't cause the compiler to abort, error, or do something very unexpected.
## Compile-time vs Run-time
The types in the runtime are only a subset of the compile time types. Here are the rules I've discovered so far
- Any compound types become just the first type. So `(pointer my-type)` becomes `pointer`.
- The `inline-array` class just becomes `pointer`.
- Some children of integers disappear, but others don't. The rules for this aren't known yet.
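The first two rules can be written as a one-step reduction. In this Python sketch a compound type is modelled as a tuple such as `("pointer", "my-type")`, and `runtime_type` is a made-up name; the unknown rules for integer children are deliberately left out.

```python
def runtime_type(type_spec):
    """type_spec is a string (plain type) or a tuple like ("pointer", "my-type")."""
    # Compound types keep only their first element.
    base = type_spec if isinstance(type_spec, str) else type_spec[0]
    # `inline-array` has no runtime type of its own; it becomes `pointer`.
    return "pointer" if base == "inline-array" else base
```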
## Types of Types
Everything in GOAL has a type at compile time. A subset of compile-time types are also available in the runtime as objects with the same name as the type. For example, there is a `string` type, and at runtime there is a global object named `string` which is an object of type `type` containing information about the `string` type.
Some objects have runtime type information, and others don't. Objects which have runtime type information can have their type identified at runtime, and are called "boxed objects". Objects without runtime type information are called "unboxed objects". An unboxed object cannot reliably be detected as an unboxed object - you can't write a function that takes an arbitrary object and tells you if it's boxed or not. However, boxed objects can always be recognized as boxed.
All types have a parent type, and all types descend from the parent type `object`, except for the special type `none` (and maybe `_type_`, but more on this later). The `none` type doesn't exist in the runtime and is used to represent an invalid value that the compiler should not use. For example, the return type of a function which doesn't return anything is `none`, and attempting to use this value should cause an error.
Here are some important special types:
- `object` - the parent of all types
- `structure` - parent type of any type with fields
- `basic` - parent type of any `structure` with runtime type information.
### Value Types
Some GOAL types are "value types", meaning they are passed by value when used as arguments to functions, return values from functions, local variables, and when using `set!`. These are always very small and fit directly into the CPU registers. Some example value types:
- Floating point numbers
- Integers
### Reference Types
GOAL Reference Types
--------------------
Other GOAL types are "reference types", meaning they act like a reference to data when used as arguments to functions, return values from functions, local variables, and when using `set!`. The data can be allocated on a heap, on the stack, or as part of static data included when loading code (which is technically also on a heap). All structure/basic types are reference types.
You can think of these like C/C++ pointers or references, which is how it is implemented. The difference is that there's no special notation for this. A GOAL `string` object is like a C/C++ `string*` or `string&`. A GOAL "pointer to reference type" is like a C/C++ `my_type**`.
Note - this is quite a bit different from C/C++. In C++ you can have a structure with value semantics (normal), or reference semantics (C++ reference or pointer). In GOAL, there is no value semantics for structures! This is great because it means function arguments/variables always fit into registers.
### Dynamic Size Types
Any type which ends with a dynamic array as the last field is dynamic. For these, it's a good idea to implement the `asize-of` method.
### Compound Types
A compound type is a type like "a pointer to an int64" or "a function which takes int as an argument and returns a string". These exist only at compile time, and get simplified at runtime. For example, all pointers become `pointer` and all functions become `function`. (The one exception to this seems to be `inline-array-class` stuff, but this is not yet supported in OpenGOAL).
#### Pointer
Pointers work like you would expect. They can only point to memory types - you can't have a `(pointer int)`, instead you must have a `(pointer int32)` (for example). Note that a `(pointer basic)` is like a C++ `basic**` as `basic` is already like a C++ pointer to struct. You can nest these, like `(pointer (pointer int64))`. If you want a pointer with no type, (like C++ `void*`) just use a plain `pointer`. The `(pointer none)` type is invalid.
Like in C/C++, you can use array indexing with a pointer. One thing to note is that a `(pointer basic)` (or pointer to any reference type) is like a C++ "array of pointers to structs". To get the C++ "array of structs", you need an `inline-array`.
#### Arrays
For value types, arrays work as you expect. They have type `(pointer your-type)`. Arrays of references come in two versions:
- Array of references: `(pointer your-type)`, like a C array of pointers
- Array of inline objects: `(inline-array your-type)`, like a C array of structs
The default alignment of structs is 16 bytes, which is also the minimum alignment of `kmalloc`, and the minimum alignment used when using a reference type as an inline field. However, it's possible to violate this rule in a `(inline-array your-type)` to be more efficient. The `your-type` can set a flag indicating it should be packed in an inline array.
I believe the alignment then becomes the maximum of the minimum alignment of the `your-type` fields. So if you have a type with two `uint32`s (alignment 4 bytes), an `(inline-array your-type)` can then have spacing of 8 bytes, instead of the usual minimum 16. The behavior of a `(field-name your-type :inline #t)` is unchanged and will still align at the minimum of 16 bytes. I _believe_ that the first element of the array will still have an alignment of 16.
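The spacing rule described here (which the text itself marks as a belief, not a confirmed fact) can be sketched in Python. The helper names are made up, and fields are modelled as `(size, alignment)` pairs in declaration order.

```python
def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def struct_size(fields) -> int:
    """fields: list of (size, alignment) pairs, placed one after another."""
    offset = 0
    for size, alignment in fields:
        offset = align_up(offset, alignment) + size
    return offset


def inline_array_stride(fields, packed: bool) -> int:
    # Packed: align to the largest field alignment; otherwise the usual 16 bytes.
    alignment = max(a for _, a in fields) if packed else 16
    return align_up(struct_size(fields), alignment)
```

With two `uint32` fields, `[(4, 4), (4, 4)]`, the packed stride is 8 bytes and the unpacked stride is 16, matching the example above.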
##### Inline Arrays
These are only valid for reference types. They refer to an array of the actual data (like C array of structs) rather than an array of reference (like C array of pointers to structs, or GOAL `(pointer structure)`). At runtime, `inline-array` becomes pointer.
For an inline array of basics, elements are 16-byte aligned. For `structure`s that aren't `basic`, the alignment is usually the minimum alignment of all members of the structure, but there is an option to make it 16-byte aligned if needed.
For information about how to create these arrays, see `deftype` (fields in a type) and `new` (create just an array) sections.
#### Function
Function compound types look like this `(function arg0-type arg1-type... return-type)`. There can be no arguments. The `return-type` must always be specified, and should be `none` if there is no return value. The argument types themselves can be compound types. In order to call a function, you must have a compound function type - a `function` by itself cannot be called.
### Field Definitions
GOAL Fields
--------------
GOAL field definitions look like this:
`(name type-name [optional stuff])`
@ -75,7 +116,7 @@ There are many combinations of reference/value, dynamic/not-dynamic, inline/not-
- Value type, `:dynamic #t`: the field marks the beginning of an array (of unknown size). Field type is `(pointer your-type)`
- Value type, with array size: the field marks the beginning of an array (of known size). Field type is `(pointer your-type)`
- Value type, with `:inline #t`: invalid in all cases.
- Reference type, no modifiers: a single reference is stored in the type. Type of field is `your-type` (a C++ pointer).
- Reference type, no modifiers: a single reference is stored in the type. Type of field is `your-type` (a C++ pointer).
- Reference type, `:inline #t`: a single object is stored inside the type. Type of field is `your-type` still (a C++ pointer). The access logic is different to make this work.
- Reference type, `:dynamic #t` or array size: the field marks the beginning of an **array of references**. Field type is `(pointer your-type)`. Like C array of pointers.
- Reference type, `:inline #t` and (`:dynamic #t` or array size): the field marks the beginning of an **array of inline objects**. Field type is `(inline-array your-type)`. Like C array of structs.
@ -89,10 +130,10 @@ Bonus ones, for where the array is stored _outside_ of the type:
Of course, you can combine these, to get even more confusing types! But this seems uncommon.
GOAL Field Placement
---------------------
#### Field Placement
The exact rules for placing fields in GOAL types are unknown, but the simple approach of "place the next field as close as possible to the end of the last field" seems to get it right almost all the time. However, we need to be extra certain that we lay out type fields correctly because many GOAL types have overlapping fields.
The theory I'm going with for now is:
- The order of fields in the `inspect` method is the order in which fields are listed in the type definition
- In the rare cases this is wrong, this is due to somebody manually specifying an offset.
@ -102,92 +143,88 @@ As a result, we should specify offsets like this:
- If we think a field was automatically placed, use `:offset-assert` to inform the compiler where we expect it to be. In this case it will still place the field automatically, but if the result is different from the `:offset-assert`, it will throw an error.
- Avoid defining any fields without `:offset` or `:offset-assert`
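The interaction between automatic placement, `:offset`, and `:offset-assert` can be sketched as follows. This Python model is an assumption-laden illustration, not the compiler's code: fields are dictionaries with made-up keys, automatic placement aligns up from the end of the previous field, and a manual `offset` is taken as-is (which is how overlapping fields arise).

```python
def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def place_fields(fields):
    """fields: dicts with name, size, alignment and optional offset / offset_assert."""
    placed = {}
    end = 0  # end of the most recently placed field
    for field in fields:
        if "offset" in field:
            # Manual placement: trust the given offset.
            offset = field["offset"]
        else:
            # Automatic placement: as close as possible to the end of the last field.
            offset = align_up(end, field["alignment"])
            expected = field.get("offset_assert")
            if expected is not None and expected != offset:
                raise ValueError(
                    f"{field['name']}: placed at {offset}, expected {expected}"
                )
        placed[field["name"]] = offset
        end = offset + field["size"]
    return placed
```

A mismatching `offset_assert` raises instead of silently producing a different layout, which is the safety net the list above is asking for.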
## Built-in Types
GOAL Arrays
---------------
For value types, arrays work as you expect. They have type `(pointer your-type)`. Arrays of references come in two versions:
- Array of references: `(pointer your-type)`, like a C array of pointers
- Array of inline objects: `(inline-array your-type)`, like a C array of structs
There are a number of built-in types. I use "abstract" type to refer to a type that is only a parent type.
The default alignment of structs is 16 bytes, which is also the minimum alignment of `kmalloc`, and the minimum alignment used when using a reference type as an inline field. However, it's possible to violate this rule in a `(inline-array your-type)` to be more efficient. The `your-type` can set a flag indicating it should be packed in an inline array.
I believe the alignment then becomes the maximum of the minimum alignment of the `your-type` fields. So if you have a type with two `uint32`s (alignment 4 bytes), an `(inline-array your-type)` can then have spacing of 8 bytes, instead of the usual minimum 16. The behavior of a `(field-name your-type :inline #t)` is unchanged and will still align at the minimum of 16 bytes. I _believe_ that the first element of the array will still have an alignment of 16.
### `none`
This is a special type that represents "no information". This is the return type of a function which returns nothing, and also the return type of an expression that doesn't return anything. For example, the expression `(goto x)` does not produce a value, so its type is `none`.
There's a single type system library, located in `common/type_system`. It will be used in both the decompiler and compiler. The plan is to have a single `all_types.gc` file which contains all type information (type definitions and types of globals). The decompiler will help generate this, but some small details may need to be filled in manually for some types. Later versions of the decompiler can use this information to figure out what fields of types are being accessed. We can also add a test to make sure that types defined in the decompiled game match `all_types.gc`.
The main features are:
### `object`
This is the parent type of all types. This is an abstract class. In a variable, this is always `object`, and can hold any `object`. In memory, this is either `object32` or `object64`. The `object32` can hold everything except for `binteger` and 64-bit integers. This type is neither boxed nor unboxed and is neither value nor reference.
- `TypeSystem` stores all type information and provides a convenient way to add new types or request information about existing types.
- `Type` information about a GOAL Type. A "base GOAL type" is identified by a single unique string. Examples: `function`, `string`, `vector3h`.
- `TypeSpec` a way to specify either `Type` or a "compound type". Compound types are used to create types which represent specific function types (function which takes two integer arguments and returns a string), or specific pointer/array types (pointer to an integer). These can be represented as (possibly nested) lists, like `(pointer integer)` or `(function (integer integer) string)`.
- Type Checking for compiler
- Parsing of type definitions for compiler
- Lowest common ancestor implementation for compiler to figure out return types for branching forms.
- Logic to catch multiple incompatible type definitions for both compiler warnings and decompiler sanity checks
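
As an illustration of the lowest common ancestor idea, here is a Python sketch over a handful of the built-in types. It only models the concept; the `PARENT` table and function names are hypothetical and are not the `TypeSystem` API.

```python
# Parent links for a few built-in types described in this document.
PARENT = {
    "object": None,
    "structure": "object",
    "basic": "structure",
    "symbol": "basic",
    "string": "basic",
    "number": "object",
    "float": "number",
    "integer": "number",
}


def ancestors(type_name):
    """Return the chain from type_name up to the root, inclusive."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def lowest_common_ancestor(a, b):
    """Most specific type that is a parent of (or equal to) both a and b."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    raise ValueError("types do not share a root")
```

A branching form that returns a `symbol` on one path and a `string` on the other would then be typed as `basic`, while a `float` and a `string` only share `object`.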
### `structure` (child of `object`)
This is the parent type of all types with fields. This is an abstract class and a reference class. A `structure` can hold any `structure`, both in memory and in a variable. It is unboxed.
Compile Time vs. Run Time types
------------------------
The types in the runtime are only a subset of the compile time types. Here are the rules I've discovered so far:
- Any compound types become just the first type. So `(pointer my-type)` becomes `pointer`.
- The `inline-array` class just becomes `pointer`.
- Some children of integers disappear, but others don't. The rules for this aren't known yet.
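
The first two rules can be sketched in Python as follows. This is only a model: type specs are represented as strings or nested lists, `runtime_type` is a hypothetical name, and the unknown integer rule is deliberately left out.

```python
def runtime_type(type_spec):
    """Map a compile-time type spec to the name of its runtime type.

    A compound type such as (pointer my-type) is modeled as
    ["pointer", "my-type"]; a plain type is just a string.
    """
    base = type_spec[0] if isinstance(type_spec, list) else type_spec
    # The inline-array class just becomes pointer at runtime.
    return "pointer" if base == "inline-array" else base
```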
### `basic` (child of `structure`)
This is the "boxed" version of `structure`. The first field of a basic is `type`, which contains the `type` of the object. It is boxed and a reference. A `basic` can hold any `basic`, both in memory and in a variable.
Special `_type_` for methods
----------------------------
The first argument of a method always contains the object that the method is being called on. It also must have the type `_type_`, which will be substituted by the type system (at compile time) using the following rules:
### `symbol` (child of `basic`)
A symbol has a name and a value. The name is a string, and the value is an `object32`. Note that the value is an `object32` so you cannot store a 64-bit integer in a symbol. It is considered "bad" to store unboxed objects in symbols, though you can get away with it sometimes.
- At method definition: replace with the type that the method is being defined for.
- At method call: replace with the compile-time type of the object the method is being called on.
All `symbol`s are stored in the global symbol table, which is a hash table. As a result, you cannot have multiple symbols with the same name. A name is enough to uniquely determine the symbol. To get a symbol, use the syntax `'symbol-name`. To get the value, use `symbol-name`.
A method can have other arguments or a return value that's of type `_type_`. This special "type" will be replaced __at compile time__ with the type which is defining or calling the method. No part of this exists at runtime. It may seem weird, but there are two uses for this.
Each global variable, type, and named global function has a symbol for it which has the variable, type, or function as its value. The linker is able to perform symbol table lookups at link time and patch the code so you don't have to do a hash table lookup every time you access a global variable, function, or type.
The first is to allow children to specialize methods and have their own child type as an argument type. For example, say you have a method `is-same-shape`, which compares two objects and sees if they are the same shape. Suppose you first defined this for type `square` with
```
(defmethod square is-same-shape ((obj1 square) (obj2 square))
(= (-> obj1 side-length) (-> obj2 side-length))
)
```
You can also use symbols as an efficient way to represent an enum. For example, a function may return `'error` or `'complete` as a status. The compiler is able to compare symbols for equality very efficiently (just a pointer comparison, as symbols are a reference type).
Then, if you created a child class of `square` called `rectangle` (this is a terrible way to use inheritance, but it's just an example), and overrode the `is-same-shape` method, you would have to have arguments that are `square`s, which blocks you from accessing `rectangle`-specific fields. The solution is to define the original method with type `_type_` for the first two arguments. Then, the method defined for `rectangle` also will have arguments of type `_type_`, which will expand to `rectangle`.
### `type` (child of `basic`)
A `type` stores information about an OpenGOAL type, including its size, parent, and name (stored as a `symbol`). It also stores the method table. Some OpenGOAL types (children of integers, bitfield types, enums, compound types) do not have runtime types, and instead become the parent/base type. But these types cannot have runtime type information or methods and are pretty rare. It is a reference type, is boxed, and is dynamically sized (the method table's size is not fixed).
### `string` (child of `basic`)
A string. The string is null terminated and also stores the buffer size. This type is a reference type, is boxed, and is also dynamically sized.
The second use is for a return value. For example, the `print` and `inspect` methods both return the object that is passed to them, which will always be the same type as the argument passed in. If `print` were defined as `(function object object)`, then `(print my-square)` would lose the information that the returned object is a `square`. If `print` is a `(function _type_ _type_)`, the type system will know that `(print my-square)` will return a `square`.
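
The compile-time substitution can be pictured with this small Python sketch. It models a type spec as a nested list of strings; `substitute_type` is a made-up name for illustration and is not how the type system library spells it.

```python
def substitute_type(signature, concrete_type):
    """Replace every `_type_` in a (possibly nested) type spec.

    A type spec is modeled as a string or a nested list of strings,
    e.g. ["function", "_type_", "_type_"].
    """
    if isinstance(signature, list):
        return [substitute_type(part, concrete_type) for part in signature]
    return concrete_type if signature == "_type_" else signature


print_signature = ["function", "_type_", "_type_"]
# At a call on a `square`, the compiler would see this signature instead.
at_call = substitute_type(print_signature, "square")
```

Nothing here survives to runtime; the substituted signature only exists while type checking the call or the definition.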
### `function` (child of `basic`)
A function. Boxed and reference. It is a reference to a function, so it's like a C/C++ function pointer type.
### `kheap` (child of `structure`)
A simple bump-allocated heap. Doesn't store the heap memory, just metadata. Supports allocating from either the top or the bottom. This is used as the memory allocation strategy for the global, debug, and level heaps. Unboxed, reference, not dynamic.
### `array` (child of `basic`)
A "fancy" array. It is not yet implemented in OpenGOAL.
### `pair` (child of `object`)
A pair. It is boxed. You should not make child types of `pair`. The two objects stored by the pair are `object32`s.
### `pointer` (child of `object`)
It is a 32-bit value type containing a pointer. Not boxed, value type. See section on compound types for more information.
### `number` (child of `object`)
Abstract type for all numbers. Value type. 64-bits.
### `float` (child of `number`)
4-byte, single precision floating point number. Value type.
### `integer` (child of `number`)
Abstract class for integer numbers. Child classes are `sinteger` (signed integer), `uinteger` (unsigned integer), and `binteger` (boxed-integer, always signed). These are all 64-bit types.
Children of `sinteger` and `uinteger` are `int8`, `int16`, `int32`, `int64`, `uint8`, `uint16`, `uint32`, `uint64`. These are the size you expect, value types, and not boxed. These only exist as memory types. In a variable, there is only `int` and `uint`. These are both 64-bit types. All integer operations (math, logical, shifts) are 64-bit.
The `binteger` is a boxed integer. It is a 61 bit signed integer (the other three bits are lost due to the number being boxed). There may actually be a `buinteger` (or `ubinteger`) but it doesn't exist in OpenGOAL at this point.
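
To show where the three bits go, here is a Python sketch of boxing and unboxing. It rests on an assumption that is not stated above: that the three lost bits are the low three bits and that a boxed integer leaves them as zero. The function names are hypothetical.

```python
BOX_SHIFT = 3  # assumption: the three "lost" bits are the low bits
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF


def box_integer(value):
    """Pack a signed integer into a 64-bit boxed bit pattern."""
    if not -(1 << 60) <= value < (1 << 60):
        raise OverflowError("value does not fit in 61 signed bits")
    return (value << BOX_SHIFT) & MASK_64


def unbox_integer(boxed):
    """Recover the signed value with an arithmetic right shift."""
    if boxed >= 1 << 63:  # reinterpret the 64-bit pattern as signed
        boxed -= 1 << 64
    return boxed >> BOX_SHIFT
```

Under this assumption the usable range is -2^60 to 2^60 - 1, which is what "61 bit signed integer" works out to.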
#### Weird Built-in types that aren't supported yet.
- `vu-function`
- `link-block`
- `connectable`
- `file-stream`
- `inline-array` (class)
## Unknown Areas
### Inline Array Class
Inline Array Class
--------------------
There's a weird `inline-array-class` type that's not fully understood yet. It uses `heap-base`.
Heap Base
--------------
### Heap Base
This is a field in `type`. What does it mean? It's zero for most types (at least the early types).
Second Size Field
-------------------
### Second Size Field
There are two fields in `type` for storing the size. The first one stores the exact size, and by default the second stores the size rounded up to the nearest 16 bytes. Why? Who uses it? Does it ever get changed?
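
For reference, the default relationship between the two size fields looks like this in Python. The name `padded_size` is made up, and this only shows the default rounding; it says nothing about who reads the second field or whether it is ever changed.

```python
def padded_size(exact_size, alignment=16):
    """Round exact_size up to the next multiple of alignment.

    alignment must be a power of two for the bit mask to work.
    """
    return (exact_size + alignment - 1) & ~(alignment - 1)
```

So a type with an exact size of 20 bytes would get 32 in the second field, while a 16-byte type keeps 16.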
The Type System
-------------------
The type system will store:
- The types of all global variables (this includes functions)
- Information about all types:
- Fields/specific details on how to load from memory, alignment, sign extension, size in arrays, etc...
- Parent type
- Methods not defined for the parent.
It's important that all of the type-related info is stored/calculated in a single location. The proof of concept compiler did not have the equivalent of `TypeSystem` and scattered field/array access logic all over the place. This was extremely confusing to get right.
## TODO
If type information is specified multiple times, and is also inconsistent, the TypeSystem can be configured to either throw an exception or print a warning.
This will be a big improvement over the "proof of concept" compiler which did not handle this situation well. When debugging GOAL you will often put the same file through the compiler again and again, changing functions, but not types. In this case, there should be no warnings. If the type does change, it should warn (as old code may exist that uses the old type layout), but shouldn't cause the compiler to abort, error, or do something very unexpected.
Method System
--------------
All type definitions should also define all the methods, in the order they appear in the vtable. I suspect GOAL had this as well because the method ordering otherwise seems random, and in some cases impossible to get right unless (at least) the number of methods was specified in the type declaration.
Todo
-----
- [ ] Kernel types that are built-in
- [ ] Signed/unsigned for a few built-in type fields
- [ ] Tests for field placement logic (probably a full compiler test?)
@ -196,4 +233,3 @@ Todo
- [ ] Stuff for decompiler
- [ ] What field is here?
- [ ] Export all deftypes