Since we are now trying to guide people to use `codespan-reporting` directly without relying on `codespan`, it makes sense to make it more prominent in the README.
This adds the `SimpleFile` and `SimpleFiles` structures to the `codespan_reporting` crate to get people started with their programming languages. It also inverts the dependency between `codespan` and `codespan_reporting`. That way users of `codespan_reporting` no longer have to depend on `codespan` if they don’t want to.
**THIS IS A BREAKING CHANGE**
When porting Arret from `codespan` 0.3 (see etaoins/arret#160) one of
the missing features was being able to use non-`String` types for storing
source.
Arret has two potential uses for this:
1. When loading native libraries exposing Arret functions (known as RFI
libraries) their Arret type is specified as a per-function
`&'static str`. An example can be seen here:
dfc41bed1c/stdlib/rust/list.rs (L8)
With `codespan` 0.3 we used `Cow<'static, str>` to avoid allocating a
new `String` for every RFI function by pointing directly inside the
loaded library's binary.
2. Arret is multithreaded but needs to maintain a global `Files`
instance to allocate `FileId`s. It uses a read-write lock for thread
safety and is sensitive to contention on that lock in some
circumstances.
Because we need a `FileId` in our AST we can't parse the source to
AST until we've added it to `Files`. However, because `Files` takes
ownership of the source we need to hold a read lock on `Files` to use
`files.source()`. This effectively serialises loading source files
across all threads.
Previously we used `Arc<FileMap>` to drop the lock on `CodeMap` and
then parse the `FileMap` locklessly. The API doesn't allow this
anymore but we could use `Source = Arc<str>` to similar effect.
Unfortunately the above two uses are mutually exclusive using standard
library types. However, either one is an improvement over `String` and a
crate type like `supercow` could potentially satisfy both uses.
This is a breaking change due to adding a new generic parameter to
`Files`. I initially attempted to use a default of `String` but this
doesn't work for `Files::new` due to rust-lang/rust#27336. See
rust-lang/wg-allocators#1 for an analogous case to this.
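As a rough illustration of the two uses above, here is a minimal sketch of a `Files` type that is generic over its source. It is not the actual `codespan` implementation: `FileId`, `Files`, `add_static` and `load_shared` are simplified stand-ins written for this example, and the `AsRef<str>` bound is an assumption about what such a parameter would need.

```rust
use std::borrow::Cow;
use std::sync::{Arc, RwLock};

/// Simplified stand-in for `codespan::FileId` (hypothetical).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct FileId(u32);

/// Simplified stand-in for `codespan::Files`, generic over the source type.
pub struct Files<Source> {
    sources: Vec<Source>,
}

impl<Source: AsRef<str>> Files<Source> {
    pub fn new() -> Self {
        Files { sources: Vec::new() }
    }

    /// Takes ownership of `source` and hands back an id for it.
    pub fn add(&mut self, source: Source) -> FileId {
        let id = FileId(self.sources.len() as u32);
        self.sources.push(source);
        id
    }

    /// Borrows the source text; the borrow is tied to `&self`.
    pub fn source(&self, id: FileId) -> &str {
        self.sources[id.0 as usize].as_ref()
    }
}

/// Use 1: store a `&'static str` without allocating a new `String`.
pub fn add_static(files: &mut Files<Cow<'static, str>>, text: &'static str) -> FileId {
    files.add(Cow::Borrowed(text))
}

/// Use 2: keep a second `Arc` handle to the source so the caller can parse
/// it after the write lock has been released.
pub fn load_shared(files: &RwLock<Files<Arc<str>>>, text: &str) -> (FileId, Arc<str>) {
    let source: Arc<str> = Arc::from(text);
    // The lock guard is a temporary and is dropped at the end of this statement.
    let id = files.write().unwrap().add(Arc::clone(&source));
    (id, source)
}
```

With `Source = Cow<'static, str>` the first helper avoids the allocation, and with `Source = Arc<str>` the second returns a handle that outlives the lock. Neither type covers both cases, which is the mutual exclusivity mentioned above.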
This was caught by Clippy. `Span` is an 8-byte `Copy` type so it makes
sense to take it by value instead of reference. This is consistent with
what `merge` was already doing.
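A minimal sketch of the by-value convention, using a simplified stand-in for `Span` built from two `u32` offsets. The `contains` method is hypothetical and only shows the signature style; it is not necessarily the method this change touched.

```rust
/// Simplified stand-in for `codespan::Span`: two `u32` byte offsets.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Span {
    start: u32,
    end: u32,
}

impl Span {
    pub fn new(start: u32, end: u32) -> Span {
        Span { start, end }
    }

    /// Takes both spans by value; copying 8 bytes is as cheap as passing a pointer.
    pub fn merge(self, other: Span) -> Span {
        Span {
            start: self.start.min(other.start),
            end: self.end.max(other.end),
        }
    }

    /// Hypothetical method written in the same by-value style as `merge`.
    pub fn contains(self, other: Span) -> bool {
        self.start <= other.start && other.end <= self.end
    }
}
```

Because `Span` is `Copy`, callers can keep using a span after passing it by value.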
When porting Arret from `codespan` 0.3 I needed to track the `FileId`
separately from the `Span` now that the file isn't implied in the span
itself. Having a simple struct like this works for 90% of the cases:
```rust
struct Span {
    // The file this span belongs to
    file_id: codespan::FileId,
    // The byte range within that file
    codespan_span: codespan::Span,
}
```
However, this means that every time we parse any source we need a
`FileId`, which in turn requires a `codespan::Files` to create. This is
overkill in a couple of cases:
1. Unit tests that just need to parse simple Arret strings as a
shorthand for building our AST
2. Syntax highlighting in the REPL
The compromise in etaoins/arret#160 is to use `Option<FileId>`. I'm not
100% happy with that as the ultimate solution but it was the easiest
path to take during porting. However, this makes the size of the above
struct 13 bytes which will typically be rounded up to 16 bytes for
alignment and padding.
We can use `std::num::NonZeroU32` instead and offset all of our indexes
by 1. This allows the Rust compiler to use 0 to represent the `None`
case and bring the size down to 12 bytes. Because the struct only needs
an alignment of 4 bytes this is more likely to stay as 12 bytes in most
contexts.
A nicer solution would be if Rust had e.g. `NonMaxU32` but it appears
the Rust team instead wants to use const generics so arbitrary niche
values can be specified.
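The size difference can be checked with a small sketch. All of the types below are stand-ins invented for this example (`RawSpan` plays the role of `codespan::Span`, and the two id types model a `u32`-backed and a `NonZeroU32`-backed `FileId`); only `std::num::NonZeroU32` and `std::mem::size_of` are real library items.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Stand-in for `codespan::Span`: 8 bytes, 4-byte alignment.
#[derive(Copy, Clone)]
pub struct RawSpan {
    pub start: u32,
    pub end: u32,
}

/// Id backed by a plain `u32`; `Option` needs a separate discriminant.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct PlainFileId(pub u32);

/// Id backed by `NonZeroU32`; the all-zero bit pattern is free to mean `None`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(transparent)]
pub struct NicheFileId(NonZeroU32);

impl NicheFileId {
    /// Stores `index + 1` so that index 0 is still representable.
    /// Returns `None` only if `index` is `u32::MAX`.
    pub fn from_index(index: u32) -> Option<NicheFileId> {
        index.checked_add(1).and_then(NonZeroU32::new).map(NicheFileId)
    }

    /// Undoes the offset applied in `from_index`.
    pub fn to_index(self) -> u32 {
        self.0.get() - 1
    }
}

pub struct SpanWithPlainId {
    pub file_id: Option<PlainFileId>,
    pub span: RawSpan,
}

pub struct SpanWithNicheId {
    pub file_id: Option<NicheFileId>,
    pub span: RawSpan,
}

/// Returns (size with a plain id, size with a niche id) in bytes.
pub fn span_sizes() -> (usize, usize) {
    (size_of::<SpanWithPlainId>(), size_of::<SpanWithNicheId>())
}
```

The cost of the niche is the `+ 1` / `- 1` offset on every conversion and the loss of `u32::MAX` as a valid index.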
In Arret nearly every intermediate data structure contains a `Span` so
our generated code can be mapped back to source location. However,
they're rarely used until we need to report a diagnostic or generate
debugging information.
This led to a common pattern in unit tests where we use a dummy span to
test intermediate transformations where we aren't starting from actual
Arret source. This used to be a `const` called `EMPTY_SPAN` which can be
used to build other `const` data structures used for testing.
In etaoins/arret#160 we needed to switch to an `empty_span()` function
because there is no longer a `const` constructor for `Span`.
We can't make `Span::new` `const` yet because const functions are not
powerful enough. However, `Span::initial` is perfect for this use case.
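A sketch of the pattern, assuming `Span::initial` is a `const fn` that returns a zero-length span at offset 0. The `Span` below is a simplified stand-in, its generic `new` is only meant to resemble the kind of constructor that cannot be `const` on stable Rust, and `Expr` is a hypothetical intermediate structure.

```rust
/// Simplified stand-in for `codespan::Span`.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Span {
    start: u32,
    end: u32,
}

impl Span {
    /// Not `const`: calling the `Into` trait method and asserting are the
    /// kind of things a `const fn` could not do (assumed to resemble the real API).
    pub fn new(start: impl Into<u32>, end: impl Into<u32>) -> Span {
        let (start, end) = (start.into(), end.into());
        assert!(start <= end);
        Span { start, end }
    }

    /// `const` because it only builds the struct from literals.
    pub const fn initial() -> Span {
        Span { start: 0, end: 0 }
    }
}

/// The dummy span can be a `const` again.
pub const EMPTY_SPAN: Span = Span::initial();

/// Hypothetical intermediate structure that carries a span.
pub struct Expr {
    pub span: Span,
    pub value: i64,
}

/// Other `const` test fixtures can be built on top of `EMPTY_SPAN`.
pub const TEST_EXPR: Expr = Expr {
    span: EMPTY_SPAN,
    value: 1,
};
```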