Error management
nom's errors are designed with multiple needs in mind:
- indicate which parser failed and where in the input data
- accumulate more context as the error goes up the parser chain
- have a very low overhead, as errors are often discarded by the calling parser (examples: many0, alt)
- can be modified according to the user's needs, because some languages need a lot more information
To match these requirements, nom parsers have to return the following result type:
pub type IResult<I, O, E=nom::error::Error<I>> = Result<(I, O), nom::Err<E>>;
pub enum Err<E> {
Incomplete(Needed),
Error(E),
Failure(E),
}
The result is either an Ok((I, O)) containing the remaining input and the parsed value, or an Err(nom::Err<E>) with E the error type.
nom::Err<E> is an enum because combinators can have different behaviours depending on the value. The Err<E> enum expresses 3 conditions for a parser error:
- Incomplete indicates that a parser did not have enough data to decide. This can be returned by parsers found in streaming submodules to indicate that we should buffer more data from a file or socket. Parsers in the complete submodules assume that they have the entire input data, so if it was not sufficient, they will instead return an Err::Error. When a parser returns Incomplete, we should accumulate more data in the buffer (example: reading from a socket) and call the parser again
- Error is a normal parser error. If a child parser of the alt combinator returns Error, it will try another child parser
- Failure is an error from which we cannot recover: the alt combinator will not try other branches if a child parser returns Failure. If we know we were in the right branch (example: we found a correct prefix character but input after that was wrong), we can transform an Err::Error into an Err::Failure with the cut() combinator
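The difference between Error and Failure can be illustrated with a small, dependency-free model. This is only a sketch: the Err enum, the PResult alias and the literal, quoted, null and alt2 functions below are local stand-ins written for this example, not nom's actual implementation.

```rust
/// Local stand-in for nom's `Err`; the real `Incomplete` variant carries a `Needed` value.
#[derive(Debug, PartialEq)]
pub enum Err<E> {
    Incomplete,
    Error(E),
    Failure(E),
}

/// Simplified parser result over `&str`, with a static message as error type.
pub type PResult<'a, O> = Result<(&'a str, O), Err<&'static str>>;

/// Hypothetical helper: succeeds if `input` starts with `prefix`.
pub fn literal<'a>(prefix: &str, input: &'a str) -> PResult<'a, &'a str> {
    match input.strip_prefix(prefix) {
        Some(rest) => Ok((rest, &input[..prefix.len()])),
        None => Err(Err::Error("literal not found")),
    }
}

/// Hypothetical parser for a quoted word. Once the opening quote is seen we
/// know we are in the right branch, so a missing closing quote is reported
/// as a `Failure` (the effect `cut()` would have on an `Error`).
pub fn quoted(input: &str) -> PResult<'_, &str> {
    let (rest, _) = literal("\"", input)?;
    let end = rest.find('"').ok_or(Err::Failure("missing closing quote"))?;
    Ok((&rest[end + 1..], &rest[..end]))
}

/// Hypothetical parser for the word `null`.
pub fn null(input: &str) -> PResult<'_, &str> {
    literal("null", input)
}

/// Two-branch choice in the spirit of `alt`.
pub fn alt2<'a, O>(
    input: &'a str,
    first: impl Fn(&'a str) -> PResult<'a, O>,
    second: impl Fn(&'a str) -> PResult<'a, O>,
) -> PResult<'a, O> {
    match first(input) {
        // Recoverable error: try the next branch.
        Err(Err::Error(_)) => second(input),
        // Success, Failure and Incomplete are returned unchanged.
        other => other,
    }
}
```

With these definitions, alt2("null!", quoted, null) falls through to the second branch, while alt2("\"hi", quoted, null) stops at the Failure raised by the first one.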
If we are running a parser and know it will not return Err::Incomplete, we can directly extract the error type from Err::Error or Err::Failure with the finish() method:
let parser_result: IResult<I, O, E> = parser(input);
let result: Result<(I, O), E> = parser_result.finish();
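As a rough mental model, finish() collapses the two error variants into a plain Result. The sketch below uses a local Err enum and a hypothetical finish_like function; it assumes that an Incomplete value is treated as a programming error at this point, which is how I understand nom's behaviour, but check the crate documentation before relying on it.

```rust
/// Local stand-in for nom's `Err`; the real `Incomplete` variant carries a `Needed` value.
#[derive(Debug, PartialEq)]
pub enum Err<E> {
    Incomplete,
    Error(E),
    Failure(E),
}

/// Hypothetical equivalent of `finish()` for the local `Err` type.
pub fn finish_like<I, O, E>(res: Result<(I, O), Err<E>>) -> Result<(I, O), E> {
    match res {
        Ok(ok) => Ok(ok),
        // Both error variants carry the same payload, so it can be extracted directly.
        Err(Err::Error(e)) | Err(Err::Failure(e)) => Err(e),
        // Assumption: the caller promised the parser cannot be incomplete.
        Err(Err::Incomplete) => panic!("finish_like called on an incomplete result"),
    }
}
```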
If we used a borrowed type as input, like &[u8] or &str, we might want to convert it to an owned type to transmit it somewhere, with the to_owned() method:
// with the default error type, nom::error::Error
let result: Result<(&[u8], Value), Err<Error<Vec<u8>>>> =
  parser(data).map_err(|e: Err<Error<&[u8]>>| e.to_owned());
nom provides a powerful error system that can adapt to your needs: you can get reduced error information if you want to improve performance, or you can get a precise trace of parser application, with fine grained position information.
This is done through the third type parameter of IResult, nom's parser result type:
pub type IResult<I, O, E=nom::error::Error<I>> = Result<(I, O), Err<E>>;
pub enum Err<E> {
Incomplete(Needed),
Error(E),
Failure(E),
}
This error type is completely generic in nom's combinators, so you can choose exactly which error type you want to use when you define your parsers, or directly at the call site. See the JSON parser for an example of choosing different error types at the call site.
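The idea of picking the error type at the call site can be sketched without the crate. Everything below (the one-variant ErrorKind, the reduced ParseError trait, the Detailed struct and the digit parser) is a simplified local model made up for illustration; only the overall shape follows nom.

```rust
/// Tiny subset of an error-kind enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    Digit,
}

/// Reduced version of the error-building trait: one constructor only.
pub trait ParseError<I>: Sized {
    fn from_error_kind(input: I, kind: ErrorKind) -> Self;
}

/// Cheapest error type: it records nothing at all.
impl<I> ParseError<I> for () {
    fn from_error_kind(_input: I, _kind: ErrorKind) -> Self {}
}

/// Richer error type: keeps the input position and the kind.
#[derive(Debug, PartialEq)]
pub struct Detailed<I> {
    pub input: I,
    pub kind: ErrorKind,
}

impl<I> ParseError<I> for Detailed<I> {
    fn from_error_kind(input: I, kind: ErrorKind) -> Self {
        Detailed { input, kind }
    }
}

/// Hypothetical parser, generic over the error type `E`.
pub fn digit<'a, E: ParseError<&'a str>>(input: &'a str) -> Result<(&'a str, char), E> {
    match input.chars().next() {
        // An ASCII digit is one byte long, so slicing at 1 is safe here.
        Some(c) if c.is_ascii_digit() => Ok((&input[1..], c)),
        _ => Err(E::from_error_kind(input, ErrorKind::Digit)),
    }
}
```

The same parser can then be called as digit::<()>(input) when only success or failure matters, or as digit::<Detailed<&str>>(input) when the position is needed.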
Common error types
the default error type: nom::error::Error
#[derive(Debug, PartialEq)]
pub struct Error<I> {
/// position of the error in the input data
pub input: I,
/// nom error code
pub code: ErrorKind,
}
This structure contains a nom::error::ErrorKind indicating which kind of parser encountered an error (example: ErrorKind::Tag for the tag() combinator), and the input position of the error.
This error type is fast and has very low overhead, so it is suitable for parsers that are called repeatedly, like in network protocols. It is very limited though: it will not tell you about the chain of parser calls, so it is not enough to write user-friendly errors.
Example error returned in a JSON-like parser (from examples/json.rs):
let data = " { \"a\"\t: 42,
\"b\": [ \"x\", \"y\", 12 ] ,
\"c\": { 1\"hello\" : \"world\"
}
} ";
// will print:
// Err(
// Failure(
// Error {
// input: "1\"hello\" : \"world\"\n }\n } ",
// code: Char,
// },
// ),
// )
println!(
"{:#?}\n",
json::<Error<&str>>(data)
);
getting more information: nom::error::VerboseError
The VerboseError<I> type accumulates more information about the chain of parsers that encountered an error:
#[derive(Clone, Debug, PartialEq)]
pub struct VerboseError<I> {
/// List of errors accumulated by `VerboseError`, containing the affected
/// part of input data, and some context
pub errors: crate::lib::std::vec::Vec<(I, VerboseErrorKind)>,
}
#[derive(Clone, Debug, PartialEq)]
/// Error context for `VerboseError`
pub enum VerboseErrorKind {
/// Static string added by the `context` function
Context(&'static str),
/// Indicates which character was expected by the `char` function
Char(char),
/// Error kind given by various nom parsers
Nom(ErrorKind),
}
It contains the input position and error code for each of those parsers. It does not accumulate errors from the different branches of alt: it will only contain errors from the last branch it tried.
It can be used along with the nom::error::context combinator to inform about the parser chain:
context(
"string",
preceded(char('\"'), cut(terminated(parse_str, char('\"')))),
)(i)
It is not very usable if printed directly:
// parsed verbose: Err(
// Failure(
// VerboseError {
// errors: [
// (
// "1\"hello\" : \"world\"\n }\n } ",
// Char(
// '}',
// ),
// ),
// (
// "{ 1\"hello\" : \"world\"\n }\n } ",
// Context(
// "map",
// ),
// ),
// (
// "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } ",
// Context(
// "map",
// ),
// ),
// ],
// },
// ),
// )
println!("parsed verbose: {:#?}", json::<VerboseError<&str>>(data));
But by looking at the original input and the chain of errors, we can build a more user friendly error message. The nom::error::convert_error function can build such a message.
let e = json::<VerboseError<&str>>(data).finish().err().unwrap();
// here we use the `convert_error` function, to transform a `VerboseError<&str>`
// into a printable trace.
//
// This will print:
// verbose errors - `json::<VerboseError<&str>>(data)`:
// 0: at line 2:
// "c": { 1"hello" : "world"
// ^
// expected '}', found 1
//
// 1: at line 2, in map:
// "c": { 1"hello" : "world"
// ^
//
// 2: at line 0, in map:
// { "a" : 42,
// ^
println!(
"verbose errors - `json::<VerboseError<&str>>(data)`:\n{}",
convert_error(data, e)
);
Note that VerboseError and convert_error are meant as a starting point for language errors, but that they cannot cover all use cases. So a custom convert_error function should probably be written.
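A custom conversion function mostly needs two things: a way to turn the remaining input stored in each error into a line and column, and a way to describe each entry. The following is a minimal sketch of that idea, assuming each stored slice is a suffix of the original input. The Kind enum is a local mirror of two of the variants shown earlier, and position and render are hypothetical names, not part of nom.

```rust
/// Local mirror of two `VerboseErrorKind` variants (the `Nom` variant is left out).
#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    Context(&'static str),
    Char(char),
}

/// 1-based line and column of `remaining` within `original`.
/// Assumes `remaining` is a suffix of `original`.
pub fn position(original: &str, remaining: &str) -> (usize, usize) {
    let offset = original.len() - remaining.len();
    let before = &original[..offset];
    let line = before.matches('\n').count() + 1;
    // Count the characters between the last newline and the error position.
    let column = before.chars().rev().take_while(|&c| c != '\n').count() + 1;
    (line, column)
}

/// Renders one `line:column: description` line per accumulated error.
pub fn render(original: &str, errors: &[(&str, Kind)]) -> String {
    let mut out = String::new();
    for (remaining, kind) in errors {
        let (line, column) = position(original, remaining);
        let what = match kind {
            Kind::Context(name) => format!("while parsing {}", name),
            Kind::Char(c) => format!("expected '{}'", c),
        };
        out.push_str(&format!("{}:{}: {}\n", line, column, what));
    }
    out
}
```

A real implementation would also quote the offending line and handle tabs or wide characters; this sketch only shows the position bookkeeping.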
Improving usability: nom_locate and nom-supreme
These crates were developed to improve the user experience when writing nom parsers.
nom_locate
nom_locate wraps the input data in a Span type that can be understood by nom parsers. That type provides location information, like line and column.
nom-supreme
nom-supreme provides the ErrorTree<I> error type, that provides the same chain of parser errors as VerboseError, but also accumulates errors from the various branches tried by alt.
With this error type, you can explore everything that has been tried by the parser.
The ParseError trait
If those error types are not enough, we can define our own, by implementing the ParseError<I> trait. All nom combinators are generic over that trait for their errors, so we only need to define it in the parser result type, and it will be used everywhere.
pub trait ParseError<I>: Sized {
/// Creates an error from the input position and an [ErrorKind]
fn from_error_kind(input: I, kind: ErrorKind) -> Self;
/// Combines an existing error with a new one created from the input
/// position and an [ErrorKind]. This is useful when backtracking
/// through a parse tree, accumulating error context on the way
fn append(input: I, kind: ErrorKind, other: Self) -> Self;
/// Creates an error from an input position and an expected character
fn from_char(input: I, _: char) -> Self {
Self::from_error_kind(input, ErrorKind::Char)
}
/// Combines two existing errors. This function is used to compare errors
/// generated in various branches of `alt`
fn or(self, other: Self) -> Self {
other
}
}
Any error type has to implement that trait, which requires ways to build an error:
- from_error_kind: from the input position and the ErrorKind enum that indicates in which parser we got an error
- append: allows the creation of a chain of errors as we backtrack through the parser tree (various combinators will add more context)
- from_char: creates an error that indicates which character we were expecting
- or: in combinators like alt, allows choosing between errors from various branches (or accumulating them)
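To make the "accumulating" option for or concrete, here is a small sketch of an error type that remembers every branch that was tried. The trait and the ErrorKind enum are reduced local copies so the example compiles on its own, and Alternatives is a hypothetical type invented for this illustration.

```rust
/// Tiny subset of an error-kind enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    Tag,
    Char,
    Alt,
}

/// Reduced local copy of the trait described above (`from_char` is omitted).
pub trait ParseError<I>: Sized {
    fn from_error_kind(input: I, kind: ErrorKind) -> Self;
    fn append(input: I, kind: ErrorKind, other: Self) -> Self;
    fn or(self, other: Self) -> Self {
        other
    }
}

/// Hypothetical error type that keeps every (input, kind) pair it has seen.
#[derive(Debug, PartialEq)]
pub struct Alternatives<I> {
    pub tried: Vec<(I, ErrorKind)>,
}

impl<I> ParseError<I> for Alternatives<I> {
    fn from_error_kind(input: I, kind: ErrorKind) -> Self {
        Alternatives { tried: vec![(input, kind)] }
    }

    // Backtracking through the parser tree: add one more entry to the chain.
    fn append(input: I, kind: ErrorKind, mut other: Self) -> Self {
        other.tried.push((input, kind));
        other
    }

    // Instead of keeping only the last branch, merge both sides.
    fn or(mut self, other: Self) -> Self {
        self.tried.extend(other.tried);
        self
    }
}
```

Keeping every branch costs an allocation per error, so this trades the low overhead of the default error type for more detail.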
We can also implement the ContextError trait to support the context() combinator used by VerboseError<I>:
pub trait ContextError<I>: Sized {
fn add_context(_input: I, _ctx: &'static str, other: Self) -> Self {
other
}
}
And there is also the FromExternalError<I, E> trait, used by map_res to wrap errors returned by other functions:
pub trait FromExternalError<I, ExternalError> {
fn from_external_error(input: I, kind: ErrorKind, e: ExternalError) -> Self;
}
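As an illustration of what wrapping an external error can look like, the sketch below converts a std::num::ParseIntError into a custom parser error. The trait is a local copy of the definition above, the ErrorKind enum is reduced to one variant, and MyError and parse_u8 are hypothetical names used only in this example.

```rust
use std::num::ParseIntError;

/// Tiny subset of an error-kind enum, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ErrorKind {
    MapRes,
}

/// Local copy of the trait shown above.
pub trait FromExternalError<I, ExternalError> {
    fn from_external_error(input: I, kind: ErrorKind, e: ExternalError) -> Self;
}

/// Hypothetical error type that keeps the external error as text.
#[derive(Debug, PartialEq)]
pub struct MyError<'a> {
    pub input: &'a str,
    pub kind: ErrorKind,
    pub cause: String,
}

impl<'a> FromExternalError<&'a str, ParseIntError> for MyError<'a> {
    fn from_external_error(input: &'a str, kind: ErrorKind, e: ParseIntError) -> Self {
        MyError { input, kind, cause: e.to_string() }
    }
}

/// Hypothetical parser: takes the leading ASCII digits and converts them to a
/// `u8`, wrapping any conversion error the way a `map_res`-style combinator would.
pub fn parse_u8(input: &str) -> Result<(&str, u8), MyError<'_>> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    digits
        .parse::<u8>()
        .map(|n| (rest, n))
        .map_err(|e| MyError::from_external_error(input, ErrorKind::MapRes, e))
}
```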
Example usage
Let's define a debugging error type, that will print something every time an error is generated. This will give us a good insight into what the parser tried. Since errors can be combined with each other, we want it to keep some info on the error that was just returned. We'll just store that in a string:
struct DebugError {
message: String,
}
Now let's implement ParseError and ContextError on it:
impl ParseError<&str> for DebugError {
// on one line, we show the error code and the input that caused it
fn from_error_kind(input: &str, kind: ErrorKind) -> Self {
let message = format!("{:?}:\t{:?}\n", kind, input);
println!("{}", message);
DebugError { message }
}
// if combining multiple errors, we show them one after the other
fn append(input: &str, kind: ErrorKind, other: Self) -> Self {
let message = format!("{}{:?}:\t{:?}\n", other.message, kind, input);
println!("{}", message);
DebugError { message }
}
fn from_char(input: &str, c: char) -> Self {
let message = format!("'{}':\t{:?}\n", c, input);
println!("{}", message);
DebugError { message }
}
fn or(self, other: Self) -> Self {
let message = format!("{}\tOR\n{}\n", self.message, other.message);
println!("{}", message);
DebugError { message }
}
}
impl ContextError<&str> for DebugError {
fn add_context(input: &str, ctx: &'static str, other: Self) -> Self {
let message = format!("{}\"{}\":\t{:?}\n", other.message, ctx, input);
println!("{}", message);
DebugError { message }
}
}
So when calling our JSON parser with this error type, we will get a trace of all the times a parser stopped and backtracked:
println!("debug: {:#?}", root::<DebugError>(data));
AlphaNumeric: "\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
'{': "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
'{': "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
"map": "42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
[..]
AlphaNumeric: "\": { 1\"hello\" : \"world\"\n }\n } "
'"': "1\"hello\" : \"world\"\n }\n } "
'"': "1\"hello\" : \"world\"\n }\n } "
"string": "1\"hello\" : \"world\"\n }\n } "
'}': "1\"hello\" : \"world\"\n }\n } "
'}': "1\"hello\" : \"world\"\n }\n } "
"map": "{ 1\"hello\" : \"world\"\n }\n } "
'}': "1\"hello\" : \"world\"\n }\n } "
"map": "{ 1\"hello\" : \"world\"\n }\n } "
"map": "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
debug: Err(
Failure(
DebugError {
message: "'}':\t\"1\\\"hello\\\" : \\\"world\\\"\\n }\\n } \"\n\"map\":\t\"{ 1\\\"hello\\\" : \\\"world
\\"\\n }\\n } \"\n\"map\":\t\"{ \\\"a\\\"\\t: 42,\\n \\\"b\\\": [ \\\"x\\\", \\\"y\\\", 12 ] ,\\n \\\"c\\\": { 1\
\"hello\\\" : \\\"world\\\"\\n }\\n } \"\n",
},
),
)
Here we can see that when parsing { 1\"hello\" : \"world\"\n }\n }, after getting past the initial {, we tried:
- parsing a " because we're expecting a key name, and that parser was part of the "string" parser
- parsing a } because the map might be empty. When this fails, we backtrack, through 2 recursive map parsers:
'}': "1\"hello\" : \"world\"\n }\n } "
"map": "{ 1\"hello\" : \"world\"\n }\n } "
"map": "{ \"a\"\t: 42,\n \"b\": [ \"x\", \"y\", 12 ] ,\n \"c\": { 1\"hello\" : \"world\"\n }\n } "
Debugging parsers
While you are writing your parsers, you will sometimes need to follow which part of the parser sees which part of the input.
To that end, nom provides the dbg_dmp function that will observe a parser's input and output, and print a hexdump of the input if there was an error. Here is what it could return:
fn f(i: &[u8]) -> IResult<&[u8], &[u8]> {
dbg_dmp(tag("abcd"), "tag")(i)
}
let a = &b"efghijkl"[..];
// Will print the following message:
// tag: Error(Error(Error { input: [101, 102, 103, 104, 105, 106, 107, 108], code: Tag })) at:
// 00000000 65 66 67 68 69 6a 6b 6c efghijkl
f(a);
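The same idea can be approximated by hand when a different output format is wanted. The sketch below is a standalone model, not nom's implementation: hex_line, tag_abcd and dbg_like are hypothetical names, and the dump layout is my own choice rather than the exact format printed by dbg_dmp.

```rust
/// Formats one hexdump line: offset, hex bytes, then printable ASCII.
pub fn hex_line(offset: usize, bytes: &[u8]) -> String {
    let hex: Vec<String> = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    let ascii: String = bytes
        .iter()
        .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
        .collect();
    format!("{:08x}  {}  {}", offset, hex.join(" "), ascii)
}

/// Hypothetical parser recognizing the bytes `abcd`.
pub fn tag_abcd(input: &[u8]) -> Result<(&[u8], &[u8]), &'static str> {
    if input.starts_with(b"abcd") {
        Ok((&input[4..], &input[..4]))
    } else {
        Err("expected abcd")
    }
}

/// Wraps a parser so that every error is logged to stderr with a dump of the
/// input it was given. The result itself is passed through unchanged.
pub fn dbg_like<'a, O, E: std::fmt::Debug>(
    parser: impl Fn(&'a [u8]) -> Result<(&'a [u8], O), E>,
    label: &'static str,
) -> impl Fn(&'a [u8]) -> Result<(&'a [u8], O), E> {
    move |input: &'a [u8]| {
        let res = parser(input);
        if let Err(e) = &res {
            eprintln!("{}: {:?} at:\n{}", label, e, hex_line(0, input));
        }
        res
    }
}
```

For example, dbg_like(tag_abcd, "tag") returns a parser that behaves like tag_abcd but also reports each failure with its input.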
You can go further with the nom-trace crate.