zerovec
Zero-copy vector abstractions for arbitrary types, backed by byte slices.
zerovec enables a far wider range of types, beyond just &[u8] and &str, to participate in zero-copy deserialization from byte slices. It is serde compatible and comes equipped with proc macros.
Clients upgrading to zerovec benefit from zero heap allocations when deserializing read-only data.
This crate has four main types:
- ZeroVec<'a, T> (and ZeroSlice<T>) for fixed-width types like u32
- VarZeroVec<'a, T> (and VarZeroSlice<T>) for variable-width types like str
- ZeroMap<'a, K, V> to map from K to V
- ZeroMap2d<'a, K0, K1, V> to map from the pair (K0, K1) to V
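To make the fixed-width case concrete, here is a minimal sketch of the general idea behind a zero-copy vector of u32: keep a borrowed byte slice and decode each element only when it is read. The names U32View, parse, get, and iter are hypothetical and exist only for this illustration, and the little-endian encoding is an assumption of the sketch. It is not zerovec's implementation or API.

```rust
/// Hypothetical, simplified stand-in for a zero-copy vector of `u32`.
/// It borrows a byte slice and decodes elements lazily on access.
#[derive(Debug, Clone, Copy)]
pub struct U32View<'a> {
    bytes: &'a [u8],
}

impl<'a> U32View<'a> {
    /// Validates the buffer (its length must be a multiple of 4) without copying it.
    pub fn parse(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() % 4 == 0 {
            Some(Self { bytes })
        } else {
            None
        }
    }

    /// Number of `u32` elements in the view.
    pub fn len(&self) -> usize {
        self.bytes.len() / 4
    }

    /// Decodes one little-endian element. The buffer needs no particular alignment.
    pub fn get(&self, index: usize) -> Option<u32> {
        let start = index.checked_mul(4)?;
        let c = self.bytes.get(start..start.checked_add(4)?)?;
        Some(u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
    }

    /// Iterates over all elements, decoding each one as it is visited.
    pub fn iter(&self) -> impl Iterator<Item = u32> + 'a {
        self.bytes
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
    }
}
```

Because the view only borrows the buffer, constructing it costs a length check rather than an allocation. The trade-off is a small decoding step on every read.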
The first two are intended as close-to-drop-in replacements for Vec<T> in Serde structs. The third and fourth are intended as a replacement for HashMap or LiteMap. When used with Serde derives, be sure to apply #[serde(borrow)] to these types, same as one would for Cow<'a, T>.
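One way a map can be laid out so that it is usable straight from a flat buffer is as two parallel sequences, with the keys kept sorted and lookups done by binary search. The sketch below illustrates that general technique over plain slices. SortedSliceMap is a hypothetical type written for this example, and nothing here should be read as a description of how ZeroMap is actually implemented.

```rust
/// Hypothetical sketch of a map stored as two parallel, key-sorted slices.
pub struct SortedSliceMap<'a, K, V> {
    keys: &'a [K],
    values: &'a [V],
}

impl<'a, K: Ord, V> SortedSliceMap<'a, K, V> {
    /// Accepts the slices only if they have equal length and the keys
    /// are strictly increasing (sorted, with no duplicates).
    pub fn new(keys: &'a [K], values: &'a [V]) -> Option<Self> {
        let sorted = keys.windows(2).all(|w| w[0] < w[1]);
        if keys.len() == values.len() && sorted {
            Some(Self { keys, values })
        } else {
            None
        }
    }

    /// Looks up a key with binary search; no hashing and no allocation.
    pub fn get(&self, key: &K) -> Option<&'a V> {
        let index = self.keys.binary_search(key).ok()?;
        self.values.get(index)
    }
}
```

A layout like this needs no rebuilding step after loading, which is the appeal for deserialization-heavy workloads. Lookups are logarithmic rather than constant-time, so a hash map can still win on large maps.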
ZeroVec<'a, T>, VarZeroVec<'a, T>, ZeroMap<'a, K, V>, and ZeroMap2d<'a, K0, K1, V> all behave like Cow<'a, T> in that they abstract over either borrowed or owned data. When deserializing from human-readable formats (like json and xml), these types will typically allocate and fully own their data. When deserializing from binary formats like bincode and postcard, they borrow data directly from the buffer being deserialized from, avoiding allocations and performing only validity checks. As a result, deserialization with this crate can be very fast (see the Performance section below for more information).
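The borrowed-versus-owned distinction can be seen with the standard library's Cow alone. In the sketch below, from_binary and from_text are hypothetical helpers, and the comma-separated text format is invented for the example. The point is only that input already in its final byte form can be borrowed, while input that must be parsed ends up in a fresh allocation.

```rust
use std::borrow::Cow;

/// Binary input: the payload is already in its final form, so it is borrowed.
pub fn from_binary<'a>(buffer: &'a [u8]) -> Cow<'a, [u8]> {
    Cow::Borrowed(buffer)
}

/// Human-readable input: hypothetical comma-separated decimal bytes, e.g. "1,2,3".
/// The text has to be parsed, so the result lives in a new, owned allocation.
pub fn from_text(text: &str) -> Option<Cow<'static, [u8]>> {
    let parsed: Option<Vec<u8>> = text
        .split(',')
        .map(|item| item.trim().parse::<u8>().ok())
        .collect();
    parsed.map(Cow::Owned)
}
```

Both functions return the same type, so downstream code does not need to care which path produced the data.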
See the design doc for details on how this crate works under the hood.
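For readers who want an intuition for the variable-width case before reading the design doc, the sketch below shows one possible way to pack several strings into a single buffer and read them back without copying: an element count, a table of end offsets, then the concatenated UTF-8 data. The layout, the StrListView type, and the encode helper are all assumptions made for this illustration. The format the crate actually uses is the one described in the design doc and may differ.

```rust
/// Hypothetical view over strings packed into one buffer, laid out as:
/// [count: u32 LE][`count` end offsets: u32 LE each][concatenated UTF-8 data]
pub struct StrListView<'a> {
    offsets: &'a [u8], // `count` little-endian u32 end offsets into `data`
    data: &'a str,
}

/// Reads the `index`-th little-endian u32 from `bytes`, if present.
fn read_u32(bytes: &[u8], index: usize) -> Option<usize> {
    let start = index.checked_mul(4)?;
    let c = bytes.get(start..start.checked_add(4)?)?;
    Some(u32::from_le_bytes([c[0], c[1], c[2], c[3]]) as usize)
}

impl<'a> StrListView<'a> {
    /// Splits the buffer into offset table and data, then validates both.
    pub fn parse(bytes: &'a [u8]) -> Option<Self> {
        let count = read_u32(bytes, 0)?;
        let header_end = count.checked_add(1)?.checked_mul(4)?;
        let offsets = bytes.get(4..header_end)?;
        let data = std::str::from_utf8(bytes.get(header_end..)?).ok()?;
        // Offsets must be non-decreasing and fall on character boundaries.
        let mut prev = 0;
        for i in 0..count {
            let end = read_u32(offsets, i)?;
            if end < prev || !data.is_char_boundary(end) {
                return None;
            }
            prev = end;
        }
        // The last offset must cover the whole data section.
        if prev != data.len() {
            return None;
        }
        Some(Self { offsets, data })
    }

    /// Number of strings in the view.
    pub fn len(&self) -> usize {
        self.offsets.len() / 4
    }

    /// Returns a `&str` that points into the original buffer.
    pub fn get(&self, index: usize) -> Option<&'a str> {
        let end = read_u32(self.offsets, index)?;
        let start = if index == 0 { 0 } else { read_u32(self.offsets, index - 1)? };
        self.data.get(start..end)
    }
}

/// Encoder for the same hypothetical layout, used only to build sample input.
pub fn encode(strings: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(strings.len() as u32).to_le_bytes());
    let mut end = 0u32;
    for s in strings {
        end += s.len() as u32;
        out.extend_from_slice(&end.to_le_bytes());
    }
    for s in strings {
        out.extend_from_slice(s.as_bytes());
    }
    out
}
```

All validation happens once in parse. After that, get is two offset reads and a slice operation, and the returned string borrows from the input buffer.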
Cargo features
This crate has several optional Cargo features:
- serde: Allows serializing and deserializing zerovec's abstractions via serde.
- yoke: Enables implementations of Yokeable from the yoke crate, which is also useful in situations involving a lot of zero-copy deserialization.
- derive: Makes it easier to use custom types in these collections by providing the #[make_ule] and #[make_varule] proc macros, which generate appropriate ULE and VarULE-conformant types for a given "normal" type.
- std: Enables std::error::Error implementations for error types. This crate is by default no_std with a dependency on alloc.
Examples
Serialize and deserialize a struct with ZeroVec and VarZeroVec with Bincode:
use zerovec::{VarZeroVec, ZeroVec};

// This example requires the "serde" feature
#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'data> {
    #[serde(borrow)]
    nums: ZeroVec<'data, u32>,
    #[serde(borrow)]
    chars: ZeroVec<'data, char>,
    #[serde(borrow)]
    strs: VarZeroVec<'data, str>,
}

let data = DataStruct {
    nums: ZeroVec::from_slice_or_alloc(&[211, 281, 421, 461]),
    chars: ZeroVec::alloc_from_slice(&['ö', '冇', 'म']),
    strs: VarZeroVec::from(&["hello", "world"]),
};
let bincode_bytes =
    bincode::serialize(&data).expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 67);

let deserialized: DataStruct = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");
assert_eq!(deserialized.nums.first(), Some(211));
assert_eq!(deserialized.chars.get(1), Some('冇'));
assert_eq!(deserialized.strs.get(1), Some("world"));
// The deserialization will not have allocated anything
assert!(!deserialized.nums.is_owned());
Use custom types inside of ZeroVec:
use zerovec::{ZeroVec, VarZeroVec, ZeroMap};
use std::borrow::Cow;
use zerovec::ule::encode_varule_to_box;

// custom fixed-size ULE type for ZeroVec
#[zerovec::make_ule(DateULE)]
#[derive(Copy, Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Date {
    y: u64,
    m: u8,
    d: u8
}

// custom variable sized VarULE type for VarZeroVec
#[zerovec::make_varule(PersonULE)]
#[zerovec::derive(Serialize, Deserialize)] // add Serde impls to PersonULE
#[derive(Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Person<'a> {
    birthday: Date,
    favorite_character: char,
    #[serde(borrow)]
    name: Cow<'a, str>,
}

#[derive(serde::Serialize, serde::Deserialize)]
struct Data<'a> {
    #[serde(borrow)]
    important_dates: ZeroVec<'a, Date>,
    // note: VarZeroVec always must reference the ULE type directly
    #[serde(borrow)]
    important_people: VarZeroVec<'a, PersonULE>,
    #[serde(borrow)]
    birthdays_to_people: ZeroMap<'a, Date, PersonULE>
}

let person1 = Person {
    birthday: Date { y: 1990, m: 9, d: 7},
    favorite_character: 'π',
    name: Cow::from("Kate")
};
let person2 = Person {
    birthday: Date { y: 1960, m: 5, d: 25},
    favorite_character: '冇',
    name: Cow::from("Jesse")
};

let important_dates = ZeroVec::alloc_from_slice(&[
    Date { y: 1943, m: 3, d: 20},
    Date { y: 1976, m: 8, d: 2},
    Date { y: 1998, m: 2, d: 15},
]);
let important_people = VarZeroVec::from(&[&person1, &person2]);
let mut birthdays_to_people: ZeroMap<Date, PersonULE> = ZeroMap::new();
// `.insert_var_v()` is slightly more convenient over `.insert()` for custom ULE types
birthdays_to_people.insert_var_v(&person1.birthday, &person1);
birthdays_to_people.insert_var_v(&person2.birthday, &person2);

let data = Data { important_dates, important_people, birthdays_to_people };

let bincode_bytes = bincode::serialize(&data)
    .expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 168);

let deserialized: Data = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");

assert_eq!(deserialized.important_dates.get(0).unwrap().y, 1943);
assert_eq!(&deserialized.important_people.get(1).unwrap().name, "Jesse");
assert_eq!(&deserialized.important_people.get(0).unwrap().name, "Kate");
assert_eq!(&deserialized.birthdays_to_people.get(&person1.birthday).unwrap().name, "Kate");
// This example requires the "serde" and "derive" features
Performance
zerovec is designed for fast deserialization from byte buffers with zero memory allocations while minimizing performance regressions for common vector operations.
Benchmark results on x86_64:
Operation | Vec<T> | zerovec |
---|---|---|
Deserialize vec of 100 u32 | 233.18 ns | 14.120 ns |
Compute sum of vec of 100 u32 (read every element) | 8.7472 ns | 10.775 ns |
Binary search vec of 1000 u32 50 times | 442.80 ns | 472.51 ns |
Deserialize vec of 100 strings | 7.3740 μs* | 1.4495 μs |
Count chars in vec of 100 strings (read every element) | 747.50 ns | 955.28 ns |
Binary search vec of 500 strings 10 times | 466.09 ns | 790.33 ns |
* This result is reported for Vec<String>. However, Serde also supports deserializing to the partially-zero-copy Vec<&str>; this gives 1.8420 μs, much faster than Vec<String> but a bit slower than zerovec.
Operation | HashMap<K,V> | LiteMap<K,V> | ZeroMap<K,V> |
---|---|---|---|
Deserialize a small map | 2.72 μs | 1.28 μs | 480 ns |
Deserialize a large map | 50.5 ms | 18.3 ms | 3.74 ms |
Look up from a small deserialized map | 49 ns | 42 ns | 54 ns |
Look up from a large deserialized map | 51 ns | 155 ns | 213 ns |
Small = 16 elements, large = 131,072 elements. Maps contain <String, String>.
The benches used to generate the above table can be found in the benches directory in the project repository. zeromap benches are named by convention, e.g. zeromap/deserialize/small, zeromap/lookup/large. The type is appended for baseline comparisons, e.g. zeromap/lookup/small/hashmap.
More Information
For more information on development, authorship, contributing, etc., please visit the ICU4X home page.