2017-02-27 00:05:27 +00:00
|
|
|
/*
|
|
|
|
* The little filesystem
|
|
|
|
*
|
2019-07-19 00:43:49 +00:00
|
|
|
* Copyright (c) 2017, Arm Limited. All rights reserved.
|
|
|
|
* SPDX-License-Identifier: BSD-3-Clause
|
2017-02-27 00:05:27 +00:00
|
|
|
*/
|
|
|
|
#include "lfs.h"
|
2017-03-25 21:20:31 +00:00
|
|
|
#include "lfs_util.h"
|
2017-02-27 00:05:27 +00:00
|
|
|
|
2019-08-03 14:17:47 +00:00
|
|
|
#define LFS_BLOCK_NULL ((lfs_block_t)-1)
|
|
|
|
#define LFS_BLOCK_INLINE ((lfs_block_t)-2)
|
2017-02-27 00:05:27 +00:00
|
|
|
|
2017-04-30 16:19:37 +00:00
|
|
|
/// Caching block device operations ///
|
Revisited caching rules to optimize bus transactions

The littlefs driver has always had this really weird quirk: larger cache
sizes can significantly harm performance. This has probably been one of
the most surprising pieces of configuring and optimizing littlefs.

The reason is that littlefs's caches are kinda dumb (this is somewhat
intentional, as dumb caches take up much less code space than smart
caches). When littlefs needs to read data, it will load the entire cache
line. This means that even when we only need a small 4-byte piece of
data, we may need to read a full 512-byte cache line. And since
microcontrollers may be reading from storage over relatively slow bus
protocols, the time to send data over the bus may dominate other
operations.

Now that we have separate configuration options for "cache_size" and
"read_size", we can start making littlefs's caches a bit smarter. They
aren't going to be perfect, because code size is still a priority, but
there are some small improvements we can make:

1. Program caches write to prog_size aligned units, but eagerly cache as
   much as possible. There's no downside to using the full cache in
   program operations.

2. Add a hint parameter to cached reads. This internal API allows callers
   to tell the cache how much data they expect to need. This avoids
   excess bus traffic, and now we can even bypass the cache if the
   caller provides enough of a buffer.

   We can still fall back to reading full cache-lines in the cases where
   we don't know how much data we need by providing the block size as
   the hint. We do this for directory fetches and for file reads.

This has immediate improvements for both metadata-log traversal and CTZ
skip-list traversal, since these both only need to read 4-byte pointers
and can always bypass the cache, allowing reuse elsewhere.
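As a rough illustration of the hint described above, here is a minimal sketch of how the number of bytes fetched over the bus could be derived from the read offset and the caller's hint. It follows the same arithmetic as the `rcache->off` / `rcache->size` computation in `lfs_bd_read`, but `struct sketch_cfg` and every `sketch_*` helper are hypothetical names introduced only for this example; they are not part of the littlefs API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for littlefs's alignment helpers. */
static uint32_t sketch_aligndown(uint32_t a, uint32_t alignment) {
    return a - (a % alignment);
}

static uint32_t sketch_alignup(uint32_t a, uint32_t alignment) {
    return sketch_aligndown(a + alignment - 1, alignment);
}

static uint32_t sketch_min(uint32_t a, uint32_t b) {
    return (a < b) ? a : b;
}

/* Assumed geometry, named after the config options mentioned above. */
struct sketch_cfg {
    uint32_t read_size;   /* smallest unit the device can read */
    uint32_t cache_size;  /* size of the read cache buffer */
    uint32_t block_size;  /* size of one erase block */
};

/* Bytes fetched from the device for a read at `off` when the caller
 * expects to need `hint` bytes in total. */
static uint32_t sketch_fetch_size(const struct sketch_cfg *cfg,
        uint32_t off, uint32_t hint) {
    assert(cfg->read_size > 0);
    /* start of the fetch, rounded down to a readable unit */
    uint32_t start = sketch_aligndown(off, cfg->read_size);
    /* end of the fetch, rounded up but never past the block */
    uint32_t end = sketch_min(
            sketch_alignup(off + hint, cfg->read_size),
            cfg->block_size);
    /* never fetch more than the cache can hold */
    return sketch_min(end - start, cfg->cache_size);
}
```

With an assumed read_size of 16, cache_size of 512, and block_size of 4096, a 4-byte hint at offset 100 fetches 16 bytes, while passing the block size as the hint falls back to a full 512-byte cache line.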
2018-08-20 19:47:52 +00:00
|
|
|
static inline void lfs_cache_drop(lfs_t *lfs, lfs_cache_t *rcache) {
|
|
|
|
// do not zero, cheaper if cache is readonly or only going to be
|
|
|
|
// written with identical data (during relocates)
|
|
|
|
(void)lfs;
|
2019-08-03 14:17:47 +00:00
|
|
|
rcache->block = LFS_BLOCK_NULL;
|
2018-08-20 19:47:52 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline void lfs_cache_zero(lfs_t *lfs, lfs_cache_t *pcache) {
|
|
|
|
// reset buffer contents (to 0xff) to avoid information leak
|
2019-04-12 13:41:42 +00:00
|
|
|
memset(pcache->buffer, 0xff, lfs->cfg->cache_size);
|
2019-08-03 14:17:47 +00:00
|
|
|
pcache->block = LFS_BLOCK_NULL;
|
2018-08-20 19:47:52 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int lfs_bd_read(lfs_t *lfs,
|
|
|
|
const lfs_cache_t *pcache, lfs_cache_t *rcache, lfs_size_t hint,
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs_block_t block, lfs_off_t off,
|
|
|
|
void *buffer, lfs_size_t size) {
|
2017-04-22 18:30:40 +00:00
|
|
|
uint8_t *data = buffer;
|
2019-12-30 11:56:27 +00:00
|
|
|
if ((off+size > lfs->cfg->block_size) || (block == LFS_BLOCK_NULL)) {
|
Modified lfs_dir_compact to avoid redundant erases during split

The commit machine in littlefs has three stages: commit, compact, and
then split. First we try to append our commit to the metadata log. If
that fails, we try to compact the metadata log to remove duplicates and
make room for the commit. If that still fails, we split the metadata into
two metadata-pairs and try again. Each stage is less efficient but also
less frequent.

However, in the case that we're filling up a directory with new files,
such as the bootstrap process in setting up a new system, we must pass
through all three stages rather quickly in order to get enough
metadata-pairs to hold all of our files. This means we'll compact,
split, and then need to compact again. This creates more erases than are
needed in the optimal case, which can be a big cost on disks with an
expensive erase operation.

In theory, we can actually avoid this redundant erase by reusing the
data we wrote out in the first attempt to compact. In practice, this
trick is very complicated to pull off.

1. We may need to cache a half-completed program while we write out the
   new metadata-pair. We need to write out the second pair first in
   order to get our new tail before we complete our first metadata-pair.
   This requires two pcaches, which we don't have.

   The solution here is to just drop our cache and reconstruct what it
   would have been. This needs to be perfect down to the byte level
   because we don't have knowledge of where our cache lines are.

2. We may have written out entries that are then moved to the new
   metadata-pair.

   The solution here isn't pretty but it works: we just add a delete
   tag for any entry that was moved over.

In the end the solution ends up a bit hacky, with different layers poked
through the commit logic in order to manage writes at the byte level
from where we manage splits. But it works fairly well and saves erases.
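The three-stage fallback described above can be modeled, very loosely, as follows. This is a toy sketch rather than the littlefs implementation: it treats a metadata log as three byte counters and ignores tags, CRCs, caches, and the extra conditions the real code uses when deciding to split. `struct sketch_log`, `enum sketch_stage`, and `sketch_commit` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Which stage ended up handling the commit (hypothetical names). */
enum sketch_stage {
    SKETCH_APPENDED,     /* stage 1: appended to the log as-is */
    SKETCH_COMPACTED,    /* stage 2: compacted first, then appended */
    SKETCH_NEEDS_SPLIT,  /* stage 3: caller must split into two pairs */
};

/* Toy model of a metadata log, tracked only as byte counts. */
struct sketch_log {
    uint32_t capacity;  /* usable bytes in the metadata block */
    uint32_t used;      /* bytes written so far, stale entries included */
    uint32_t live;      /* bytes that would survive a compaction */
};

static enum sketch_stage sketch_commit(struct sketch_log *log, uint32_t size) {
    assert(log->live <= log->used && log->used <= log->capacity);

    /* stage 1: cheapest path, append to the existing log */
    if (size <= log->capacity - log->used) {
        log->used += size;
        log->live += size;
        return SKETCH_APPENDED;
    }

    /* stage 2: drop stale entries (costs an erase), then append */
    if (size <= log->capacity - log->live) {
        log->used = log->live + size;
        log->live += size;
        return SKETCH_COMPACTED;
    }

    /* stage 3: even a compacted log cannot hold the commit */
    return SKETCH_NEEDS_SPLIT;
}
```

In this model each later stage is reached only when the cheaper one fails, which matches the "less efficient but also less frequent" ordering in the message; the redundant erase being discussed happens when stage 2 and stage 3 are hit back to back.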
2018-08-21 02:45:11 +00:00
|
|
|
return LFS_ERR_CORRUPT;
|
|
|
|
}
|
2017-04-22 18:30:40 +00:00
|
|
|
|
|
|
|
while (size > 0) {
|
2018-10-02 20:42:07 +00:00
|
|
|
lfs_size_t diff = size;
|
|
|
|
|
2018-08-04 19:48:27 +00:00
|
|
|
if (pcache && block == pcache->block &&
|
|
|
|
off < pcache->off + pcache->size) {
|
2018-10-02 20:42:07 +00:00
|
|
|
if (off >= pcache->off) {
|
|
|
|
// is already in pcache?
|
|
|
|
diff = lfs_min(diff, pcache->size - (off-pcache->off));
|
|
|
|
memcpy(data, &pcache->buffer[off-pcache->off], diff);
|
2017-04-22 18:30:40 +00:00
|
|
|
|
2018-10-02 20:42:07 +00:00
|
|
|
data += diff;
|
|
|
|
off += diff;
|
|
|
|
size -= diff;
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
// pcache takes priority
|
|
|
|
diff = lfs_min(diff, pcache->off-off);
|
2017-04-30 16:19:37 +00:00
|
|
|
}
|
|
|
|
|
2018-08-04 19:48:27 +00:00
|
|
|
if (block == rcache->block &&
|
|
|
|
off < rcache->off + rcache->size) {
|
2018-10-02 20:42:07 +00:00
|
|
|
if (off >= rcache->off) {
|
|
|
|
// is already in rcache?
|
|
|
|
diff = lfs_min(diff, rcache->size - (off-rcache->off));
|
|
|
|
memcpy(data, &rcache->buffer[off-rcache->off], diff);
|
|
|
|
|
|
|
|
data += diff;
|
|
|
|
off += diff;
|
|
|
|
size -= diff;
|
|
|
|
continue;
|
2018-08-04 19:48:27 +00:00
|
|
|
}
|
2017-04-22 18:30:40 +00:00
|
|
|
|
2018-10-02 20:42:07 +00:00
|
|
|
// rcache takes priority
|
|
|
|
diff = lfs_min(diff, rcache->off-off);
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// load to cache, first condition can no longer fail
|
2018-03-17 15:28:14 +00:00
|
|
|
LFS_ASSERT(block < lfs->cfg->block_count);
|
2017-04-30 16:19:37 +00:00
|
|
|
rcache->block = block;
|
2018-10-02 20:42:07 +00:00
|
|
|
rcache->off = lfs_aligndown(off, lfs->cfg->read_size);
|
2019-05-21 22:21:52 +00:00
|
|
|
rcache->size = lfs_min(
|
|
|
|
lfs_min(
|
|
|
|
lfs_alignup(off+hint, lfs->cfg->read_size),
|
|
|
|
lfs->cfg->block_size)
|
|
|
|
- rcache->off,
|
|
|
|
lfs->cfg->cache_size);
|
2017-04-30 16:19:37 +00:00
|
|
|
int err = lfs->cfg->read(lfs->cfg, rcache->block,
|
2018-08-20 19:47:52 +00:00
|
|
|
rcache->off, rcache->buffer, rcache->size);
|
2019-07-16 20:55:29 +00:00
|
|
|
LFS_ASSERT(err <= 0);
|
2017-04-22 18:30:40 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2017-03-25 21:20:31 +00:00
|
|
|
}
|
2017-02-27 00:05:27 +00:00
|
|
|
|
Cleaned up tag encoding, now with clear chunk field

Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.

Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field; however, the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement.)

Here is the new 32-bit tag format. Note that there are multiple levels
of types which break down into more info:

[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)

Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
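To make the bit layout above concrete, here is a small sketch of packing and unpacking a 32-bit tag with the field widths from the diagram (1 valid bit, 11 type bits, 10 id bits, 10 length bits, with the type further split into a 3-bit type1 and an 8-bit chunk). The `sketch_*` functions are hypothetical helpers written for this example, not the macros littlefs itself uses, and returning type1 as a plain 3-bit value is a choice made here for readability. How the valid bit should be interpreted is not stated in the message, so the sketch only extracts it.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an 11-bit type, 10-bit id, and 10-bit length into one tag.
 * The valid bit (bit 31) is left clear by this helper. */
static uint32_t sketch_mktag(uint32_t type, uint32_t id, uint32_t length) {
    assert(type < (1u << 11));
    assert(id < (1u << 10));
    assert(length < (1u << 10));
    return (type << 20) | (id << 10) | length;
}

/* Bit 31: valid bit. */
static uint32_t sketch_tag_valid_bit(uint32_t tag) {
    return tag >> 31;
}

/* Bits 30..20: full 11-bit type (type3). */
static uint32_t sketch_tag_type3(uint32_t tag) {
    return (tag >> 20) & 0x7ff;
}

/* Bits 30..28: top 3 bits of the type (type1). */
static uint32_t sketch_tag_type1(uint32_t tag) {
    return (tag >> 28) & 0x7;
}

/* Bits 27..20: low 8 bits of the type (chunk info). */
static uint32_t sketch_tag_chunk(uint32_t tag) {
    return (tag >> 20) & 0xff;
}

/* Bits 19..10: file id. */
static uint32_t sketch_tag_id(uint32_t tag) {
    return (tag >> 10) & 0x3ff;
}

/* Bits 9..0: entry length. */
static uint32_t sketch_tag_length(uint32_t tag) {
    return tag & 0x3ff;
}
```

Under this layout an 8-bit chunk number combined with a 10-bit length addresses 2^8 * 2^10 bytes, which appears to be where the 256KiB theoretical maximum comes from.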
2018-12-29 13:53:12 +00:00
|
|
|
enum {
|
|
|
|
LFS_CMP_EQ = 0,
|
|
|
|
LFS_CMP_LT = 1,
|
|
|
|
LFS_CMP_GT = 2,
|
|
|
|
};
|
|
|
|
|
2018-08-20 19:47:52 +00:00
|
|
|
static int lfs_bd_cmp(lfs_t *lfs,
|
|
|
|
const lfs_cache_t *pcache, lfs_cache_t *rcache, lfs_size_t hint,
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs_block_t block, lfs_off_t off,
|
|
|
|
const void *buffer, lfs_size_t size) {
|
2017-06-24 05:43:05 +00:00
|
|
|
const uint8_t *data = buffer;
|
|
|
|
|
|
|
|
for (lfs_off_t i = 0; i < size; i++) {
|
2018-08-20 19:47:52 +00:00
|
|
|
uint8_t dat;
|
|
|
|
int err = lfs_bd_read(lfs,
|
|
|
|
pcache, rcache, hint-i,
|
|
|
|
block, off+i, &dat, 1);
|
2017-06-24 05:43:05 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-08-20 19:47:52 +00:00
|
|
|
if (dat != data[i]) {
|
2018-12-29 13:53:12 +00:00
|
|
|
return (dat < data[i]) ? LFS_CMP_LT : LFS_CMP_GT;
|
2017-06-24 05:43:05 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
return LFS_CMP_EQ;
|
2017-06-24 05:43:05 +00:00
|
|
|
}
|
|
|
|
|
2018-08-20 19:47:52 +00:00
|
|
|
static int lfs_bd_flush(lfs_t *lfs,
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs_cache_t *pcache, lfs_cache_t *rcache, bool validate) {
|
2019-08-03 14:17:47 +00:00
|
|
|
if (pcache->block != LFS_BLOCK_NULL && pcache->block != LFS_BLOCK_INLINE) {
|
2018-03-17 15:28:14 +00:00
|
|
|
LFS_ASSERT(pcache->block < lfs->cfg->block_count);
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs_size_t diff = lfs_alignup(pcache->size, lfs->cfg->prog_size);
|
2017-06-24 05:43:05 +00:00
|
|
|
int err = lfs->cfg->prog(lfs->cfg, pcache->block,
|
2018-08-04 19:48:27 +00:00
|
|
|
pcache->off, pcache->buffer, diff);
|
2019-07-16 20:55:29 +00:00
|
|
|
LFS_ASSERT(err <= 0);
|
2017-04-30 16:19:37 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2017-04-22 18:30:40 +00:00
|
|
|
|
2018-08-04 19:48:27 +00:00
|
|
|
if (validate) {
|
|
|
|
// check data on disk
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_cache_drop(lfs, rcache);
|
2018-08-20 19:47:52 +00:00
|
|
|
int res = lfs_bd_cmp(lfs,
|
|
|
|
NULL, rcache, diff,
|
|
|
|
pcache->block, pcache->off, pcache->buffer, diff);
|
2018-12-29 13:53:12 +00:00
|
|
|
if (res < 0) {
|
|
|
|
return res;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (res != LFS_CMP_EQ) {
|
|
|
|
return LFS_ERR_CORRUPT;
|
2017-06-24 05:43:05 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-06 16:14:30 +00:00
|
|
|
lfs_cache_zero(lfs, pcache);
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
|
2017-04-30 16:19:37 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
Revisited caching rules to optimize bus transactions
The littlefs driver has always had this really weird quirk: larger cache
sizes can significantly harm performance. This has probably been one of
the most surprising pieces of configuraing and optimizing littlefs.
The reason is that littlefs's caches are kinda dumb (this is somewhat
intentional, as dumb caches take up much less code space than smart
caches). When littlefs needs to read data, it will load the entire cache
line. This means that even when we only need a small 4 byte piece of
data, we may need to read a full 512 byte cache. And since
microcontrollers may be reading from storage over relatively slow bus
protocols, the time to send data over the bus may dominate other
operations.
Now that we have separate configuration options for "cache_size" and
"read_size", we can start making littlefs's caches a bit smarter. They
aren't going to be perfect, because code size is still a priority, but
there are some small improvements we can do:
1. Program caches write to prog_size aligned units, but eagerly cache as
much as possible. There's no downside to using the full cache in
program operations.
2. Add a hint parameter to cached reads. This internal API allows callers
to tell the cache how much data they expect to need. This avoids
excess bus traffic, and now we can even bypass the cache if the
caller provides enough of a buffer.
We can still fall back to reading full cache-lines in the cases where
we don't know how much data we need by providing the block size as
the hint. We do this for directory fetches and for file reads.
This has immediate improvements for both metadata-log traversal and CTZ
skip-list traversal, since these both only need to read 4-byte pointers
and can always bypass the cache, allowing reuse elsewhere.
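The effect of the hint can be sketched as below. This is a minimal illustration, not the littlefs internal API: every `sketch_` name is hypothetical, and it assumes `read_size` evenly divides both `cache_size` and `block_size`. A small hint fetches a single read unit, while passing the block size as the hint falls back to filling the whole cache line.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not the littlefs internal API. */
static inline uint32_t sketch_min(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

static inline uint32_t sketch_aligndown(uint32_t a, uint32_t align) {
    return a - (a % align);
}

static inline uint32_t sketch_alignup(uint32_t a, uint32_t align) {
    return sketch_aligndown(a + align - 1, align);
}

/* Number of bytes to fetch into the read cache for a request at `off`
 * when the caller expects to need `hint` bytes in total. */
static inline uint32_t sketch_fetch_size(uint32_t off, uint32_t hint,
        uint32_t read_size, uint32_t cache_size, uint32_t block_size) {
    assert(read_size > 0 && off < block_size);
    /* reads start and end on read_size boundaries */
    uint32_t start = sketch_aligndown(off, read_size);
    /* never read past the end of the block */
    uint32_t end = sketch_min(sketch_alignup(off + hint, read_size),
            block_size);
    /* never read more than the cache can hold */
    return sketch_min(end - start, cache_size);
}
```

With a 16-byte `read_size` and a 512-byte cache, a 4-byte pointer read costs one 16-byte bus transaction instead of a 512-byte one.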
2018-08-20 19:47:52 +00:00
|
|
|
static int lfs_bd_sync(lfs_t *lfs,
|
|
|
|
lfs_cache_t *pcache, lfs_cache_t *rcache, bool validate) {
|
|
|
|
lfs_cache_drop(lfs, rcache);
|
|
|
|
|
|
|
|
int err = lfs_bd_flush(lfs, pcache, rcache, validate);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-07-16 20:55:29 +00:00
|
|
|
err = lfs->cfg->sync(lfs->cfg);
|
|
|
|
LFS_ASSERT(err <= 0);
|
|
|
|
return err;
|
2018-08-20 19:47:52 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int lfs_bd_prog(lfs_t *lfs,
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs_cache_t *pcache, lfs_cache_t *rcache, bool validate,
|
|
|
|
lfs_block_t block, lfs_off_t off,
|
|
|
|
const void *buffer, lfs_size_t size) {
|
2017-04-30 16:19:37 +00:00
|
|
|
const uint8_t *data = buffer;
|
2019-08-03 14:17:47 +00:00
|
|
|
LFS_ASSERT(block != LFS_BLOCK_NULL);
|
Added internal lfs_dir_set, an umbrella to dir append/update/remove operations
This move was surprisingly complex, but offers the ultimate opportunity for
code reuse in terms of resizable entries. Instead of needing to provide
separate functions for adding and removing entries, adding and removing
entries can just be viewed as changing an entry's size to-and-from zero.
Unfortunately, it's not _quite_ that simple, since append and remove
hide some relatively complex operations for when directory blocks
overflow or need to be cleaned up.
However, with enough shoehorning, and a new committer type that allows
specifying recursive commit lists (is this now a push-down automaton?),
it does seem to be possible to shove all of the entry update logic into
a single function.
Sidenote, I switched back to an enum-based DSL, since the addition of a
recursive region opcode breaks the consistency of what needs to be
passed to the DSL callback functions. It's much simpler to handle each
opcode explicitly inside a recursive lfs_commit_region function.
2018-03-27 22:57:07 +00:00
|
|
|
LFS_ASSERT(off + size <= lfs->cfg->block_size);
|
2017-04-30 16:19:37 +00:00
|
|
|
|
2017-04-22 18:30:40 +00:00
|
|
|
while (size > 0) {
|
2018-08-04 19:48:27 +00:00
|
|
|
if (block == pcache->block &&
|
|
|
|
off >= pcache->off &&
|
|
|
|
off < pcache->off + lfs->cfg->cache_size) {
|
|
|
|
// already fits in pcache?
|
2017-04-22 18:30:40 +00:00
|
|
|
lfs_size_t diff = lfs_min(size,
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs->cfg->cache_size - (off-pcache->off));
|
2017-06-24 05:43:05 +00:00
|
|
|
memcpy(&pcache->buffer[off-pcache->off], data, diff);
|
2017-04-22 18:30:40 +00:00
|
|
|
|
|
|
|
data += diff;
|
|
|
|
off += diff;
|
|
|
|
size -= diff;
|
2017-04-30 16:19:37 +00:00
|
|
|
|
2019-05-31 05:58:48 +00:00
|
|
|
pcache->size = lfs_max(pcache->size, off - pcache->off);
|
2018-08-04 19:48:27 +00:00
|
|
|
if (pcache->size == lfs->cfg->cache_size) {
|
2017-06-24 05:43:05 +00:00
|
|
|
// eagerly flush out pcache if we fill up
|
2018-08-20 19:47:52 +00:00
|
|
|
int err = lfs_bd_flush(lfs, pcache, rcache, validate);
|
2017-04-30 16:19:37 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-04-22 18:30:40 +00:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2017-06-24 05:43:05 +00:00
|
|
|
// pcache must have been flushed, either by programming an
|
|
|
|
// entire block or manually flushing the pcache
|
2019-08-03 14:17:47 +00:00
|
|
|
LFS_ASSERT(pcache->block == LFS_BLOCK_NULL);
|
2017-04-22 18:30:40 +00:00
|
|
|
|
2017-06-24 05:43:05 +00:00
|
|
|
// prepare pcache, first condition can no longer fail
|
|
|
|
pcache->block = block;
|
2018-08-04 19:48:27 +00:00
|
|
|
pcache->off = lfs_aligndown(off, lfs->cfg->prog_size);
|
|
|
|
pcache->size = 0;
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2017-03-25 21:20:31 +00:00
|
|
|
}
|
|
|
|
|
2017-06-24 05:43:05 +00:00
|
|
|
static int lfs_bd_erase(lfs_t *lfs, lfs_block_t block) {
|
2018-03-17 15:28:14 +00:00
|
|
|
LFS_ASSERT(block < lfs->cfg->block_count);
|
2019-07-16 20:55:29 +00:00
|
|
|
int err = lfs->cfg->erase(lfs->cfg, block);
|
|
|
|
LFS_ASSERT(err <= 0);
|
|
|
|
return err;
|
2017-03-25 21:20:31 +00:00
|
|
|
}
|
2017-02-27 00:05:27 +00:00
|
|
|
|
2017-03-25 21:20:31 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
/// Small type-level utilities ///
|
|
|
|
// operations on block pairs
|
2018-08-05 04:57:43 +00:00
|
|
|
static inline void lfs_pair_swap(lfs_block_t pair[2]) {
|
2017-03-25 21:20:31 +00:00
|
|
|
lfs_block_t t = pair[0];
|
|
|
|
pair[0] = pair[1];
|
|
|
|
pair[1] = t;
|
2017-03-12 20:11:52 +00:00
|
|
|
}
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static inline bool lfs_pair_isnull(const lfs_block_t pair[2]) {
|
2019-08-03 14:17:47 +00:00
|
|
|
return pair[0] == LFS_BLOCK_NULL || pair[1] == LFS_BLOCK_NULL;
|
2017-04-18 03:27:06 +00:00
|
|
|
}
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static inline int lfs_pair_cmp(
|
2017-04-01 15:44:17 +00:00
|
|
|
const lfs_block_t paira[2],
|
|
|
|
const lfs_block_t pairb[2]) {
|
2017-04-29 17:41:53 +00:00
|
|
|
return !(paira[0] == pairb[0] || paira[1] == pairb[1] ||
|
|
|
|
paira[0] == pairb[1] || paira[1] == pairb[0]);
|
2017-04-01 15:44:17 +00:00
|
|
|
}
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static inline bool lfs_pair_sync(
|
2017-05-14 17:01:45 +00:00
|
|
|
const lfs_block_t paira[2],
|
|
|
|
const lfs_block_t pairb[2]) {
|
|
|
|
return (paira[0] == pairb[0] && paira[1] == pairb[1]) ||
|
|
|
|
(paira[0] == pairb[1] && paira[1] == pairb[0]);
|
|
|
|
}
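The two pair comparisons above differ in strictness, which is easy to miss. The sketch below restates them with stand-in names (`block_t`, `pair_cmp`, `pair_sync` are illustrative copies, not the library's symbols): `pair_cmp` returns 0 as soon as the pairs share any one block, while `pair_sync` is true only when both pairs name the same two blocks, in either order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t block_t; /* stand-in for lfs_block_t */

/* 0 when the pairs share at least one block, nonzero otherwise */
static inline int pair_cmp(const block_t a[2], const block_t b[2]) {
    return !(a[0] == b[0] || a[1] == b[1] ||
             a[0] == b[1] || a[1] == b[0]);
}

/* true only when both pairs hold the same two blocks, in either order */
static inline bool pair_sync(const block_t a[2], const block_t b[2]) {
    return (a[0] == b[0] && a[1] == b[1]) ||
           (a[0] == b[1] && a[1] == b[0]);
}

/* Small self-check showing where the two predicates disagree. */
static inline void pair_selfcheck(void) {
    const block_t x[2] = {4, 5};
    const block_t swapped[2] = {5, 4};
    const block_t half[2] = {4, 9};

    assert(pair_cmp(x, swapped) == 0 && pair_sync(x, swapped));
    /* one shared block: "equal" for cmp, but not in sync */
    assert(pair_cmp(x, half) == 0 && !pair_sync(x, half));
}
```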
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static inline void lfs_pair_fromle32(lfs_block_t pair[2]) {
|
2018-08-01 23:10:24 +00:00
|
|
|
pair[0] = lfs_fromle32(pair[0]);
|
|
|
|
pair[1] = lfs_fromle32(pair[1]);
|
|
|
|
}
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static inline void lfs_pair_tole32(lfs_block_t pair[2]) {
|
2018-08-01 23:10:24 +00:00
|
|
|
pair[0] = lfs_tole32(pair[0]);
|
|
|
|
pair[1] = lfs_tole32(pair[1]);
|
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
// operations on 32-bit entry tags
|
|
|
|
typedef uint32_t lfs_tag_t;
|
|
|
|
typedef int32_t lfs_stag_t;
|
Added root entry and expanding superblocks
Expanding superblocks has been on my wishlist for a while. The basic
idea is that instead of maintaining a fixed offset blocks {0, 1} to the
root directory (1 pointer), we maintain a dynamically sized
linked-list of superblocks that point to the actual root. If the number
of writes to the root exceeds some value, we increase the size of the
superblock linked-list.
This can leverage existing metadata-pair operations. The revision count for
metadata-pairs provides some knowledge on how much wear we've put on the
superblock, and the threaded linked-list can also be reused for this
purpose. This means superblock expansion is both optional and cheap to
implement.
Expanding superblocks helps both extremely small and extremely large filesystems
(extreme being relative of course). On the small end, we can actually
collapse the superblock into the root directory and drop the hard requirement
of 4-blocks for the superblock. On the large end, our superblock will
now last longer than the rest of the filesystem. Each time we expand,
the number of cycles until the superblock dies is increased by a power.
Before we were stuck with this layout:
level  cycles  limit    layout
1      E^2     390 MiB  s0 -> root
Now we expand every time a fixed offset is exceeded:
level  cycles  limit    layout
0      E       4 KiB    s0+root
1      E^2     390 MiB  s0 -> root
2      E^3     37 TiB   s0 -> s1 -> root
3      E^4     3.6 EiB  s0 -> s1 -> s2 -> root
...
Where the cycles are the number of cycles before death, and the limit is
the worst-case size of a filesystem at which early superblock death becomes a
concern (all writes to root, using this formula: E^|s| = E*B, E = erase
cycles = 100000, B = block count, assuming 4096 byte blocks).
Note we can also store copies of the superblock entry on the expanded
superblocks. This may help filesystem recovery tools in the future.
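The limits in the table follow from the stated formula. Solving E^|s| = E*B with |s| = level + 1 superblock-chain entries gives B = E^level blocks. The sketch below reproduces that arithmetic; the function names are hypothetical and exist only for this worked example.

```c
#include <assert.h>

/* Worst-case block count at which superblock wear starts to matter:
 * solving E^|s| = E*B with |s| = level + 1 gives B = E^level. */
static inline double sb_limit_blocks(double erase_cycles, unsigned level) {
    assert(erase_cycles > 0);
    double blocks = 1.0;
    for (unsigned i = 0; i < level; i++) {
        blocks *= erase_cycles;
    }
    return blocks;
}

/* The same limit expressed in bytes for a given block size. */
static inline double sb_limit_bytes(double erase_cycles, unsigned level,
        double block_size) {
    return sb_limit_blocks(erase_cycles, level) * block_size;
}

/* Erase cycles survived by the whole chain: E^(level+1). */
static inline double sb_cycles(double erase_cycles, unsigned level) {
    return sb_limit_blocks(erase_cycles, level) * erase_cycles;
}
```

With E = 100000 and 4096-byte blocks, level 1 gives 409,600,000 bytes (about 390 MiB) and level 2 gives 4.096e13 bytes (about 37 TiB), matching the table.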
2018-08-06 18:30:51 +00:00
|
|
|
|
2018-07-13 00:07:56 +00:00
|
|
|
#define LFS_MKTAG(type, id, size) \
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
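As a worked example of the layout described above, the sketch below packs and unpacks one tag. The helper names are illustrative stand-ins that mirror the masks and shifts in this file, and the type value `0x201` is an arbitrary value chosen for the demonstration, not a real littlefs type constant.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t; /* stand-in for lfs_tag_t */

/* Pack an 11-bit type, 10-bit id and 10-bit length; bit 31 stays clear. */
static inline tag_t mktag(uint32_t type, uint32_t id, uint32_t size) {
    assert(type <= 0x7ff && id <= 0x3ff && size <= 0x3ff);
    return ((tag_t)type << 20) | ((tag_t)id << 10) | (tag_t)size;
}

/* Full 11-bit type field. */
static inline uint16_t tag_type3(tag_t tag) {
    return (tag & 0x7ff00000) >> 20;
}

/* Top 3 bits of the type, left in place (chunk bits zeroed). */
static inline uint16_t tag_type1(tag_t tag) {
    return (tag & 0x70000000) >> 20;
}

/* Low 8 bits of the type field. */
static inline uint8_t tag_chunk(tag_t tag) {
    return (tag & 0x0ff00000) >> 20;
}

static inline uint16_t tag_id(tag_t tag) {
    return (tag & 0x000ffc00) >> 10;
}

static inline uint32_t tag_size(tag_t tag) {
    return tag & 0x000003ff;
}
```

For `mktag(0x201, 5, 8)` the packed word is `0x20101408`: `tag_type3` recovers `0x201`, `tag_type1` yields `0x200`, and `tag_chunk` yields `0x01`.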
2018-12-29 13:53:12 +00:00
|
|
|
(((lfs_tag_t)(type) << 20) | ((lfs_tag_t)(id) << 10) | (lfs_tag_t)(size))
|
2018-05-19 23:25:47 +00:00
|
|
|
|
2020-01-20 23:35:45 +00:00
|
|
|
#define LFS_MKTAG_IF(cond, type, id, size) \
|
|
|
|
((cond) ? LFS_MKTAG(type, id, size) : LFS_MKTAG(LFS_FROM_NOOP, 0, 0))
|
|
|
|
|
|
|
|
#define LFS_MKTAG_IF_ELSE(cond, type1, id1, size1, type2, id2, size2) \
|
2020-02-12 17:31:34 +00:00
|
|
|
((cond) ? LFS_MKTAG(type1, id1, size1) : LFS_MKTAG(type2, id2, size2))
|
2020-01-20 23:35:45 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static inline bool lfs_tag_isvalid(lfs_tag_t tag) {
|
2018-05-19 23:25:47 +00:00
|
|
|
return !(tag & 0x80000000);
|
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
static inline bool lfs_tag_isdelete(lfs_tag_t tag) {
|
|
|
|
return ((int32_t)(tag << 22) >> 22) == -1;
|
2018-05-19 23:25:47 +00:00
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
static inline uint16_t lfs_tag_type1(lfs_tag_t tag) {
|
|
|
|
return (tag & 0x70000000) >> 20;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline uint16_t lfs_tag_type3(lfs_tag_t tag) {
|
|
|
|
return (tag & 0x7ff00000) >> 20;
|
Added support for deleting attributes
littlefs has a mechanism for deleting file entries, but it doesn't have
a mechanism for deleting individual tags. This _is_ sufficient for a
filesystem, but limits our flexibility. Deleting attributes would be
useful in the custom attribute API and for future improvements (hint: the
child pointers in B-trees).
However, deleting attributes is tricky. We can't just omit the
attribute, since we can only add new tags. Additionally, we need a way
to track what attributes have been deleted during compaction, which
currently relies on writing out attributes to disk.
The solution here is pretty nifty. First we have to come up with a way
to represent a "deleted" attribute. Rather than adding an additional
bit to the already squished tag structure, we use a -1 length field,
specifically 0xfff. Now we can commit a delete attribute, and this
deleted tag acts as a place holder during compacts.
However our delete tag will never leave our metadata log. We need some
way to discard our delete tag if we know it's the only representation of
that tag on the metadata log. Ah! We know it's the only tag if it's in
the first commit on the metadata log. So we add an additional bit to the
CRC entry to indicate if we're on the first commit, and use that to
decide if we need to keep delete tags around.
Now we have working tag deletion.
Interestingly enough, tag deletion is actually indirectly more efficient
than entry deletion, since compacting entries requires multiple passes,
whereas tag deletion gets cleaned up lazily. However we can't adopt the
same strategy in entry deletion because of the compact ordering of
entries. Tag deletion works because tag types are unique and static.
Managing entry deletion in this manner would require static id
allocation, which would cause problems when creating files, running out
of space, and disallow arbitrary insertions of files.
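The "-1 length" trick can be sketched as follows. This assumes the 10-bit length field used by the tag helpers in this file, so the all-ones value is `0x3ff` rather than the `0xfff` quoted in the message. It also uses a plain mask comparison instead of the sign-extending shift, purely to keep the example portable. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SIZE_MASK 0x3ffu /* 10-bit length field; all ones = deleted */

/* Build a delete tag for the given type and id (illustrative helper). */
static inline uint32_t sketch_mkdelete(uint32_t type, uint32_t id) {
    assert(type <= 0x7ff && id <= 0x3ff);
    return (type << 20) | (id << 10) | SKETCH_SIZE_MASK;
}

/* A tag is a delete marker when its length field is all ones. */
static inline bool sketch_isdelete(uint32_t tag) {
    return (tag & SKETCH_SIZE_MASK) == SKETCH_SIZE_MASK;
}

/* On-disk footprint: the 4-byte tag plus its payload. Adding 1 to an
 * all-ones length wraps the masked field to 0, so delete tags carry
 * no payload. */
static inline uint32_t sketch_dsize(uint32_t tag) {
    return (uint32_t)sizeof(tag)
            + ((tag + sketch_isdelete(tag)) & SKETCH_SIZE_MASK);
}
```

A delete tag therefore occupies only its own 4 bytes in the metadata log, while an ordinary tag with a 12-byte payload occupies 16.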
2018-09-09 22:48:11 +00:00
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
static inline uint8_t lfs_tag_chunk(lfs_tag_t tag) {
|
|
|
|
return (tag & 0x0ff00000) >> 20;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
static inline int8_t lfs_tag_splice(lfs_tag_t tag) {
|
|
|
|
return (int8_t)lfs_tag_chunk(tag);
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static inline uint16_t lfs_tag_id(lfs_tag_t tag) {
|
2018-12-29 13:53:12 +00:00
|
|
|
return (tag & 0x000ffc00) >> 10;
|
2018-05-19 23:25:47 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static inline lfs_size_t lfs_tag_size(lfs_tag_t tag) {
|
2018-12-29 13:53:12 +00:00
|
|
|
return tag & 0x000003ff;
|
2018-05-19 23:25:47 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static inline lfs_size_t lfs_tag_dsize(lfs_tag_t tag) {
|
2018-09-09 23:48:18 +00:00
|
|
|
return sizeof(tag) + lfs_tag_size(tag + lfs_tag_isdelete(tag));
|
2018-09-09 22:48:11 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
// operations on attributes in attribute lists
|
|
|
|
struct lfs_mattr {
|
|
|
|
lfs_tag_t tag;
|
|
|
|
const void *buffer;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct lfs_diskoff {
|
|
|
|
lfs_block_t block;
|
|
|
|
lfs_off_t off;
|
|
|
|
};
|
|
|
|
|
2019-01-08 14:52:03 +00:00
|
|
|
#define LFS_MKATTRS(...) \
|
|
|
|
(struct lfs_mattr[]){__VA_ARGS__}, \
|
|
|
|
sizeof((struct lfs_mattr[]){__VA_ARGS__}) / sizeof(struct lfs_mattr)
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
// operations on global state
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
      .--------.
    ->| parent |-.
      | gstate | |
    .-|    a   |-'
    | '--------'
    |     X <- child is orphaned
    | .--------.
    '>| child  |->
      | gstate |
      |    a   |
      '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left over
from v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
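The break-versus-continue point in item 5 can be sketched with a toy model. This is not the real lfs_deorphan: the toy_dir_t type, the TOY_END marker and the array-backed list are all assumptions made for illustration. It only shows why stopping after the first orphan leaves later orphans threaded, while re-reading the predecessor's tail catches them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_END ((size_t)-1)

// Hypothetical directory node in a toy threaded list.
typedef struct toy_dir {
    size_t tail;      // index of the next dir, or TOY_END
    bool has_parent;  // false means the dir is an orphan
} toy_dir_t;

// Unlink orphans from the list rooted at dirs[0] and return how many were
// removed. stop_after_first mimics the old `break`; otherwise `continue`
// re-reads the predecessor's tail before moving on.
static size_t toy_deorphan(toy_dir_t *dirs, bool stop_after_first) {
    size_t removed = 0;
    size_t pred = 0;
    while (dirs[pred].tail != TOY_END) {
        size_t dir = dirs[pred].tail;
        if (!dirs[dir].has_parent) {
            dirs[pred].tail = dirs[dir].tail;  // predecessor steals the tail
            removed++;
            if (stop_after_first) {
                break;
            }
            continue;  // refetch: the new successor may also be an orphan
        }
        pred = dir;
    }
    return removed;
}

// Usage: root -> orphan -> orphan -> live dir.
void toy_deorphan_example(void) {
    toy_dir_t dirs[4] = {
        {1, true}, {2, false}, {3, false}, {TOY_END, true},
    };
    assert(toy_deorphan(dirs, false) == 2);
    assert(dirs[0].tail == 3);
}
```

With stop_after_first set, the same list loses only its first orphan and the second stays threaded.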
// xor b into a, treating the gstate as three 32-bit words
static inline void lfs_gstate_xor(lfs_gstate_t *a, const lfs_gstate_t *b) {
    for (int i = 0; i < 3; i++) {
        ((uint32_t*)a)[i] ^= ((const uint32_t*)b)[i];
    }
}

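The directory-removal problem from item 4 of the commit message can be reproduced with a toy model of xor-collected state. Everything named toy_* is hypothetical, and the threaded list is flattened into an array. The sketch assumes only what the message states: the global state is the xor of all deltas on the threaded list, so a delta that appears twice cancels itself out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Toy stand-in for lfs_gstate_t (hypothetical, same three-word shape).
typedef struct toy_gstate {
    uint32_t tag;
    uint32_t pair[2];
} toy_gstate_t;

static void toy_xor(toy_gstate_t *a, const toy_gstate_t *b) {
    a->tag ^= b->tag;
    a->pair[0] ^= b->pair[0];
    a->pair[1] ^= b->pair[1];
}

static bool toy_eq(const toy_gstate_t *a, const toy_gstate_t *b) {
    return a->tag == b->tag
            && a->pair[0] == b->pair[0]
            && a->pair[1] == b->pair[1];
}

// Global state = xor of every delta on the (flattened) threaded list.
static toy_gstate_t toy_collect(const toy_gstate_t *deltas, size_t count) {
    toy_gstate_t g = {0, {0, 0}};
    for (size_t i = 0; i < count; i++) {
        toy_xor(&g, &deltas[i]);
    }
    return g;
}

// Old scheme: the parent absorbs the child's delta while the child is
// still threaded. Returns true if the collected state is still correct.
static bool toy_old_scheme_ok(void) {
    // order: parent, predecessor, child
    toy_gstate_t list[3] = {{0x10, {1, 2}}, {0x20, {3, 4}}, {0x03, {5, 6}}};
    toy_gstate_t want = toy_collect(list, 3);
    toy_xor(&list[0], &list[2]);              // child's delta now appears twice
    toy_gstate_t got = toy_collect(list, 3);  // power lost here
    return toy_eq(&got, &want);
}

// New scheme: the child's delta moves to its predecessor only in the
// step that unthreads the child.
static bool toy_new_scheme_ok(void) {
    toy_gstate_t list[3] = {{0x10, {1, 2}}, {0x20, {3, 4}}, {0x03, {5, 6}}};
    toy_gstate_t want = toy_collect(list, 3);
    toy_gstate_t mid = toy_collect(list, 3);  // child orphaned, delta untouched
    toy_xor(&list[1], &list[2]);              // predecessor takes the delta
    toy_gstate_t end = toy_collect(list, 2);  // child no longer threaded
    return toy_eq(&mid, &want) && toy_eq(&end, &want);
}
```

In the old scheme the child's delta cancels against its own copy in the parent, so the collected state silently loses it. The new scheme keeps the collected state correct at every point where power could be lost.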
// true if all three 32-bit words of the gstate are zero
static inline bool lfs_gstate_iszero(const lfs_gstate_t *a) {
    for (int i = 0; i < 3; i++) {
        if (((const uint32_t*)a)[i] != 0) {
            return false;
        }
    }
    return true;
}

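Both helpers above index the gstate as an array of three uint32_t, which only works if the struct is exactly three packed 32-bit words. A minimal sketch of how that assumption could be checked at compile time follows. It uses a hypothetical toy_gstate_t whose fields mirror the tag and pair[2] members seen in the conversion helpers, and memcpy instead of a pointer cast. The names are illustrative and not part of littlefs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

// Hypothetical mirror of the gstate layout: a tag plus a block pair.
typedef struct toy_gstate {
    uint32_t tag;
    uint32_t pair[2];
} toy_gstate_t;

// C11 compile-time check of the "three 32-bit words" assumption.
_Static_assert(sizeof(toy_gstate_t) == 3*sizeof(uint32_t),
        "toy_gstate_t must be exactly three 32-bit words");

// Word-wise xor that copies through uint32_t buffers instead of casting.
static void toy_gstate_xor_words(toy_gstate_t *a, const toy_gstate_t *b) {
    uint32_t wa[3];
    uint32_t wb[3];
    memcpy(wa, a, sizeof(wa));
    memcpy(wb, b, sizeof(wb));
    for (int i = 0; i < 3; i++) {
        wa[i] ^= wb[i];
    }
    memcpy(a, wa, sizeof(wa));
}

// Word-wise zero test over the same three words.
static bool toy_gstate_iszero_words(const toy_gstate_t *a) {
    uint32_t wa[3];
    memcpy(wa, a, sizeof(wa));
    for (int i = 0; i < 3; i++) {
        if (wa[i] != 0) {
            return false;
        }
    }
    return true;
}

// Usage: xoring a state with itself always yields the zero state.
void toy_gstate_words_example(void) {
    toy_gstate_t a = {0x1, {0x2, 0x3}};
    toy_gstate_xor_words(&a, &a);
    assert(toy_gstate_iszero_words(&a));
}
```

If a field were added to the struct, the static assertion would fail at build time rather than letting the loops silently skip the new word.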
// true if any orphans are pending; the count lives in the tag's size field
static inline bool lfs_gstate_hasorphans(const lfs_gstate_t *a) {
    return lfs_tag_size(a->tag);
}

// number of pending orphans, read from the tag's size field
static inline uint8_t lfs_gstate_getorphans(const lfs_gstate_t *a) {
    return lfs_tag_size(a->tag);
}

// true if a move is pending, i.e. the tag's type1 field is nonzero
static inline bool lfs_gstate_hasmove(const lfs_gstate_t *a) {
    return lfs_tag_type1(a->tag);
}

// true if a move is pending and it refers to the given metadata pair
static inline bool lfs_gstate_hasmovehere(const lfs_gstate_t *a,
        const lfs_block_t *pair) {
    return lfs_tag_type1(a->tag) && lfs_pair_cmp(a->pair, pair) == 0;
}

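The four predicates above read two fields out of the gstate tag. The sketch below shows how such a decode could work. The bit layout (size in the low 10 bits, coarse type in bits 28 to 30) and the order-insensitive pair comparison are assumptions made for illustration. The real lfs_tag_size, lfs_tag_type1 and lfs_pair_cmp are defined elsewhere in lfs.c and should be checked there. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical mirror of the gstate layout.
typedef struct toy_gstate {
    uint32_t tag;
    uint32_t pair[2];
} toy_gstate_t;

// Assumed layout: low 10 bits hold the size field.
static uint32_t toy_tag_size(uint32_t tag) {
    return tag & 0x000003ff;
}

// Assumed layout: bits 28..30 hold the coarse type.
static uint32_t toy_tag_type1(uint32_t tag) {
    return (tag & 0x70000000) >> 20;
}

// Assumed semantics: 0 when the two pairs share a block in either order.
static int toy_pair_cmp(const uint32_t a[2], const uint32_t b[2]) {
    return !(a[0] == b[0] || a[1] == b[1] ||
             a[0] == b[1] || a[1] == b[0]);
}

static uint8_t toy_getorphans(const toy_gstate_t *g) {
    return (uint8_t)toy_tag_size(g->tag);
}

static bool toy_hasmovehere(const toy_gstate_t *g, const uint32_t pair[2]) {
    return toy_tag_type1(g->tag) && toy_pair_cmp(g->pair, pair) == 0;
}

// Usage: a pending move on pair {4, 5} matches the swapped pair {5, 4}.
void toy_tag_example(void) {
    toy_gstate_t moving = {0x40000000u, {4, 5}};
    uint32_t swapped[2] = {5, 4};
    assert(toy_hasmovehere(&moving, swapped));
    assert(toy_getorphans(&moving) == 0);
}
```

Under these assumptions the orphan count and the move marker occupy disjoint bits of the same tag, which is how one word can answer both hasorphans and hasmove.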
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.

// convert a gstate read from disk (little-endian) to host byte order
static inline void lfs_gstate_fromle32(lfs_gstate_t *a) {
    a->tag     = lfs_fromle32(a->tag);
    a->pair[0] = lfs_fromle32(a->pair[0]);
    a->pair[1] = lfs_fromle32(a->pair[1]);
}

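The conversion above applies a little-endian-to-host swap to each word. A portable way to write such a swap is sketched here. toy_fromle32 and toy_tole32 are hypothetical stand-ins and not the lfs_util.h implementations, which may use compiler builtins instead. The sketch builds the value from individual bytes, so it gives the same result on little-endian and big-endian hosts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Interpret the in-memory bytes of `stored` as a little-endian value.
static uint32_t toy_fromle32(uint32_t stored) {
    uint8_t b[4];
    memcpy(b, &stored, sizeof(b));
    return ((uint32_t)b[0] <<  0) |
           ((uint32_t)b[1] <<  8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

// Lay out a host value so its in-memory bytes are little-endian.
static uint32_t toy_tole32(uint32_t host) {
    uint8_t b[4] = {
        (uint8_t)(host >>  0),
        (uint8_t)(host >>  8),
        (uint8_t)(host >> 16),
        (uint8_t)(host >> 24),
    };
    uint32_t stored;
    memcpy(&stored, b, sizeof(stored));
    return stored;
}

// Usage: bytes as they would sit on disk decode to the same value
// regardless of host byte order.
void toy_le32_example(void) {
    const uint8_t disk[4] = {0x78, 0x56, 0x34, 0x12};
    uint32_t stored;
    memcpy(&stored, disk, sizeof(stored));
    assert(toy_fromle32(stored) == 0x12345678u);
    assert(toy_fromle32(toy_tole32(0xdeadbeefu)) == 0xdeadbeefu);
}
```

On a little-endian host both functions reduce to the identity, so the per-field conversion only costs anything on big-endian targets.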
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
were both the updated parent and child can exist in threaded
linked-list and duplicate the child's gstate delta.
.--------.
->| parent |-.
| gstate | |
.-| a |-'
| '--------'
| X <- child is orphaned
| .--------.
'>| child |->
| gstate |
| a |
'--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
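The double-counting problem in item 4 can be sketched in a few lines of C.
This is only an illustration under stated assumptions: gstate_sketch and
its helpers are hypothetical names, and the three-word layout simply
mirrors the tag and pair fields that lfs_gstate_tole32 converts. Because
the global state is the xor of every delta on the threaded linked-list, a
delta that shows up twice cancels itself out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a gstate: one tag word and a block pair.
struct gstate_sketch {
    uint32_t tag;
    uint32_t pair[2];
};

// Fold delta b into a with xor. Folding the same delta in twice
// cancels it, which is why a duplicated delta corrupts the result.
static void gstate_sketch_xor(struct gstate_sketch *a,
        const struct gstate_sketch *b) {
    a->tag ^= b->tag;
    a->pair[0] ^= b->pair[0];
    a->pair[1] ^= b->pair[1];
}

// Collect the global state by xoring every delta found while walking
// the threaded linked-list (modeled here as a plain array).
static struct gstate_sketch gstate_sketch_collect(
        const struct gstate_sketch *deltas, size_t count) {
    assert(deltas != NULL || count == 0);
    struct gstate_sketch g = {0, {0, 0}};
    for (size_t i = 0; i < count; i++) {
        gstate_sketch_xor(&g, &deltas[i]);
    }
    return g;
}
```

If the child's delta is visible in both the updated parent and the
not-yet-removed child, collecting over the list sees it twice and the
child's contribution disappears from the result.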
2020-01-22 04:18:19 +00:00
|
|
|
static inline void lfs_gstate_tole32(lfs_gstate_t *a) {
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open files need to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
|
|
|
a->tag = lfs_tole32(a->tag);
|
|
|
|
a->pair[0] = lfs_tole32(a->pair[0]);
|
|
|
|
a->pair[1] = lfs_tole32(a->pair[1]);
|
2018-07-02 03:29:42 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
// other endianness operations
|
|
|
|
static void lfs_ctz_fromle32(struct lfs_ctz *ctz) {
|
|
|
|
ctz->head = lfs_fromle32(ctz->head);
|
|
|
|
ctz->size = lfs_fromle32(ctz->size);
|
|
|
|
}
|
2018-07-02 03:29:42 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static void lfs_ctz_tole32(struct lfs_ctz *ctz) {
|
|
|
|
ctz->head = lfs_tole32(ctz->head);
|
|
|
|
ctz->size = lfs_tole32(ctz->size);
|
|
|
|
}
|
2018-05-19 23:25:47 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static inline void lfs_superblock_fromle32(lfs_superblock_t *superblock) {
|
|
|
|
superblock->version = lfs_fromle32(superblock->version);
|
|
|
|
superblock->block_size = lfs_fromle32(superblock->block_size);
|
|
|
|
superblock->block_count = lfs_fromle32(superblock->block_count);
|
2018-10-02 23:28:37 +00:00
|
|
|
superblock->name_max = lfs_fromle32(superblock->name_max);
|
2018-10-21 02:02:25 +00:00
|
|
|
superblock->file_max = lfs_fromle32(superblock->file_max);
|
2019-01-13 17:08:42 +00:00
|
|
|
superblock->attr_max = lfs_fromle32(superblock->attr_max);
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
2018-05-19 23:25:47 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static inline void lfs_superblock_tole32(lfs_superblock_t *superblock) {
|
|
|
|
superblock->version = lfs_tole32(superblock->version);
|
|
|
|
superblock->block_size = lfs_tole32(superblock->block_size);
|
|
|
|
superblock->block_count = lfs_tole32(superblock->block_count);
|
2018-10-02 23:28:37 +00:00
|
|
|
superblock->name_max = lfs_tole32(superblock->name_max);
|
2018-10-21 02:02:25 +00:00
|
|
|
superblock->file_max = lfs_tole32(superblock->file_max);
|
2019-01-13 17:08:42 +00:00
|
|
|
superblock->attr_max = lfs_tole32(superblock->attr_max);
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
2018-07-13 01:43:55 +00:00
|
|
|
|
2018-09-09 14:01:06 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
/// Internal operations predeclared here ///
|
2018-09-12 06:34:03 +00:00
|
|
|
static int lfs_dir_commit(lfs_t *lfs, lfs_mdir_t *dir,
|
2019-01-08 14:52:03 +00:00
|
|
|
const struct lfs_mattr *attrs, int attrcount);
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we could fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
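The binary search from the second hiccup can be sketched as follows. This
is a minimal sketch and not the code in lfs_dir_compact: dryrun_size and
find_split are hypothetical names, and the dry-run is modeled as a sum
over per-id sizes rather than a real traversal. The search relies only on
the dry-run size never shrinking as more ids are included:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical dry-run: the size ids [begin, end) would take once
// compacted. Here it just sums per-id sizes; a real dry-run would
// traverse the directory without writing anything.
static uint32_t dryrun_size(const uint16_t *sizes,
        uint16_t begin, uint16_t end) {
    uint32_t total = 0;
    for (uint16_t id = begin; id < end; id++) {
        total += sizes[id];
    }
    return total;
}

// Binary search for the largest end such that ids [begin, end) fit in
// budget. Each probe costs one dry-run.
static uint16_t find_split(const uint16_t *sizes,
        uint16_t begin, uint16_t end, uint32_t budget) {
    assert(begin <= end);
    uint16_t lo = begin; // ids [begin, lo) are known to fit
    uint16_t hi = end;   // the answer is never larger than hi
    while (lo < hi) {
        // round up so that lo always advances when mid fits
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (dryrun_size(sizes, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = (uint16_t)(mid - 1);
        }
    }
    return lo;
}
```

With O(log n) probes and an O(n^2) traversal per probe, this lines up with
the O(n^2 log n) split quoted above.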
2018-12-27 02:27:34 +00:00
|
|
|
static int lfs_dir_compact(lfs_t *lfs,
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
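To make the bit layout in the diagram concrete, here is a small sketch of
packing and unpacking the fields. The tag_sketch_* helpers are hypothetical
names for illustration and are not the macros littlefs itself uses. The
only thing taken from the diagram is the field order and widths: 1 valid
bit, 11 type bits (3 bits of type1 over 8 bits of chunk info), 10 id bits,
and 10 length bits:

```c
#include <assert.h>
#include <stdint.h>

// Pack [valid:1 | type3:11 | id:10 | length:10], leaving the valid
// bit clear. The meaning of the valid bit is not modeled here.
static uint32_t tag_sketch_make(uint16_t type3, uint16_t id, uint16_t len) {
    assert(type3 < (1u << 11) && id < (1u << 10) && len < (1u << 10));
    return ((uint32_t)type3 << 20) | ((uint32_t)id << 10) | (uint32_t)len;
}

// Raw top bit of the tag.
static uint32_t tag_sketch_validbit(uint32_t tag) {
    return tag >> 31;
}

// Full 11-bit type field.
static uint16_t tag_sketch_type3(uint32_t tag) {
    return (uint16_t)((tag >> 20) & 0x7ff);
}

// Upper 3 bits of the type field.
static uint16_t tag_sketch_type1(uint32_t tag) {
    return (uint16_t)((tag >> 28) & 0x7);
}

// Lower 8 bits of the type field.
static uint16_t tag_sketch_chunk(uint32_t tag) {
    return (uint16_t)((tag >> 20) & 0xff);
}

// 10-bit file id.
static uint16_t tag_sketch_id(uint32_t tag) {
    return (uint16_t)((tag >> 10) & 0x3ff);
}

// 10-bit entry length.
static uint16_t tag_sketch_len(uint32_t tag) {
    return (uint16_t)(tag & 0x3ff);
}
```

The 256KiB figure above follows from these widths: 8 bits of chunk info
give 256 chunks, and a 10-bit length addresses roughly 1KiB per chunk.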
2018-12-29 13:53:12 +00:00
|
|
|
lfs_mdir_t *dir, const struct lfs_mattr *attrs, int attrcount,
|
2018-12-27 02:27:34 +00:00
|
|
|
lfs_mdir_t *source, uint16_t begin, uint16_t end);
|
2019-07-09 22:51:15 +00:00
|
|
|
static int lfs_file_outline(lfs_t *lfs, lfs_file_t *file);
|
2019-01-13 17:08:42 +00:00
|
|
|
static int lfs_file_flush(lfs_t *lfs, lfs_file_t *file);
|
|
|
|
static void lfs_fs_preporphans(lfs_t *lfs, int8_t orphans);
|
|
|
|
static void lfs_fs_prepmove(lfs_t *lfs,
|
|
|
|
uint16_t id, const lfs_block_t pair[2]);
|
2018-09-11 03:07:59 +00:00
|
|
|
static int lfs_fs_pred(lfs_t *lfs, const lfs_block_t dir[2],
|
|
|
|
lfs_mdir_t *pdir);
|
|
|
|
static lfs_stag_t lfs_fs_parent(lfs_t *lfs, const lfs_block_t dir[2],
|
|
|
|
lfs_mdir_t *parent);
|
|
|
|
static int lfs_fs_relocate(lfs_t *lfs,
|
|
|
|
const lfs_block_t oldpair[2], lfs_block_t newpair[2]);
|
2020-02-09 15:05:37 +00:00
|
|
|
int lfs_fs_traverseraw(lfs_t *lfs,
|
|
|
|
int (*cb)(void *data, lfs_block_t block), void *data,
|
|
|
|
bool includeorphans);
|
2018-09-11 03:07:59 +00:00
|
|
|
static int lfs_fs_forceconsistency(lfs_t *lfs);
|
|
|
|
static int lfs_deinit(lfs_t *lfs);
|
Added migration from littlefs v1
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Defining LFS_MIGRATE
2. Calling lfs_migrate (only available with the above macro)
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included migration
code in the standard build. We can't use the lfs_migrate function for
link time gc because of a dependency between the allocator and v1 data
structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during
the migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth
fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11% at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
2019-02-23 03:34:03 +00:00
|
|
|
#ifdef LFS_MIGRATE
|
|
|
|
static int lfs1_traverse(lfs_t *lfs,
|
|
|
|
int (*cb)(void*, lfs_block_t), void *data);
|
|
|
|
#endif
|
2018-07-13 20:04:31 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
/// Block allocator ///
|
|
|
|
static int lfs_alloc_lookahead(void *p, lfs_block_t block) {
|
|
|
|
lfs_t *lfs = (lfs_t*)p;
|
|
|
|
lfs_block_t off = ((block - lfs->free.off)
|
|
|
|
+ lfs->cfg->block_count) % lfs->cfg->block_count;
|
2018-09-09 23:48:18 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
if (off < lfs->free.size) {
|
|
|
|
lfs->free.buffer[off / 32] |= 1U << (off % 32);
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int lfs_alloc(lfs_t *lfs, lfs_block_t *block) {
|
|
|
|
while (true) {
|
|
|
|
while (lfs->free.i != lfs->free.size) {
|
|
|
|
lfs_block_t off = lfs->free.i;
|
|
|
|
lfs->free.i += 1;
|
|
|
|
lfs->free.ack -= 1;
|
|
|
|
|
|
|
|
if (!(lfs->free.buffer[off / 32] & (1U << (off % 32)))) {
|
|
|
|
// found a free block
|
|
|
|
*block = (lfs->free.off + off) % lfs->cfg->block_count;
|
|
|
|
|
|
|
|
// eagerly find next off so an alloc ack can
|
|
|
|
// discredit old lookahead blocks
|
|
|
|
while (lfs->free.i != lfs->free.size &&
|
|
|
|
(lfs->free.buffer[lfs->free.i / 32]
|
|
|
|
& (1U << (lfs->free.i % 32)))) {
|
|
|
|
lfs->free.i += 1;
|
|
|
|
lfs->free.ack -= 1;
|
2018-07-13 20:04:31 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
return 0;
|
2018-07-13 20:04:31 +00:00
|
|
|
}
|
2018-09-09 14:01:06 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
// check if we have looked at all blocks since last ack
|
|
|
|
if (lfs->free.ack == 0) {
|
2019-07-16 20:40:26 +00:00
|
|
|
LFS_ERROR("No more free space %"PRIu32,
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs->free.i + lfs->free.off);
|
|
|
|
return LFS_ERR_NOSPC;
|
2018-07-13 20:04:31 +00:00
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs->free.off = (lfs->free.off + lfs->free.size)
|
|
|
|
% lfs->cfg->block_count;
|
2018-10-03 01:18:30 +00:00
|
|
|
lfs->free.size = lfs_min(8*lfs->cfg->lookahead_size, lfs->free.ack);
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs->free.i = 0;
|
|
|
|
|
|
|
|
// find mask of free blocks from tree
|
2018-10-03 01:18:30 +00:00
|
|
|
memset(lfs->free.buffer, 0, lfs->cfg->lookahead_size);
|
2020-02-09 15:05:37 +00:00
|
|
|
int err = lfs_fs_traverseraw(lfs, lfs_alloc_lookahead, lfs, true);
|
2018-07-13 20:04:31 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
2018-07-13 20:04:31 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
static void lfs_alloc_ack(lfs_t *lfs) {
|
|
|
|
lfs->free.ack = lfs->cfg->block_count;
|
2018-07-13 20:04:31 +00:00
|
|
|
}
|
|
|
|
|
Modified lfs_dir_compact to avoid redundant erases during split
The commit machine in littlefs has three stages: commit, compact, and
then split. First we try to append our commit to the metadata log, if
that fails we try to compact the metadata log to remove duplicates and make
room for the commit, if that still fails we split the metadata into two
metadata-pairs and try again. Each stage is less efficient but also less
frequent.
However, in the case that we're filling up a directory with new files,
such as the bootstrap process in setting up a new system, we must pass
through all three stages rather quickly in order to get enough
metadata-pairs to hold all of our files. This means we'll compact,
split, and then need to compact again. This creates more erases than is
needed in the optimal case, which can be a big cost on disks with an
expensive erase operation.
In theory, we can actually avoid this redundant erase by reusing the
data we wrote out in the first attempt to compact. In practice, this
trick is very complicated to pull off.
1. We may need to cache a half-completed program while we write out the
new metadata-pair. We need to write out the second pair first in
order to get our new tail before we complete our first metadata-pair.
This requires two pcaches, which we don't have.
The solution here is to just drop our cache and reconstruct what it
would have been. This needs to be perfect down to the byte level
because we don't have knowledge of where our cache lines are.
2. We may have written out entries that are then moved to the new
metadata-pair.
The solution here isn't pretty but it works: we just add a delete
tag for any entry that was moved over.
In the end the solution ends up a bit hacky, with different layers poked
through the commit logic in order to manage writes at the byte level
from where we manage splits. But it works fairly well and saves erases.
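As a rough picture of the three stages described at the top of this
message, here is a toy model of the fallback ladder. It is a sketch under
heavy assumptions, not the commit machine itself: mlog_sketch and
mlog_sketch_commit are hypothetical names, a metadata log is reduced to
byte counters, and erases, caches, and tails are not modeled at all:

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of one metadata log: used bytes, of which some
// are stale duplicates that a compact can reclaim.
struct mlog_sketch {
    uint32_t capacity;
    uint32_t used;
    uint32_t stale;
};

enum stage_sketch { STAGE_COMMIT, STAGE_COMPACT, STAGE_SPLIT };

// Try the cheapest stage first and fall back to more expensive ones.
// Returns the stage that finally made room for the commit.
static enum stage_sketch mlog_sketch_commit(struct mlog_sketch *log,
        struct mlog_sketch *tail, uint32_t size) {
    assert(size <= log->capacity / 2);
    assert(log->stale <= log->used && log->used <= log->capacity);

    // 1. commit: append to the log if there is room
    if (log->used + size <= log->capacity) {
        log->used += size;
        return STAGE_COMMIT;
    }

    // 2. compact: drop stale duplicates, then retry the append
    log->used -= log->stale;
    log->stale = 0;
    if (log->used + size <= log->capacity) {
        log->used += size;
        return STAGE_COMPACT;
    }

    // 3. split: move half of the live data into a second log, then
    // append to the now half-empty first log
    uint32_t moved = log->used / 2;
    tail->capacity = log->capacity;
    tail->used = moved;
    tail->stale = 0;
    log->used = log->used - moved + size;
    return STAGE_SPLIT;
}
```

In this model the split path always runs a compact first, which is the
compact, split, compact sequence whose redundant erase this change tries
to avoid.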
2018-08-21 02:45:11 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
/// Metadata pair and directory operations ///
|
2019-01-13 17:08:42 +00:00
|
|
|
static lfs_stag_t lfs_dir_getslice(lfs_t *lfs, const lfs_mdir_t *dir,
|
|
|
|
lfs_tag_t gmask, lfs_tag_t gtag,
|
|
|
|
lfs_off_t goff, void *gbuffer, lfs_size_t gsize) {
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs_off_t off = dir->off;
|
|
|
|
lfs_tag_t ntag = dir->etag;
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement.)
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
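
To make the field layout above concrete, here is a minimal sketch of packing and unpacking a tag with plain shifts and masks. The `ex_` names are hypothetical and exist only for this illustration; they are not the helpers used in lfs.c, and the bit positions are assumed from the diagram (1 valid bit, 11 type bits, 10 id bits, 10 length bits).

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers following the 32-bit layout in the diagram:
// [1 valid | 11 type | 10 id | 10 length]. Illustrative names only.
typedef uint32_t ex_tag_t;

// Pack a tag from its three fields; the valid bit (bit 31) is left clear.
static ex_tag_t ex_mktag(uint32_t type, uint32_t id, uint32_t size) {
    assert(type <= 0x7ff && id <= 0x3ff && size <= 0x3ff);
    return (type << 20) | (id << 10) | size;
}

// Bit 31: the valid bit.
static uint32_t ex_tag_valid(ex_tag_t tag) { return tag >> 31; }

// Bits 30..20: the full 11-bit type (type3).
static uint32_t ex_tag_type3(ex_tag_t tag) { return (tag >> 20) & 0x7ff; }

// Bits 30..28: the coarse 3-bit type (type1), shifted down to 0..7.
static uint32_t ex_tag_type1(ex_tag_t tag) { return (tag >> 28) & 0x7; }

// Bits 27..20: the 8-bit chunk field inside the type.
static uint32_t ex_tag_chunk(ex_tag_t tag) { return (tag >> 20) & 0xff; }

// Bits 19..10: the file id.
static uint32_t ex_tag_id(ex_tag_t tag) { return (tag >> 10) & 0x3ff; }

// Bits 9..0: the entry length.
static uint32_t ex_tag_size(ex_tag_t tag) { return tag & 0x3ff; }
```

For example, packing type 0x401 with id 5 and length 8 gives a coarse type of 4 and a chunk of 1, which shows how the 11-bit type breaks down into the two sub-fields.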
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t gdiff = 0;
|
|
|
|
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
      .--------.
    ->| parent |-.
      | gstate | |
    .-| a      |-'
    | '--------'
    |     X <- child is orphaned
    | .--------.
    '>| child  |->
      | gstate |
      | a      |
      '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
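
The duplicated-delta problem in point 4 is easy to see with a toy model. This is only a sketch: it assumes the gstate fits in a single word and that collecting it is a plain xor over every delta reachable in the threaded list. The real gstate is a small struct, and `ex_gstate_collect` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Toy model: each metadata pair in the threaded list carries one delta,
// and the global state is the xor of every delta reachable in the list.
// Treating a delta as a single word is an assumption made for brevity.
static uint32_t ex_gstate_collect(const uint32_t *deltas, size_t count) {
    assert(deltas != NULL || count == 0);
    uint32_t gstate = 0;
    for (size_t i = 0; i < count; i++) {
        gstate ^= deltas[i];
    }
    return gstate;
}
```

With a parent delta of 0x0f and a child delta of 0xa0, the collected gstate is 0xaf. If the parent has already absorbed the child's delta (0x0f ^ 0xa0) while the child is still linked, the child's delta is xored in twice and cancels out, leaving 0x0f. Only once the child is unlinked does the list give 0xaf again.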
2020-01-22 04:18:19 +00:00
|
|
|
if (lfs_gstate_hasmovehere(&lfs->gdisk, dir->pair) &&
|
|
|
|
lfs_tag_id(gmask) != 0 &&
|
|
|
|
lfs_tag_id(lfs->gdisk.tag) <= lfs_tag_id(gtag)) {
|
2018-12-29 13:53:12 +00:00
|
|
|
// synthetic moves
|
|
|
|
gdiff -= LFS_MKTAG(0, 1, 0);
|
|
|
|
}
|
Modified lfs_dir_compact to avoid redundant erases during split
The commit machine in littlefs has three stages: commit, compact, and
then split. First we try to append our commit to the metadata log, if
that fails we try to compact the metadata log to remove duplicates and make
room for the commit, if that still fails we split the metadata into two
metadata-pairs and try again. Each stage is less efficient but also less
frequent.
However, in the case that we're filling up a directory with new files,
such as the bootstrap process in setting up a new system, we must pass
through all three stages rather quickly in order to get enough
metadata-pairs to hold all of our files. This means we'll compact,
split, and then need to compact again. This creates more erases than is
needed in the optimal case, which can be a big cost on disks with an
expensive erase operation.
In theory, we can actually avoid this redundant erase by reusing the
data we wrote out in the first attempt to compact. In practice, this
trick is very complicated to pull off.
1. We may need to cache a half-completed program while we write out the
new metadata-pair. We need to write out the second pair first in
order to get our new tail before we complete our first metadata-pair.
This requires two pcaches, which we don't have.
The solution here is to just drop our cache and reconstruct what it
would have been. This needs to be perfect down to the byte level
because we don't have knowledge of where our cache lines are.
2. We may have written out entries that are then moved to the new
metadata-pair.
The solution here isn't pretty but it works, we just add a delete
tag for any entry that was moved over.
In the end the solution ends up a bit hacky, with different layers poked
through the commit logic in order to manage writes at the byte level
from where we manage splits. But it works fairly well and saves erases.
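
As a rough picture of the commit, compact, split fallback described at the top of this message, here is a toy in-memory log. Everything in it is an assumption for illustration: the `ex_` names, the fixed four-entry capacity, and the rule that a later entry with the same id shadows an earlier one. It does not model erases or the redundant-erase trick itself.

```c
#include <assert.h>
#include <stddef.h>

#define EX_LOG_CAP 4  // hypothetical log capacity, in entries

typedef struct { int id; int value; } ex_entry_t;
typedef struct { ex_entry_t entries[EX_LOG_CAP]; size_t count; } ex_log_t;

typedef enum { EX_APPENDED, EX_COMPACTED, EX_NEEDS_SPLIT } ex_stage_t;

// Drop every entry that is shadowed by a later entry with the same id.
static void ex_compact(ex_log_t *log) {
    size_t kept = 0;
    for (size_t i = 0; i < log->count; i++) {
        int shadowed = 0;
        for (size_t j = i + 1; j < log->count; j++) {
            if (log->entries[j].id == log->entries[i].id) {
                shadowed = 1;
                break;
            }
        }
        if (!shadowed) {
            log->entries[kept++] = log->entries[i];
        }
    }
    log->count = kept;
}

// Try the cheap path first and fall back to the more expensive stages.
static ex_stage_t ex_commit(ex_log_t *log, ex_entry_t entry) {
    assert(log != NULL && log->count <= EX_LOG_CAP);
    if (log->count < EX_LOG_CAP) {
        log->entries[log->count++] = entry;
        return EX_APPENDED;
    }
    ex_compact(log);
    if (log->count < EX_LOG_CAP) {
        log->entries[log->count++] = entry;
        return EX_COMPACTED;
    }
    // Every remaining entry is live; the caller would have to split.
    return EX_NEEDS_SPLIT;
}
```

Filling the log with four updates to the same id and then committing a new id takes the compact path and leaves two entries. Filling it with four distinct ids and committing a fifth reports that a split is needed, which is the "compact, split, then compact again" sequence the message is trying to make cheaper.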
2018-08-21 02:45:11 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
// iterate over dir block backwards (for faster lookups)
|
2018-12-29 13:53:12 +00:00
|
|
|
while (off >= sizeof(lfs_tag_t) + lfs_tag_dsize(ntag)) {
|
|
|
|
off -= lfs_tag_dsize(ntag);
|
|
|
|
lfs_tag_t tag = ntag;
|
|
|
|
int err = lfs_bd_read(lfs,
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
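
The idea of reading an inline file through slices can be sketched as follows. This is an illustration under assumptions, not the code in lfs.c: the entry is an in-memory array standing in for data on disk, the `ex_` names are hypothetical, and bytes requested past the end of the entry are zero-filled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical slice read: copy up to gsize bytes of an inline entry,
// starting at byte goff, into gbuffer. Bytes past the end of the entry
// are zero-filled so the caller always gets gsize defined bytes.
// Returns the number of bytes that actually came from the entry.
static size_t ex_read_slice(const uint8_t *entry, size_t size,
        size_t goff, void *gbuffer, size_t gsize) {
    assert(entry != NULL && gbuffer != NULL);
    size_t avail = (goff < size) ? size - goff : 0;
    size_t diff = (avail < gsize) ? avail : gsize;
    if (diff > 0) {
        memcpy(gbuffer, entry + goff, diff);
    }
    memset((uint8_t *)gbuffer + diff, 0, gsize - diff);
    return diff;
}

// Sum every byte of an entry while holding only a 4-byte window in RAM.
static uint32_t ex_sum_entry(const uint8_t *entry, size_t size) {
    uint8_t window[4];
    uint32_t sum = 0;
    for (size_t off = 0; off < size; off += sizeof(window)) {
        size_t n = ex_read_slice(entry, size, off, window, sizeof(window));
        for (size_t i = 0; i < n; i++) {
            sum += window[i];
        }
    }
    return sum;
}
```

The reader never needs a buffer as large as the file, only one as large as the slice it asks for, which is why reading can work on a device with less RAM than the one that wrote the file.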
2019-01-13 17:08:42 +00:00
|
|
|
NULL, &lfs->rcache, sizeof(ntag),
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->pair[0], off, &ntag, sizeof(ntag));
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-07-29 20:03:23 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
ntag = (lfs_frombe32(ntag) ^ tag) & 0x7fffffff;
|
2018-05-28 07:08:16 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_id(gmask) != 0 &&
|
|
|
|
lfs_tag_type1(tag) == LFS_TYPE_SPLICE &&
|
|
|
|
lfs_tag_id(tag) <= lfs_tag_id(gtag - gdiff)) {
|
|
|
|
if (tag == (LFS_MKTAG(LFS_TYPE_CREATE, 0, 0) |
|
|
|
|
(LFS_MKTAG(0, 0x3ff, 0) & (gtag - gdiff)))) {
|
|
|
|
// found where we were created
|
|
|
|
return LFS_ERR_NOENT;
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
2018-05-28 07:08:16 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
// move around splices
|
|
|
|
gdiff += LFS_MKTAG(0, lfs_tag_splice(tag), 0);
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
2018-05-19 23:25:47 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
if ((gmask & tag) == (gmask & (gtag - gdiff))) {
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we could fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
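
The second hiccup, using a dry-run to binary search for the split point, could look roughly like this. It is a sketch under assumptions: a per-id size table stands in for a real dry-run traversal, sizes are non-negative so the estimate grows with the range, and the `ex_` names are hypothetical rather than taken from lfs.c.

```c
#include <assert.h>
#include <stddef.h>

// Dry-run size estimate: total bytes needed by ids in [begin, end).
// The per-id size table stands in for a real traversal.
static size_t ex_dryrun_size(const size_t *sizes, size_t begin, size_t end) {
    size_t total = 0;
    for (size_t i = begin; i < end; i++) {
        total += sizes[i];
    }
    return total;
}

// Return the largest end in [begin, count] such that the ids in
// [begin, end) fit in capacity bytes according to the dry run.
static size_t ex_find_split(const size_t *sizes, size_t begin,
        size_t count, size_t capacity) {
    assert(sizes != NULL && begin <= count);
    size_t lo = begin;  // [begin, lo) is known to fit
    size_t hi = count;  // the answer is never larger than hi
    while (lo < hi) {
        // round up so that lo always advances when mid fits
        size_t mid = lo + (hi - lo + 1) / 2;
        if (ex_dryrun_size(sizes, begin, mid) <= capacity) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

Each probe costs one dry run, so with an O(n^2) traversal the search adds the log n factor mentioned above. With sizes {10, 20, 30, 40, 50} and a capacity of 60, the first three ids fit and the split lands at 3.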
2018-12-27 02:27:34 +00:00
|
|
|
if (lfs_tag_isdelete(tag)) {
|
|
|
|
return LFS_ERR_NOENT;
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
2018-12-27 02:27:34 +00:00
|
|
|
|
2019-01-13 17:08:42 +00:00
|
|
|
lfs_size_t diff = lfs_min(lfs_tag_size(tag), gsize);
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
is just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
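As a rough illustration of the 32-bit layout described in this commit message, the following C sketch packs and unpacks the fields with shifts and masks. The helper names (`tag_make`, `tag_type3`, and so on) are hypothetical and derived only from the diagram, assuming the valid bit is the most significant bit; they are not necessarily the macros littlefs itself uses.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers derived only from the tag diagram: bit 31 is
// assumed to be the valid bit, followed by 11 type bits, 10 id bits,
// and 10 length bits.
typedef uint32_t tag_t;

// Pack an 11-bit type, 10-bit id, and 10-bit length (valid bit left 0).
static inline tag_t tag_make(uint32_t type3, uint32_t id, uint32_t length) {
    assert(type3 <= 0x7ff && id <= 0x3ff && length <= 0x3ff);
    return (type3 << 20) | (id << 10) | length;
}

static inline uint32_t tag_validbit(tag_t t) { return t >> 31; }
// Full 11-bit type field.
static inline uint32_t tag_type3(tag_t t)    { return (t >> 20) & 0x7ff; }
// Upper 3 bits of the type field.
static inline uint32_t tag_type1(tag_t t)    { return (t >> 28) & 0x7; }
// Lower 8 bits of the type field.
static inline uint32_t tag_chunk(tag_t t)    { return (t >> 20) & 0xff; }
static inline uint32_t tag_id(tag_t t)       { return (t >> 10) & 0x3ff; }
static inline uint32_t tag_length(tag_t t)   { return t & 0x3ff; }
```

For example, `tag_make(0x201, 5, 100)` decodes to type1 `0x2`, chunk `0x01`, id `5`, and length `100`.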
2018-12-29 13:53:12 +00:00
            err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
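The idea of reading an inline file slice by slice, without holding the whole file in RAM, can be sketched in isolation. The following is a simplified model, not the littlefs implementation: `store_t`, `window_t`, `store_getslice`, `window_read`, and `WINDOW_SIZE` are all hypothetical names, and the backing store is just an in-memory array standing in for the directory block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { WINDOW_SIZE = 8 };  // hypothetical fixed RAM budget for reads

// Stand-in for data stored inline in a directory block.
typedef struct { const uint8_t *data; size_t size; } store_t;

// Small read window; size == 0 means nothing is cached yet.
typedef struct { size_t off, size; uint8_t buffer[WINDOW_SIZE]; } window_t;

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

// Fetch one slice; bytes past the end of the store read back as zero.
static void store_getslice(const store_t *s, size_t off,
        void *buf, size_t size) {
    size_t avail = off < s->size ? min_sz(size, s->size - off) : 0;
    if (avail > 0) {
        memcpy(buf, s->data + off, avail);
    }
    memset((uint8_t*)buf + avail, 0, size - avail);
}

// Read [off, off+size) using only WINDOW_SIZE bytes of cache.
static void window_read(const store_t *s, window_t *w,
        size_t off, void *buffer, size_t size) {
    uint8_t *out = buffer;
    while (size > 0) {
        if (off >= w->off && off < w->off + w->size) {
            // hit: copy as much as the window holds
            size_t diff = min_sz(size, w->size - (off - w->off));
            memcpy(out, &w->buffer[off - w->off], diff);
            out += diff;
            off += diff;
            size -= diff;
            continue;
        }
        // miss: refill the window from an aligned offset
        w->off = off - (off % WINDOW_SIZE);
        w->size = WINDOW_SIZE;
        store_getslice(s, w->off, w->buffer, w->size);
    }
}
```

In this model a file of any length can be read with a constant `WINDOW_SIZE` bytes of RAM, at the cost of one slice fetch per window refill.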
2019-01-13 17:08:42 +00:00
                    NULL, &lfs->rcache, diff,
                    dir->pair[0], off+sizeof(tag)+goff, gbuffer, diff);
2018-12-29 13:53:12 +00:00
            if (err) {
                return err;
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we could fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
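The binary search over dry-run traversals mentioned in this commit message could look roughly like the sketch below. It is one possible reading, not the code from the commit: `dry_run_fn`, `find_split`, and `sum_sizes` are hypothetical names, and the sketch assumes the estimated size never shrinks as more ids are included and that an empty range always fits.

```c
#include <stdint.h>

// Hypothetical dry-run callback: estimated compacted size of ids
// in [begin, end), without writing anything.
typedef uint32_t (*dry_run_fn)(void *ctx, uint16_t begin, uint16_t end);

// Find the largest split such that ids [begin, split) fit in budget.
// Assumes the estimate is non-decreasing in end and that the empty
// range fits.
static uint16_t find_split(dry_run_fn dry_run, void *ctx,
        uint16_t begin, uint16_t end, uint32_t budget) {
    uint16_t lo = begin;  // [begin, lo) is known to fit
    uint16_t hi = end;    // the answer is never larger than hi
    while (lo < hi) {
        // round up so that mid > lo and the loop always makes progress
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (dry_run(ctx, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = (uint16_t)(mid - 1);
        }
    }
    return lo;
}

// Example estimator for illustration only: ctx points to an array of
// per-id sizes, and the estimate is their sum.
static uint32_t sum_sizes(void *ctx, uint16_t begin, uint16_t end) {
    const uint32_t *sizes = ctx;
    uint32_t total = 0;
    for (uint16_t i = begin; i < end; i++) {
        total += sizes[i];
    }
    return total;
}
```

Each probe costs one dry-run traversal, so with an O(n^2) traversal the search takes O(n^2 log n), matching the split cost quoted in the message.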
2018-12-27 02:27:34 +00:00
            }

2019-01-22 22:21:16 +00:00
            memset((uint8_t*)gbuffer + diff, 0, gsize - diff);
2018-12-27 02:27:34 +00:00
2018-12-29 13:53:12 +00:00
            return tag + gdiff;
2018-12-27 02:27:34 +00:00
        }
    }

2018-12-29 13:53:12 +00:00
    return LFS_ERR_NOENT;
2018-12-27 02:27:34 +00:00
}

2019-01-13 17:08:42 +00:00
static lfs_stag_t lfs_dir_get(lfs_t *lfs, const lfs_mdir_t *dir,
        lfs_tag_t gmask, lfs_tag_t gtag, void *buffer) {
    return lfs_dir_getslice(lfs, dir,
            gmask, gtag,
            0, buffer, lfs_tag_size(gtag));
}

static int lfs_dir_getread(lfs_t *lfs, const lfs_mdir_t *dir,
        const lfs_cache_t *pcache, lfs_cache_t *rcache, lfs_size_t hint,
        lfs_tag_t gmask, lfs_tag_t gtag,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    uint8_t *data = buffer;
    if (off+size > lfs->cfg->block_size) {
        return LFS_ERR_CORRUPT;
    }

    while (size > 0) {
        lfs_size_t diff = size;

2019-08-03 14:17:47 +00:00
        if (pcache && pcache->block == LFS_BLOCK_INLINE &&
2019-01-13 17:08:42 +00:00
                off < pcache->off + pcache->size) {
            if (off >= pcache->off) {
                // is already in pcache?
                diff = lfs_min(diff, pcache->size - (off-pcache->off));
                memcpy(data, &pcache->buffer[off-pcache->off], diff);

                data += diff;
                off += diff;
                size -= diff;
                continue;
            }

            // pcache takes priority
            diff = lfs_min(diff, pcache->off-off);
        }

2019-08-03 14:17:47 +00:00
        if (rcache->block == LFS_BLOCK_INLINE &&
2019-01-13 17:08:42 +00:00
                off < rcache->off + rcache->size) {
            if (off >= rcache->off) {
                // is already in rcache?
                diff = lfs_min(diff, rcache->size - (off-rcache->off));
                memcpy(data, &rcache->buffer[off-rcache->off], diff);

                data += diff;
                off += diff;
                size -= diff;
                continue;
            }

            // rcache takes priority
            diff = lfs_min(diff, rcache->off-off);
        }

        // load to cache, first condition can no longer fail
2019-08-03 14:17:47 +00:00
        rcache->block = LFS_BLOCK_INLINE;
Added support for RAM-independent reading of inline files

One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.

However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.

Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back them.

However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.

Fortunately, there is a workaround for this. When we are committing to a
directory, any open files need to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.

While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
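
As a rough illustration of slice-based reads, the sketch below picks the
window of bytes to fetch for a read at `off` that expects to need `hint`
bytes, using the same align-down, align-up, and min arithmetic as the
cache fill in the listing. `struct window`, `slice_window`, `align_down`,
and `align_up` are hypothetical names for this example, not littlefs
APIs, and `read_size` and `cache_size` stand in for the corresponding
config fields.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper type: the byte range to fetch into the read cache.
struct window {
    uint32_t off;   // start of the fetch, aligned down to read_size
    uint32_t size;  // number of bytes to fetch, capped at cache_size
};

// Round a down to a multiple of alignment.
static uint32_t align_down(uint32_t a, uint32_t alignment) {
    assert(alignment != 0);
    return a - (a % alignment);
}

// Round a up to a multiple of alignment.
static uint32_t align_up(uint32_t a, uint32_t alignment) {
    return align_down(a + alignment - 1, alignment);
}

// Choose the window for a read at off that expects to need hint bytes.
// Mirrors the arithmetic in the listing: the size is the aligned-up end
// of the hinted range, capped so it never exceeds the cache.
static struct window slice_window(uint32_t off, uint32_t hint,
        uint32_t read_size, uint32_t cache_size) {
    struct window w;
    w.off = align_down(off, read_size);
    uint32_t want = align_up(off + hint, read_size);
    w.size = want < cache_size ? want : cache_size;
    return w;
}
```

With `read_size = 16` and `cache_size = 64`, a 4-byte read at offset 20
yields a window starting at 16 with size 32, so the RAM needed for a read
stays bounded by the cache regardless of how large the inline file is.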
2019-01-13 17:08:42 +00:00
        rcache->off = lfs_aligndown(off, lfs->cfg->read_size);
        rcache->size = lfs_min(lfs_alignup(off+hint, lfs->cfg->read_size),
                lfs->cfg->cache_size);
        int err = lfs_dir_getslice(lfs, dir, gmask, gtag,
                rcache->off, rcache->buffer, rcache->size);
2019-05-31 05:58:48 +00:00
        if (err < 0) {
2019-01-13 17:08:42 +00:00
            return err;
        }
    }

    return 0;
}
Switched to traversal-based compact logic

This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.

Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. look up attrs in the in-progress commit

The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.

However, this was complicated by a few things.

First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.

Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.

Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.

The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.

How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
   attr's history. This will change the attr based on dir modifications
   and may even exit early if the attr was deleted.

The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3),
unlike moves in the original solution (however, moves are less common).

This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
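
The two-step traversal can be sketched roughly as follows. This is a
deliberately simplified model, not the real lfs_dir_traverse: `struct
entry`, `ENTRY_DELETE`, `entry_is_stale`, and `traverse_live` are
hypothetical names, and the sketch assumes a flat in-RAM log where a
delete removes an id without shifting later ids and moves are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical, simplified log entry. Real littlefs tags pack more state
// into 32 bits, and deletes also shift later ids, which is ignored here.
enum { ENTRY_DELETE = -1 };

struct entry {
    int type;  // attr type, or ENTRY_DELETE to delete the whole id
    int id;    // file id the attr is bound to
};

// Scan the rest of the log to see whether entry i was overwritten by a
// newer attr of the same type and id, or had its id deleted.
static bool entry_is_stale(const struct entry *entries, size_t n, size_t i) {
    for (size_t j = i + 1; j < n; j++) {
        if (entries[j].id != entries[i].id) {
            continue;
        }
        if (entries[j].type == ENTRY_DELETE
                || entries[j].type == entries[i].type) {
            return true;
        }
    }
    return false;
}

// Visit the resulting state of each attr: one outer pass over the log and
// one inner scan per attr, so O(n^2) overall. cb may be NULL for a dry run
// that only counts live attrs. Returns the number of live attrs.
static size_t traverse_live(const struct entry *entries, size_t n,
        void (*cb)(void *data, const struct entry *e), void *data) {
    size_t live = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].type == ENTRY_DELETE
                || entry_is_stale(entries, n, i)) {
            continue;
        }
        if (cb) {
            cb(data, &entries[i]);
        }
        live++;
    }
    return live;
}
```

Passing a NULL callback gives the dry-run behaviour mentioned above: the
same pass that would write attrs out can instead just measure them.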
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
   doesn't care about id order, which can cause problems since attr
   insertions are order sensitive. We can fix this by simply looking up
   each create (since there is only one per file) in order at the
   beginning of our traversal. This is oddly complementary to the move
   logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
   splits. However, since we can do a dry-run traversal, we can use that
   to simply binary search for the mid-point.

This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
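
One way to picture the dry-run binary search is the sketch below. It is
an assumption-laden illustration rather than the code littlefs uses:
`dryrun_fn`, `find_split`, and `uniform_dryrun` are hypothetical names,
and the dry run is assumed to report a size that never shrinks as more
ids are included.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical dry-run callback: returns the compacted size in bytes of
// ids in [begin, end). Assumed to be non-decreasing as end grows.
typedef size_t (*dryrun_fn)(void *ctx, unsigned begin, unsigned end);

// Find the largest split in [begin, end] such that ids [begin, split)
// still fit in budget bytes. Uses O(log n) dry runs.
static unsigned find_split(dryrun_fn dryrun, void *ctx,
        unsigned begin, unsigned end, size_t budget) {
    assert(begin <= end);
    unsigned lo = begin;
    unsigned hi = end;
    while (lo < hi) {
        // round up so the loop always makes progress
        unsigned mid = lo + (hi - lo + 1) / 2;
        if (dryrun(ctx, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}

// Example dry run for illustration only: every id costs the same number
// of bytes, passed in through ctx.
static size_t uniform_dryrun(void *ctx, unsigned begin, unsigned end) {
    size_t per_id = *(const size_t *)ctx;
    return (size_t)(end - begin) * per_id;
}
```

Each probe costs one O(n^2) dry-run traversal and the search needs
O(log n) probes, which is where the O(n^2 log n) split bound comes from.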
2018-12-27 02:27:34 +00:00
static int lfs_dir_traverse_filter(void *p,
        lfs_tag_t tag, const void *buffer) {
    lfs_tag_t *filtertag = p;
    (void)buffer;
2019-07-09 22:51:15 +00:00
    // which mask depends on unique bit in tag structure
    uint32_t mask = (tag & LFS_MKTAG(0x100, 0, 0))
            ? LFS_MKTAG(0x7ff, 0x3ff, 0)
            : LFS_MKTAG(0x700, 0x3ff, 0);
2018-12-27 02:27:34 +00:00
    // check for redundancy
Cleaned up tag encoding, now with clear chunk field

Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.

Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field; however, the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
are just a theoretical future improvement).

Here is the new 32-bit tag format. Note that there are multiple levels
of types which break down into more info:

[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)

Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
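
The field layout in the diagram can be exercised with a few shift-and-mask
helpers. This is a sketch under assumptions: `tag_t` and the `tag_*`
functions are hypothetical names chosen for this example, bit positions
are read directly off the diagram (valid bit at bit 31, type at bits
20-30, id at bits 10-19, length at bits 0-9), and `tag_type1` is assumed
to keep the top three type bits in place rather than shifting them down.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;

// Pack an 11-bit type, 10-bit id, and 10-bit length into one tag.
// The valid bit (bit 31) is left clear.
static tag_t tag_make(uint32_t type, uint32_t id, uint32_t length) {
    assert(type <= 0x7ff && id <= 0x3ff && length <= 0x3ff);
    return (type << 20) | (id << 10) | length;
}

// Bit 31: the valid bit.
static uint32_t tag_validbit(tag_t tag) { return tag >> 31; }

// Full 11-bit type field (type3).
static uint32_t tag_type3(tag_t tag) { return (tag >> 20) & 0x7ff; }

// Top 3 bits of the type field (type1), kept in their 0x700 position.
static uint32_t tag_type1(tag_t tag) { return (tag >> 20) & 0x700; }

// Low 8 bits of the type field: the chunk info.
static uint32_t tag_chunk(tag_t tag) { return (tag >> 20) & 0xff; }

// 10-bit file id.
static uint32_t tag_id(tag_t tag) { return (tag >> 10) & 0x3ff; }

// 10-bit entry length.
static uint32_t tag_length(tag_t tag) { return tag & 0x3ff; }
```

For example, a tag built from type 0x401, id 5, and length 12 splits back
into type1 0x400 and chunk 0x01, which is the two-level type breakdown
shown under the 11-bit field in the diagram.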
2018-12-29 13:53:12 +00:00
    if ((mask & tag) == (mask & *filtertag) ||
2019-07-09 22:51:15 +00:00
            lfs_tag_isdelete(*filtertag) ||
            (LFS_MKTAG(0x7ff, 0x3ff, 0) & tag) == (
                LFS_MKTAG(LFS_TYPE_DELETE, 0, 0) |
                    (LFS_MKTAG(0, 0x3ff, 0) & *filtertag))) {
2018-12-27 02:27:34 +00:00
        return true;
    }

    // check if we need to adjust for created/deleted tags
2018-12-29 13:53:12 +00:00
    if (lfs_tag_type1(tag) == LFS_TYPE_SPLICE &&
2018-12-27 02:27:34 +00:00
            lfs_tag_id(tag) <= lfs_tag_id(*filtertag)) {
2018-12-29 13:53:12 +00:00
        *filtertag += LFS_MKTAG(0, lfs_tag_splice(tag), 0);
2018-12-27 02:27:34 +00:00
    }

    return false;
}
2018-12-29 13:53:12 +00:00
static int lfs_dir_traverse(lfs_t *lfs,
        const lfs_mdir_t *dir, lfs_off_t off, lfs_tag_t ptag,
2020-01-20 23:35:45 +00:00
        const struct lfs_mattr *attrs, int attrcount,
2018-12-29 13:53:12 +00:00
        lfs_tag_t tmask, lfs_tag_t ttag,
        uint16_t begin, uint16_t end, int16_t diff,
2018-12-27 02:27:34 +00:00
        int (*cb)(void *data, lfs_tag_t tag, const void *buffer), void *data) {
2019-01-04 23:23:36 +00:00
    // iterate over directory and attrs
    while (true) {
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact, giving us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attr in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
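As a rough illustration of the two steps above, here is a minimal in-memory
sketch. It is not the littlefs implementation: `struct attr`, `attr_kind`,
`log_traverse`, and `count_cb` are hypothetical names, the log is a plain
array rather than on-disk tags, and a delete simply drops every earlier attr
with the same id (real deletes also shift ids, which is ignored here).

```c
#include <stddef.h>

/* Hypothetical, simplified log entry; not littlefs's on-disk tag format. */
enum attr_kind { ATTR_SET, ATTR_DELETE };

struct attr {
    enum attr_kind kind;
    int id;     /* file id the attr belongs to */
    int type;   /* attr type; a later SET of the same (id, type) wins */
    int value;
};

typedef int (*attr_cb)(void *data, const struct attr *a);

/* Report the final state of every attr in an append-only log.
 * Step 1: iterate through all attrs.
 * Step 2: for each attr, scan the rest of the log for its history. */
static int log_traverse(const struct attr *entries, size_t n,
        attr_cb cb, void *data) {
    for (size_t i = 0; i < n; i++) {
        if (entries[i].kind == ATTR_DELETE) {
            continue; /* deletes only affect earlier attrs */
        }

        int live = 1;
        for (size_t j = i+1; j < n && live; j++) {
            if (entries[j].id != entries[i].id) {
                continue;
            }
            /* superseded by a newer copy, or removed by a delete */
            if (entries[j].kind == ATTR_DELETE
                    || entries[j].type == entries[i].type) {
                live = 0; /* ends the inner scan early */
            }
        }

        if (live) {
            int err = cb(data, &entries[i]);
            if (err) {
                return err;
            }
        }
    }
    return 0;
}

/* Dry-run callback: only counts the attrs that would be written. */
static int count_cb(void *data, const struct attr *a) {
    (void)a;
    *(size_t*)data += 1;
    return 0;
}
```

Because the callback decides what happens to each surviving attr, the same
loop can back a real write or a dry-run that only measures, at a cost of
O(n^2) comparisons for n log entries.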
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
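The binary search from the second hiccup could look roughly like the sketch
below. This is an assumption-laden illustration, not the code in lfs.c:
`dry_run_fn`, `sum_sizes`, and `find_split` are hypothetical names, the
dry-run is modelled as a function returning the compacted size of ids
[begin, end), and that size is assumed to grow monotonically with the range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dry-run: returns the compacted size of ids [begin, end). */
typedef size_t (*dry_run_fn)(void *ctx, uint16_t begin, uint16_t end);

/* Stand-in dry-run for the demo: ctx is a table of per-id sizes. */
static size_t sum_sizes(void *ctx, uint16_t begin, uint16_t end) {
    const size_t *sizes = ctx;
    size_t total = 0;
    for (uint16_t id = begin; id < end; id++) {
        total += sizes[id];
    }
    return total;
}

/* Find the largest split such that ids [begin, split) fit in capacity.
 * Each probe is one dry-run traversal, so the search needs O(log n)
 * traversals instead of one per id. */
static uint16_t find_split(dry_run_fn dry_run, void *ctx,
        uint16_t begin, uint16_t end, size_t capacity) {
    uint16_t lo = begin; /* invariant: [begin, lo) fits */
    uint16_t hi = end;   /* invariant: anything past hi does not fit */
    while (lo < hi) {
        /* round up so the loop always makes progress */
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (dry_run(ctx, begin, mid) <= capacity) {
            lo = mid;
        } else {
            hi = (uint16_t)(mid - 1);
        }
    }
    return lo;
}
```

With an O(n^2) dry-run per probe, this is where the O(n^2 log n) split bound
quoted above comes from.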
2018-12-27 02:27:34 +00:00
        lfs_tag_t tag;
        const void *buffer;
        struct lfs_diskoff disk;
        if (off+lfs_tag_dsize(ptag) < dir->off) {
            off += lfs_tag_dsize(ptag);
            int err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
                    NULL, &lfs->rcache, sizeof(tag),
2018-12-27 02:27:34 +00:00
                    dir->pair[0], off, &tag, sizeof(tag));
            if (err) {
                return err;
            }

            tag = (lfs_frombe32(tag) ^ ptag) | 0x80000000;
            disk.block = dir->pair[0];
            disk.off = off+sizeof(lfs_tag_t);
            buffer = &disk;
            ptag = tag;
2019-01-04 23:23:36 +00:00
        } else if (attrcount > 0) {
2019-01-08 14:52:03 +00:00
            tag = attrs[0].tag;
            buffer = attrs[0].buffer;
            attrs += 1;
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
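To make the bit layout above concrete, the sketch below packs and unpacks a
32-bit tag with plain shifts and masks. The helper names (`tag_make`,
`tag_type3`, and so on) are hypothetical and chosen for this illustration;
they are not claimed to match the helpers in lfs.c, and what a set or clear
valid bit means is deliberately left out.

```c
#include <stdint.h>

/* Layout, high bit first: [1 valid | 11 type3 | 10 id | 10 length]
 * type3 itself splits into [3 type1 | 8 chunk].
 * All names below are hypothetical helpers for illustration only. */
static uint32_t tag_make(uint32_t type3, uint32_t id, uint32_t length) {
    return ((type3 & 0x7ff) << 20) | ((id & 0x3ff) << 10) | (length & 0x3ff);
}

static uint32_t tag_validbit(uint32_t tag) { return tag >> 31; }
static uint32_t tag_type3(uint32_t tag)    { return (tag >> 20) & 0x7ff; }
static uint32_t tag_type1(uint32_t tag)    { return (tag >> 28) & 0x7; }
static uint32_t tag_chunk(uint32_t tag)    { return (tag >> 20) & 0xff; }
static uint32_t tag_id(uint32_t tag)       { return (tag >> 10) & 0x3ff; }
static uint32_t tag_length(uint32_t tag)   { return tag & 0x3ff; }
```

For example, `tag_make(0x401, 5, 12)` yields `0x4010140c`, which decodes back
to type1 4, chunk 1, id 5, and length 12.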
2018-12-29 13:53:12 +00:00
            attrcount -= 1;
2019-01-04 23:23:36 +00:00
        } else {
            return 0;
2018-12-29 13:53:12 +00:00
        }

        lfs_tag_t mask = LFS_MKTAG(0x7ff, 0, 0);
        if ((mask & tmask & tag) != (mask & tmask & ttag)) {
            continue;
2018-12-27 02:27:34 +00:00
        }

        // do we need to filter? inlining the filtering logic here allows
        // for some minor optimizations
2018-12-29 13:53:12 +00:00
        if (lfs_tag_id(tmask) != 0) {
2018-12-27 02:27:34 +00:00
            // scan for duplicates and update tag based on creates/deletes
2018-12-29 13:53:12 +00:00
            int filter = lfs_dir_traverse(lfs,
2020-01-20 23:35:45 +00:00
                    dir, off, ptag, attrs, attrcount,
2018-12-29 13:53:12 +00:00
                    0, 0, 0, 0, 0,
2018-12-27 02:27:34 +00:00
                    lfs_dir_traverse_filter, &tag);
            if (filter < 0) {
                return filter;
            }

            if (filter) {
                continue;
            }

            // in filter range?
            if (!(lfs_tag_id(tag) >= begin && lfs_tag_id(tag) < end)) {
                continue;
            }
        }

        // handle special cases for mcu-side operations
2019-01-08 14:52:03 +00:00
        if (lfs_tag_type3(tag) == LFS_FROM_NOOP) {
            // do nothing
        } else if (lfs_tag_type3(tag) == LFS_FROM_MOVE) {
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debugability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effect. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact. Giving us a O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase which could very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has much fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attr in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however moves are less common.
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care or id order, which can cause problems since attr
insertion are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complimentary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
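The binary search from hiccup 2 could look roughly like the sketch below. It assumes a hypothetical dry-run helper (toy_dryrun_size) that returns the bytes needed to compact ids [0, end), and that this size never shrinks as end grows. A per-id size array stands in for the real traversal, so this only illustrates the search, not how littlefs actually picks its split.

```c
#include <assert.h>
#include <stddef.h>

// hypothetical dry-run, bytes needed to compact ids [0, end)
static size_t toy_dryrun_size(const size_t *sizes, size_t end) {
    size_t total = 0;
    for (size_t i = 0; i < end; i++) {
        total += sizes[i];
    }
    return total;
}

// find the largest end such that ids [0, end) fit in capacity bytes
static size_t toy_find_split(const size_t *sizes, size_t count,
        size_t capacity) {
    assert(sizes != NULL || count == 0);
    size_t lo = 0;
    size_t hi = count;
    while (lo < hi) {
        // round up so the range always shrinks
        size_t mid = lo + (hi - lo + 1)/2;
        if (toy_dryrun_size(sizes, mid) <= capacity) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

Each probe costs one dry-run, and there are O(log n) probes. With the real O(n^2) traversal as the dry-run, that matches the O(n^2 log n) split quoted above.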
2018-12-27 02:27:34 +00:00
|
|
|
uint16_t fromid = lfs_tag_size(tag);
|
|
|
|
uint16_t toid = lfs_tag_id(tag);
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format. Note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
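As a rough illustration of the layout described above, the sketch below packs and unpacks a tag with plain shifts and masks. The toy_* names are hypothetical and only mirror the LFS_MKTAG, lfs_tag_type3, lfs_tag_id and lfs_tag_size helpers used elsewhere in this file. The field positions are read straight off the diagram (1 valid bit, 11 type bits, 10 id bits, 10 length bits), so treat this as an assumption about the encoding rather than the actual definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_tag_t;

// pack [1 valid | 11 type | 10 id | 10 length], valid bit left clear
#define TOY_MKTAG(type, id, size) \
    (((toy_tag_t)(type) << 20) | ((toy_tag_t)(id) << 10) | (toy_tag_t)(size))

// full 11-bit type
static inline uint16_t toy_tag_type3(toy_tag_t tag) {
    return (tag >> 20) & 0x7ff;
}

// top 3 bits of the type, kept in place so it compares against 0x?00
static inline uint16_t toy_tag_type1(toy_tag_t tag) {
    return (tag >> 20) & 0x700;
}

// low 8 bits of the type
static inline uint8_t toy_tag_chunk(toy_tag_t tag) {
    return (tag >> 20) & 0xff;
}

static inline uint16_t toy_tag_id(toy_tag_t tag) {
    return (tag >> 10) & 0x3ff;
}

static inline uint16_t toy_tag_size(toy_tag_t tag) {
    return tag & 0x3ff;
}
```

With 10 length bits an entry tops out just under 1KiB, and 8 chunk bits give 256 chunks, which is where the 256KiB theoretical inline max comes from.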
2018-12-29 13:53:12 +00:00
|
|
|
int err = lfs_dir_traverse(lfs,
|
2020-02-10 04:43:20 +00:00
|
|
|
buffer, 0, 0xffffffff, NULL, 0,
|
2018-12-29 13:53:12 +00:00
|
|
|
LFS_MKTAG(0x600, 0x3ff, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, 0, 0),
|
|
|
|
fromid, fromid+1, toid-fromid+diff,
|
2018-12-27 02:27:34 +00:00
|
|
|
cb, data);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-12-29 13:53:12 +00:00
|
|
|
} else if (lfs_tag_type3(tag) == LFS_FROM_USERATTRS) {
|
2019-01-08 14:52:03 +00:00
|
|
|
for (unsigned i = 0; i < lfs_tag_size(tag); i++) {
|
|
|
|
const struct lfs_attr *a = buffer;
|
|
|
|
int err = cb(data, LFS_MKTAG(LFS_TYPE_USERATTR + a[i].type,
|
|
|
|
lfs_tag_id(tag) + diff, a[i].size), a[i].buffer);
|
2018-12-27 02:27:34 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
} else {
|
2018-12-29 13:53:12 +00:00
|
|
|
int err = cb(data, tag + LFS_MKTAG(0, diff, 0), buffer);
|
2018-12-27 02:27:34 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
|
|
|
}
|
2018-05-19 23:25:47 +00:00
|
|
|
}
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static lfs_stag_t lfs_dir_fetchmatch(lfs_t *lfs,
|
2018-09-12 06:34:03 +00:00
|
|
|
lfs_mdir_t *dir, const lfs_block_t pair[2],
|
2018-12-29 13:53:12 +00:00
|
|
|
lfs_tag_t fmask, lfs_tag_t ftag, uint16_t *id,
|
2018-09-11 03:07:59 +00:00
|
|
|
int (*cb)(void *data, lfs_tag_t tag, const void *buffer), void *data) {
|
2018-12-29 13:53:12 +00:00
|
|
|
// we can find tag very efficiently during a fetch, since we're already
|
|
|
|
// scanning the entire directory
|
|
|
|
lfs_stag_t besttag = -1;
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
// find the block with the most recent revision
|
2019-02-12 06:01:28 +00:00
|
|
|
uint32_t revs[2] = {0, 0};
|
2018-09-11 03:07:59 +00:00
|
|
|
int r = 0;
|
|
|
|
for (int i = 0; i < 2; i++) {
|
|
|
|
int err = lfs_bd_read(lfs,
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
|
|
|
NULL, &lfs->rcache, sizeof(revs[i]),
|
2018-09-11 03:07:59 +00:00
|
|
|
pair[i], 0, &revs[i], sizeof(revs[i]));
|
|
|
|
revs[i] = lfs_fromle32(revs[i]);
|
|
|
|
if (err && err != LFS_ERR_CORRUPT) {
|
2018-05-19 23:25:47 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2019-02-12 06:01:28 +00:00
|
|
|
if (err != LFS_ERR_CORRUPT &&
|
|
|
|
lfs_scmp(revs[i], revs[(i+1)%2]) > 0) {
|
2018-09-11 03:07:59 +00:00
|
|
|
r = i;
|
|
|
|
}
|
|
|
|
}
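The loop above picks the block with the most recent revision using lfs_scmp. A minimal sketch of that kind of wraparound-safe sequence comparison follows, assuming the comparison is a subtraction reinterpreted as signed. The toy_* names are hypothetical, and the cast relies on the usual two's complement conversion.

```c
#include <assert.h>
#include <stdint.h>

// compare sequence numbers: >0 if a is newer than b, <0 if older
static inline int32_t toy_scmp(uint32_t a, uint32_t b) {
    // unsigned subtraction wraps, the signed view gives the direction
    return (int32_t)(a - b);
}

// return the index of the newer of two revisions, 0 on a tie
static inline int toy_newest(const uint32_t revs[2]) {
    int r = 0;
    for (int i = 0; i < 2; i++) {
        if (toy_scmp(revs[i], revs[(i+1)%2]) > 0) {
            r = i;
        }
    }
    return r;
}
```

A plain `revs[i] > revs[j]` would choose the wrong block once the revision counter wraps from 0xffffffff to 0; the signed difference keeps the ordering correct as long as the two revisions are less than 2^31 apart.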
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->pair[0] = pair[(r+0)%2];
|
|
|
|
dir->pair[1] = pair[(r+1)%2];
|
|
|
|
dir->rev = revs[(r+0)%2];
|
|
|
|
dir->off = 0; // nonzero = found some commits
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
// now scan tags to fetch the actual dir and find possible match
|
2018-09-11 03:07:59 +00:00
|
|
|
for (int i = 0; i < 2; i++) {
|
2018-12-29 13:53:12 +00:00
|
|
|
lfs_off_t off = 0;
|
2020-02-10 04:43:20 +00:00
|
|
|
lfs_tag_t ptag = 0xffffffff;
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
uint16_t tempcount = 0;
|
2019-08-03 14:17:47 +00:00
|
|
|
lfs_block_t temptail[2] = {LFS_BLOCK_NULL, LFS_BLOCK_NULL};
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
is just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
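As a rough illustration of the layout in the diagram above, the fields can be pulled out of a 32-bit tag with masks and shifts. This is a minimal sketch: the `tag_*` helper names are hypothetical and chosen for this example, and the bit positions are read directly off the diagram (1 valid bit, 11 type bits split into 3 + 8, 10 id bits, 10 length bits). The polarity of the valid bit is not stated in the message, so the sketch only returns the raw bit.

```c
#include <stdint.h>

// Hypothetical helpers for illustration only; names are not taken from lfs.c.
typedef uint32_t tag_t;

// Bit 31: the raw valid bit (its polarity is not assumed here).
static inline uint32_t tag_validbit(tag_t tag) { return tag >> 31; }

// Bits 30..20: the full 11-bit type field (type3).
static inline uint16_t tag_type3(tag_t tag) {
    return (uint16_t)((tag & 0x7ff00000u) >> 20);
}

// Bits 30..28: the top 3 type bits (type1), left in place within the
// 11-bit field so the result compares directly against type3-style values.
static inline uint16_t tag_type1(tag_t tag) {
    return (uint16_t)((tag & 0x70000000u) >> 20);
}

// Bits 27..20: the 8-bit chunk field, the low part of the type field.
static inline uint8_t tag_chunk(tag_t tag) {
    return (uint8_t)((tag & 0x0ff00000u) >> 20);
}

// Bits 19..10: the 10-bit file id.
static inline uint16_t tag_id(tag_t tag) {
    return (uint16_t)((tag & 0x000ffc00u) >> 10);
}

// Bits 9..0: the 10-bit entry length.
static inline uint16_t tag_size(tag_t tag) {
    return (uint16_t)(tag & 0x000003ffu);
}

// Pack a tag from its parts; the valid bit is left cleared.
static inline tag_t tag_make(uint16_t type3, uint16_t id, uint16_t size) {
    return ((tag_t)(type3 & 0x7ffu) << 20)
         | ((tag_t)(id & 0x3ffu) << 10)
         | (tag_t)(size & 0x3ffu);
}
```

With an 8-bit chunk field and a 10-bit length field, 256 chunks of 1KiB each give the 256KiB theoretical inline maximum mentioned above.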
2018-12-29 13:53:12 +00:00
|
|
|
bool tempsplit = false;
|
|
|
|
lfs_stag_t tempbesttag = besttag;
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->rev = lfs_tole32(dir->rev);
|
2020-02-10 04:43:20 +00:00
|
|
|
uint32_t crc = lfs_crc(0xffffffff, &dir->rev, sizeof(dir->rev));
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->rev = lfs_fromle32(dir->rev);
|
2018-09-11 03:07:59 +00:00
|
|
|
|
|
|
|
while (true) {
|
|
|
|
// extract next tag
|
|
|
|
lfs_tag_t tag;
|
2018-12-29 13:53:12 +00:00
|
|
|
off += lfs_tag_dsize(ptag);
|
2018-09-11 03:07:59 +00:00
|
|
|
int err = lfs_bd_read(lfs,
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (e.g. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
|
|
|
NULL, &lfs->rcache, lfs->cfg->block_size,
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->pair[0], off, &tag, sizeof(tag));
|
2018-05-19 23:25:47 +00:00
|
|
|
if (err) {
|
2018-09-11 03:07:59 +00:00
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
// can't continue?
|
|
|
|
dir->erased = false;
|
|
|
|
break;
|
|
|
|
}
|
2018-05-19 23:25:47 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
crc = lfs_crc(crc, &tag, sizeof(tag));
|
Tweaked tag endianness to catch power-loss after <1 word is written
There was an interesting subtlety with the existing layout of tags that
could become a problem in the future. Basically, littlefs avoids writing to
any region of storage it is not absolutely sure has been erased
beforehand. This is a part of limiting the number of assumptions about
storage. It's possible a storage technology can't support writes without
erases in a way that is undetectable at write time (Maybe changing a bit
without an erase decreases the longevity of the information stored on
the bit).
But the existing layout had a very tiny corner case where this wasn't
true. Consider the location of the valid bit in the tag struct:
[1|--- 31 ---]
 ^--- valid bit
The responsibility of this bit is to indicate if an attempt has been
made to write the following commit. If it is not set (the specific value
is dependent on a previous read and identified by the preceding commit),
the assumption is that it is safe to write to the next region because it
has been erased previously. If it is set, we check if the next commit is
valid, if it isn't (because of CRC failure, likely due to power-loss), we
discard the commit. But because an attempt has been made to write to
that storage, we must then do a compaction to move to the other block in
the metadata-pair.
This plan looks good on paper, but what does it look like on storage?
The problem is that words in littlefs are stored in little-endian. So on
storage the tag actually looks like this:
[- 8 -|- 8 -|- 8 -|1|- 7 -]
                   ^-- valid bit
This means that we don't actually set the valid bit before writing the
tag! We write the lower bytes first. If we lose power, we may have
written 3 bytes without this fact being detectable.
We could restructure the tag structure to store the valid bit lower,
however because none of the fields are 7 bits, this would make the
extraction more costly, and we then lose the ability to check this
valid bit with a sign comparison.
The simple solution is to just store the tag in big-endian. A small
benefit is that this will actually have a negative code cost on
big-endian machines.
This mixture of endiannesses is frustrating, however it is a pragmatic
solution with only a 20-byte code size cost.
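To make the byte-order argument above concrete, the following sketch serializes a 32-bit tag both ways and checks which byte carries bit 31. The helper names (`store_be32`, `store_le32`, `first_byte_has_valid_bit`, `valid_bit_set`) are hypothetical and exist only for this illustration. It also assumes that storage is programmed from the lowest address upward, as the message implies.

```c
#include <stdbool.h>
#include <stdint.h>

// Serialize a 32-bit word most-significant byte first (big-endian).
static void store_be32(uint8_t out[4], uint32_t word) {
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)(word);
}

// Serialize a 32-bit word least-significant byte first (little-endian).
static void store_le32(uint8_t out[4], uint32_t word) {
    out[0] = (uint8_t)(word);
    out[1] = (uint8_t)(word >> 8);
    out[2] = (uint8_t)(word >> 16);
    out[3] = (uint8_t)(word >> 24);
}

// True if the first byte to reach storage carries the valid bit (bit 31
// of the tag). Only then does an interrupted write of fewer than 4 bytes
// already leave the bit on storage.
static bool first_byte_has_valid_bit(const uint8_t bytes[4]) {
    return (bytes[0] & 0x80u) != 0;
}

// Test the valid bit on the in-memory tag. On two's-complement targets
// this gives the same answer as the signed comparison (int32_t)tag < 0.
static bool valid_bit_set(uint32_t tag) {
    return (tag & 0x80000000u) != 0;
}
```

In the big-endian encoding the valid bit sits in the first byte, so a write interrupted after one, two, or three bytes still leaves it on storage. In the little-endian encoding it sits in the fourth byte.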
2018-10-22 21:42:30 +00:00
|
|
|
tag = lfs_frombe32(tag) ^ ptag;
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
// next commit not yet programmed or we're not in valid range
|
2020-02-09 16:02:41 +00:00
|
|
|
if (!lfs_tag_isvalid(tag)) {
|
Added better handling of large program sizes (> 1024)
The issue here is how commits handle padding to the nearest program
size. This is done by exploiting the size field of the LFS_TYPE_CRC
tag that completes the commit. Unfortunately, during development, the
size field shrank in size to make room for more type information,
limiting the size field to 1024.
Normally this isn't a problem, as very rarely do program sizes exceed
1024 bytes. However, using a simulated block device, user earlephilhower
found that exceeding 1024 caused littlefs to crash.
To make this corner case behave in a more user-friendly manner, I've
modified this situation to treat >1024 program sizes as small commits
that don't match the prog size. As a part of this, littlefs also needed
to understand that non-matching commits indicate an "unerased" dir
block, which would be needed for portability (something which notably
lacks testing).
This raises the question of if the tag size field size needs to be
reconsidered, but to change that at this point would need a new major
version.
found by earlephilhower
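The padding problem described above can be sketched with a little arithmetic. This is only an illustration under stated assumptions, not the code in lfs.c. It assumes the closing CRC tag is 4 bytes and is followed by a 4-byte CRC. It assumes the tag's 10-bit size field has to cover the CRC plus any padding up to the next `prog_size` boundary. It assumes the field can hold at most 0x3ff. The names `align_up`, `crc_padding`, and `padding_fits` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Largest value a 10-bit size field can hold (assumed limit for this sketch).
#define TAG_SIZE_MAX 0x3ffu

// Round off up to the next multiple of align (align must be nonzero).
static uint32_t align_up(uint32_t off, uint32_t align) {
    return ((off + align - 1u) / align) * align;
}

// Number of bytes the CRC tag's size field would need to describe:
// the 4-byte CRC plus padding, so the commit ends on a prog_size boundary.
// tag_off is the offset at which the 4-byte CRC tag itself starts.
static uint32_t crc_padding(uint32_t tag_off, uint32_t prog_size) {
    uint32_t end = align_up(tag_off + 8u, prog_size);
    return end - (tag_off + 4u);
}

// True if that amount is representable in the 10-bit size field.
static bool padding_fits(uint32_t tag_off, uint32_t prog_size) {
    return crc_padding(tag_off, prog_size) <= TAG_SIZE_MAX;
}
```

For example, with the CRC tag at offset 20 and a 16-byte program size, the commit ends at offset 32 and the size field holds 8. With a 4096-byte program size the same commit would need a size of 4072, which no longer fits. That is the case the message describes treating as a small commit that does not match the program size.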
2019-04-09 21:06:43 +00:00
|
|
|
dir->erased = (lfs_tag_type1(ptag) == LFS_TYPE_CRC &&
|
2020-02-09 16:02:41 +00:00
|
|
|
dir->off % lfs->cfg->prog_size == 0);
|
|
|
|
break;
|
|
|
|
} else if (off + lfs_tag_dsize(tag) > lfs->cfg->block_size) {
|
|
|
|
dir->erased = false;
|
2018-09-11 03:07:59 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
ptag = tag;
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type1(tag) == LFS_TYPE_CRC) {
|
2018-09-11 03:07:59 +00:00
|
|
|
// check the crc attr
|
|
|
|
uint32_t dcrc;
|
|
|
|
err = lfs_bd_read(lfs,
|
2019-01-13 17:08:42 +00:00
|
|
|
NULL, &lfs->rcache, lfs->cfg->block_size,
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->pair[0], off+sizeof(tag), &dcrc, sizeof(dcrc));
|
2018-09-11 03:07:59 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
dir->erased = false;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
dcrc = lfs_fromle32(dcrc);
|
|
|
|
|
|
|
|
if (crc != dcrc) {
|
|
|
|
dir->erased = false;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
// reset the next bit if we need to
|
2019-09-09 03:53:50 +00:00
|
|
|
ptag ^= (lfs_tag_t)(lfs_tag_chunk(tag) & 1U) << 31;
|
2018-12-29 13:53:12 +00:00
|
|
|
|
|
|
|
// toss our crc into the filesystem seed for
|
|
|
|
// pseudorandom numbers
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs->seed ^= crc;
|
|
|
|
|
|
|
|
// update with what's found so far
|
2018-12-29 13:53:12 +00:00
|
|
|
besttag = tempbesttag;
|
2018-09-11 03:07:59 +00:00
|
|
|
dir->off = off + lfs_tag_dsize(tag);
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->etag = ptag;
|
|
|
|
dir->count = tempcount;
|
|
|
|
dir->tail[0] = temptail[0];
|
|
|
|
dir->tail[1] = temptail[1];
|
|
|
|
dir->split = tempsplit;
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
// reset crc
|
2020-02-10 04:43:20 +00:00
|
|
|
crc = 0xffffffff;
|
2018-12-29 13:53:12 +00:00
|
|
|
continue;
|
|
|
|
}
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
            // crc the entry first, hopefully leaving it in the cache
            for (lfs_off_t j = sizeof(tag); j < lfs_tag_dsize(tag); j++) {
                uint8_t dat;
                err = lfs_bd_read(lfs,
                        NULL, &lfs->rcache, lfs->cfg->block_size,
                        dir->pair[0], off+j, &dat, 1);
                if (err) {
                    if (err == LFS_ERR_CORRUPT) {
                        dir->erased = false;
                        break;
                    }
                    return err;
                }
                crc = lfs_crc(crc, &dat, 1);
            }
            // directory modification tags?
            if (lfs_tag_type1(tag) == LFS_TYPE_NAME) {
                // increase count of files if necessary
                if (lfs_tag_id(tag) >= tempcount) {
                    tempcount = lfs_tag_id(tag) + 1;
                }
            } else if (lfs_tag_type1(tag) == LFS_TYPE_SPLICE) {
                tempcount += lfs_tag_splice(tag);

                if (tag == (LFS_MKTAG(LFS_TYPE_DELETE, 0, 0) |
                        (LFS_MKTAG(0, 0x3ff, 0) & tempbesttag))) {
                    tempbesttag |= 0x80000000;
                } else if (tempbesttag != -1 &&
                        lfs_tag_id(tag) <= lfs_tag_id(tempbesttag)) {
                    tempbesttag += LFS_MKTAG(0, lfs_tag_splice(tag), 0);
                }
            } else if (lfs_tag_type1(tag) == LFS_TYPE_TAIL) {
                tempsplit = (lfs_tag_chunk(tag) & 1);
                err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
                        NULL, &lfs->rcache, lfs->cfg->block_size,
                        dir->pair[0], off+sizeof(tag), &temptail, 8);
                if (err) {
                    if (err == LFS_ERR_CORRUPT) {
                        dir->erased = false;
                        break;
                    }
                }
                lfs_pair_fromle32(temptail);
            }
            // found a match for our fetcher?
            if ((fmask & tag) == (fmask & ftag)) {
                int res = cb(data, tag, &(struct lfs_diskoff){
                        dir->pair[0], off+sizeof(tag)});
                if (res < 0) {
                    if (res == LFS_ERR_CORRUPT) {
                        dir->erased = false;
                        break;
                    }
                    return res;
                }

                if (res == LFS_CMP_EQ) {
                    // found a match
                    tempbesttag = tag;
Fixed lfs_dir_fetchmatch not understanding overwritten tags
Sometimes a small, single-line code change hides a complicated
story behind it. This is one of those times.
If you look at this diff, you may note that this is a case of
lfs_dir_fetchmatch not correctly handling a tag that invalidates a
callback used to search for some condition, in this case a search for a
parent, which is invalidated by a later dir tag overwriting the
previous dir pair.
But how can this happen? Dir-pair-tags are only overwritten during
relocations (when a block goes bad or exceeds the block_cycles config
option for dynamic wear-leveling). Other dir operations create new
directory entries. And the only lfs_dir_fetchmatch condition that relies
on overwrites (as opposed to proper deletes) is when we need to find a
directory's parent, an operation that only occurs during a _different_
relocation. And a false _positive_ can only happen if we don't have a
parent. Which is really unlikely when we search for directory parents!
This bug and minimal test case were found by Matthew Renzelmann. In an
unfortunate series of events, first a file creation causes a directory
split to occur. This creates a new, orphaned metadata-pair containing
our new file. However, the revision count on this metadata-pair
indicates the pair is due for relocation as a part of wear-leveling.
Normally, this is fine, even though this metadata-pair has no parent,
the lfs_dir_find should return ENOENT and continue without error.
However, here we get hit by our fetchmatch bug. A previous, unrelated
relocation overwrites a pair which just happens to contain the block
allocated for a new metadata-pair. When we search for a parent,
lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair
and incorrectly tells our orphan it's found its parent.
As you can imagine, the orphan's disappointment must be immense.
So an unfortunately timed dir split triggers a relocation which
incorrectly finds a previously written parent that has been outdated
by another relocation.
As a solution we can outdate our found tag if it is overwritten by
an exact match during lfs_dir_fetchmatch.
As a part of this I started adding a new set of tests: tests/test_relocations,
for aggressive relocations tests. This is already being appended to by
another PR. I suspect relocations are relatively under-tested and are
becoming more important due to recent improvements in wear-leveling.
2019-11-26 07:21:42 +00:00
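The fix described in this commit message can be pictured with a toy model of the scan. This is a simplified sketch, not the real lfs_dir_fetchmatch: the `sketch_entry` struct, the `sketch_find` function, and the integer keys and values are all assumptions for illustration. The idea it shows is that later log entries with the same key overwrite earlier ones, so a previously found match has to be dropped when the same key shows up again with different contents.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical log entry: `key` stands in for a tag's type+id (assumed
// non-negative), `value` for the contents the callback compares.
struct sketch_entry {
    int key;
    int value;
};

// Scan the log in order and return the key of the newest live entry whose
// value equals `target`, or -1 if no live entry matches.
static int sketch_find(const struct sketch_entry *log, size_t count,
        int target) {
    assert(log != NULL || count == 0);
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (log[i].value == target) {
            // found a match
            best = log[i].key;
        } else if (log[i].key == best) {
            // same key, different contents: our match was overwritten
            best = -1;
        }
    }
    return best;
}
```

For a log of `{1, 10}, {2, 20}, {1, 30}`, searching for value 10 finds nothing, because key 1 was later overwritten with 30. Without the second branch the stale entry would be reported as a match.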
                } else if ((LFS_MKTAG(0x7ff, 0x3ff, 0) & tag) ==
                        (LFS_MKTAG(0x7ff, 0x3ff, 0) & tempbesttag)) {
                    // found an identical tag, but contents didn't match
                    // this must mean that our besttag has been overwritten
                    tempbesttag = -1;
                } else if (res == LFS_CMP_GT &&
                        lfs_tag_id(tag) <= lfs_tag_id(tempbesttag)) {
                    // found a greater match, keep track to keep things sorted
                    tempbesttag = tag | 0x80000000;
                }
            }
        }

        // consider what we have good enough
        if (dir->off > 0) {
            // synthetic move
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
    .--------.
  ->| parent |-.
    | gstate | |
  .-| a      |-'
  | '--------'
  |     X <- child is orphaned
  | .--------.
  '>| child  |->
    | gstate |
    | a      |
    '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake of using a break in the deorphan loop, perhaps left over from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
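To see why a duplicated delta is a problem, recall that item 4 of this commit message describes gstate as the xor of all deltas found along the threaded linked-list. The following toy sketch illustrates that property only; the `sketch_mdir` struct, the two functions, and the 32-bit delta values are assumptions, not littlefs's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical metadata pair: one gstate delta plus a tail pointer that
// threads it into the filesystem-wide linked-list.
struct sketch_mdir {
    uint32_t gdelta;
    struct sketch_mdir *tail;
};

// gstate is the xor of every delta reachable through the threaded list
static uint32_t sketch_gstate(const struct sketch_mdir *head) {
    uint32_t g = 0;
    for (const struct sketch_mdir *d = head; d != NULL; d = d->tail) {
        g ^= d->gdelta;
    }
    return g;
}

// Finalize a removal: the predecessor absorbs the child's delta and drops
// the child from the list together, so the overall xor is preserved.
static void sketch_finish_remove(struct sketch_mdir *pred) {
    assert(pred->tail != NULL);
    struct sketch_mdir *child = pred->tail;
    pred->gdelta ^= child->gdelta;
    pred->tail = child->tail;
}
```

If the child's delta is xored into another directory while the child is still threaded, the delta is counted twice and cancels itself out of the total. Handing it to the predecessor at the moment the child is unthreaded keeps the total unchanged.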
            if (lfs_gstate_hasmovehere(&lfs->gdisk, dir->pair)) {
                if (lfs_tag_id(lfs->gdisk.tag) == lfs_tag_id(besttag)) {
                    besttag |= 0x80000000;
                } else if (besttag != -1 &&
                        lfs_tag_id(lfs->gdisk.tag) < lfs_tag_id(besttag)) {
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
is just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[---- 32 ----]
[1|-- 11 --|-- 10 --|-- 10 --]
^. ^ . ^ ^- entry length
|. | . \------------ file id chunk info
|. \-----.------------------ type info (type3)
\.-----------.------------------ valid bit
[-3-|-- 8 --]
^ ^- chunk info
\------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
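As a rough illustration of the layout described above, the following sketch packs and unpacks the fields of a 32-bit tag. The `demo_` names are hypothetical and are not littlefs's own helpers. The field widths come from the tag format in this commit message, and treating a set top bit as "invalid" is an assumption based on the "invalid bit" comment in lfs_dir_fetch.

```c
#include <stdint.h>
#include <stdbool.h>

// Hypothetical helpers, named demo_* to avoid confusion with littlefs's own.
// Layout (msb to lsb): 1 valid bit | 11-bit type3 | 10-bit id | 10-bit length
typedef uint32_t demo_tag_t;

// build a tag from its three main fields, masking each to its width
static inline demo_tag_t demo_mktag(uint32_t type3, uint32_t id,
        uint32_t size) {
    return ((type3 & 0x7ff) << 20) | ((id & 0x3ff) << 10) | (size & 0x3ff);
}

// assumption: a set top bit marks the tag as invalid
static inline bool demo_tag_isvalid(demo_tag_t tag) {
    return !(tag & 0x80000000u);
}

// full 11-bit type field
static inline uint16_t demo_tag_type3(demo_tag_t tag) {
    return (tag >> 20) & 0x7ff;
}

// upper 3 bits of the type field, returned here as a plain 3-bit value
static inline uint8_t demo_tag_type1(demo_tag_t tag) {
    return (tag >> 28) & 0x7;
}

// lower 8 bits of the type field
static inline uint8_t demo_tag_chunk(demo_tag_t tag) {
    return (tag >> 20) & 0xff;
}

static inline uint16_t demo_tag_id(demo_tag_t tag) {
    return (tag >> 10) & 0x3ff;
}

static inline uint16_t demo_tag_size(demo_tag_t tag) {
    return tag & 0x3ff;
}
```

With 10 bits each, both the id and the length top out at 0x3ff, which matches the `id == 0x3ff` special case used for the root in lfs_dir_getinfo.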
2018-12-29 13:53:12 +00:00
|
|
|
besttag -= LFS_MKTAG(0, 1, 0);
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
2018-05-19 23:25:47 +00:00
|
|
|
}
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
            // found tag? or found best id?
            if (id) {
                *id = lfs_min(lfs_tag_id(besttag), dir->count);
            }

            if (lfs_tag_isvalid(besttag)) {
                return besttag;
            } else if (lfs_tag_id(besttag) < dir->count) {
                return LFS_ERR_NOENT;
            } else {
                return 0;
            }
2018-05-19 23:25:47 +00:00
|
|
|
}
|
2018-09-11 03:07:59 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
// failed, try the other block?
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs_pair_swap(dir->pair);
|
2018-12-29 13:53:12 +00:00
|
|
|
dir->rev = revs[(r+1)%2];
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
|
|
|
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_ERROR("Corrupted dir pair at %"PRIx32" %"PRIx32,
|
2018-09-11 03:07:59 +00:00
            dir->pair[0], dir->pair[1]);
    return LFS_ERR_CORRUPT;
}

static int lfs_dir_fetch(lfs_t *lfs,
        lfs_mdir_t *dir, const lfs_block_t pair[2]) {
2019-05-17 11:48:37 +00:00
|
|
|
// note, mask=-1, tag=-1 can never match a tag since this
|
2018-12-29 13:53:12 +00:00
|
|
|
// pattern has the invalid bit set
|
2019-11-26 22:42:49 +00:00
|
|
|
return (int)lfs_dir_fetchmatch(lfs, dir, pair,
|
|
|
|
(lfs_tag_t)-1, (lfs_tag_t)-1, NULL, NULL, NULL);
|
2018-05-19 23:25:47 +00:00
|
|
|
}
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
static int lfs_dir_getgstate(lfs_t *lfs, const lfs_mdir_t *dir,
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
    .--------.
  ->| parent |-.
    | gstate | |
  .-|   a    |-'
  | '--------'
  |     X <- child is orphaned
  | .--------.
  '>| child  |->
    | gstate |
    |   a    |
    '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left over from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
|
|
|
lfs_gstate_t *gstate) {
|
|
|
|
lfs_gstate_t temp;
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, dir, LFS_MKTAG(0x7ff, 0, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_MOVESTATE, 0, sizeof(temp)), &temp);
|
2018-09-15 03:02:39 +00:00
    if (res < 0 && res != LFS_ERR_NOENT) {
        return res;
    }

    if (res != LFS_ERR_NOENT) {
2019-01-04 23:23:36 +00:00
|
|
|
// xor together to find resulting gstate
|
2019-01-22 22:09:51 +00:00
|
|
|
lfs_gstate_fromle32(&temp);
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_gstate_xor(gstate, &temp);
|
2018-09-15 03:02:39 +00:00
    }

    return 0;
}
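The xor accumulation in lfs_dir_getgstate can be pictured with a small standalone sketch. The three-word `demo_gstate_t` layout and the helper names are assumptions for illustration, not the definitions from lfs.h. The point is only that xoring every directory's delta together yields the global state, and that xoring the same delta in twice cancels it out, which is why a duplicated child delta in the threaded linked-list would corrupt the result.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical stand-in for a gstate: a tag plus a block pair
typedef struct demo_gstate {
    uint32_t tag;
    uint32_t pair[2];
} demo_gstate_t;

// xor one delta into the accumulated state, word by word
static void demo_gstate_xor(demo_gstate_t *a, const demo_gstate_t *b) {
    a->tag ^= b->tag;
    a->pair[0] ^= b->pair[0];
    a->pair[1] ^= b->pair[1];
}

// fold the deltas of every directory in the list into one gstate
static demo_gstate_t demo_gstate_collect(const demo_gstate_t *deltas,
        size_t count) {
    demo_gstate_t g = {0, {0, 0}};
    for (size_t i = 0; i < count; i++) {
        demo_gstate_xor(&g, &deltas[i]);
    }
    return g;
}
```

Because xor is its own inverse, collecting the same delta from both a parent and an orphaned child silently removes it from the result rather than producing an obvious error.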

static int lfs_dir_getinfo(lfs_t *lfs, lfs_mdir_t *dir,
        uint16_t id, struct lfs_info *info) {
2018-12-29 13:53:12 +00:00
|
|
|
if (id == 0x3ff) {
|
2018-09-15 03:02:39 +00:00
        // special case for root
        strcpy(info->name, "/");
        info->type = LFS_TYPE_DIR;
        return 0;
    }

2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_get(lfs, dir, LFS_MKTAG(0x780, 0x3ff, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_NAME, id, lfs->name_max+1), info->name);
|
2018-09-15 03:02:39 +00:00
|
|
|
if (tag < 0) {
|
2019-10-01 05:24:17 +00:00
|
|
|
return (int)tag;
|
2018-09-15 03:02:39 +00:00
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
info->type = lfs_tag_type3(tag);
|
2018-09-15 03:02:39 +00:00
|
|
|
|
|
|
|
struct lfs_ctz ctz;
|
2018-12-29 13:53:12 +00:00
|
|
|
tag = lfs_dir_get(lfs, dir, LFS_MKTAG(0x700, 0x3ff, 0),
|
2018-09-15 03:02:39 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, id, sizeof(ctz)), &ctz);
|
|
|
|
if (tag < 0) {
|
2019-10-01 05:24:17 +00:00
|
|
|
return (int)tag;
|
2018-09-15 03:02:39 +00:00
|
|
|
}
|
|
|
|
lfs_ctz_fromle32(&ctz);
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type3(tag) == LFS_TYPE_CTZSTRUCT) {
|
2018-09-15 03:02:39 +00:00
|
|
|
info->size = ctz.size;
|
2018-12-29 13:53:12 +00:00
|
|
|
} else if (lfs_tag_type3(tag) == LFS_TYPE_INLINESTRUCT) {
|
2018-09-15 03:02:39 +00:00
        info->size = lfs_tag_size(tag);
    }

    return 0;
}

struct lfs_dir_find_match {
    lfs_t *lfs;
    const void *name;
    lfs_size_t size;
};

static int lfs_dir_find_match(void *data,
        lfs_tag_t tag, const void *buffer) {
    struct lfs_dir_find_match *name = data;
    lfs_t *lfs = name->lfs;
    const struct lfs_diskoff *disk = buffer;

2018-12-29 13:53:12 +00:00
|
|
|
// compare with disk
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow, is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
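To make the ordering rule concrete, here is a minimal in-memory sketch of comparing two names: compare the bytes both names share, then break ties by length. The `demo_` enum and function are hypothetical. littlefs itself compares against data on disk through lfs_bd_cmp, and its LFS_CMP_* results may be oriented differently from the convention chosen here.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical result codes: how name a orders relative to name b
enum demo_cmp { DEMO_CMP_EQ = 0, DEMO_CMP_LT = 1, DEMO_CMP_GT = 2 };

static enum demo_cmp demo_name_cmp(const void *a, size_t asize,
        const void *b, size_t bsize) {
    // compare only the bytes both names have
    size_t diff = (asize < bsize) ? asize : bsize;
    int res = memcmp(a, b, diff);
    if (res != 0) {
        return (res < 0) ? DEMO_CMP_LT : DEMO_CMP_GT;
    }

    // equal prefix, so the shorter name sorts first
    if (asize != bsize) {
        return (asize < bsize) ? DEMO_CMP_LT : DEMO_CMP_GT;
    }
    return DEMO_CMP_EQ;
}
```

Names are passed with explicit sizes rather than as NUL-terminated strings, mirroring how the surrounding code carries a name pointer and a size side by side.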
2018-10-04 19:49:34 +00:00
    lfs_size_t diff = lfs_min(name->size, lfs_tag_size(tag));
    int res = lfs_bd_cmp(lfs,
            NULL, &lfs->rcache, diff,
            disk->block, disk->off, name->name, diff);
2018-12-29 13:53:12 +00:00
|
|
|
if (res != LFS_CMP_EQ) {
|
2018-10-04 19:49:34 +00:00
        return res;
    }

Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
is just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
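
To make the field layout above concrete, here is a minimal sketch of packing and unpacking a 32-bit tag. The `demo_` names are hypothetical and are not the littlefs API; the shifts and masks are simply read off the diagram (1 valid bit, 11 type bits, 10 id bits, 10 length bits), with type1 taken as the top 3 bits of the type field and the chunk as its low 8 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helpers for illustration only; not the littlefs API.
// Field positions follow the diagram: 1 | 11 | 10 | 10 bits.
typedef uint32_t demo_tag_t;

// Pack an 11-bit type, 10-bit id, and 10-bit length into a tag.
// The top (valid) bit is left clear.
static inline demo_tag_t demo_mktag(uint16_t type3, uint16_t id, uint16_t len) {
    assert(type3 <= 0x7ff && id <= 0x3ff && len <= 0x3ff);
    return ((demo_tag_t)type3 << 20) | ((demo_tag_t)id << 10) | (demo_tag_t)len;
}

// Bit 31: the valid bit.
static inline bool demo_tag_validbit(demo_tag_t t) { return (t >> 31) & 1; }
// Bits 30..20: the full 11-bit type (type3).
static inline uint16_t demo_tag_type3(demo_tag_t t) { return (t >> 20) & 0x7ff; }
// Bits 30..28: the coarse 3-bit type (type1).
static inline uint8_t demo_tag_type1(demo_tag_t t) { return (t >> 28) & 0x7; }
// Bits 27..20: the 8-bit chunk field inside the type.
static inline uint8_t demo_tag_chunk(demo_tag_t t) { return (t >> 20) & 0xff; }
// Bits 19..10: the file id.
static inline uint16_t demo_tag_id(demo_tag_t t) { return (t >> 10) & 0x3ff; }
// Bits 9..0: the entry length.
static inline uint16_t demo_tag_len(demo_tag_t t) { return t & 0x3ff; }
```

For example, `demo_mktag(0x2a5, 0x3ff, 8)` yields a tag whose type1 is `0x2`, whose chunk is `0xa5`, and whose id is the all-ones value `0x3ff`.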
2018-12-29 13:53:12 +00:00
|
|
|
// only equal if our size is still the same
|
|
|
|
if (name->size != lfs_tag_size(tag)) {
|
|
|
|
return (name->size < lfs_tag_size(tag)) ? LFS_CMP_LT : LFS_CMP_GT;
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
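
As a rough illustration of the ordering described above, the following sketch compares a search name against a stored name: bytes are compared over the common prefix first, and the lengths break any tie, so a name sorts before any longer name it is a prefix of. The `demo_` names and the three-way enum are assumptions made for this example; littlefs uses its own `LFS_CMP_*` codes and reads the stored name from disk rather than from a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical three-way result, standing in for littlefs's LFS_CMP_* codes.
enum demo_cmp { DEMO_CMP_LT = -1, DEMO_CMP_EQ = 0, DEMO_CMP_GT = 1 };

// Compare a search name against a stored name, neither NUL-terminated.
// Returns whether the search name sorts before, equal to, or after
// the stored name.
static enum demo_cmp demo_name_cmp(const char *name, size_t namelen,
        const char *stored, size_t storedlen) {
    assert(name != NULL && stored != NULL);

    // compare the bytes both names have in common
    size_t common = (namelen < storedlen) ? namelen : storedlen;
    int res = memcmp(name, stored, common);
    if (res != 0) {
        return (res < 0) ? DEMO_CMP_LT : DEMO_CMP_GT;
    }

    // only equal if the sizes also match; otherwise the shorter name is first
    if (namelen != storedlen) {
        return (namelen < storedlen) ? DEMO_CMP_LT : DEMO_CMP_GT;
    }

    return DEMO_CMP_EQ;
}
```

Because the result is three-way rather than a plain match flag, a directory scan can use it both to detect a match and to find the position where a new name would be inserted.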
2018-10-04 19:49:34 +00:00
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
// found a match!
|
|
|
|
return LFS_CMP_EQ;
|
2018-09-15 03:02:39 +00:00
|
|
|
}
|
|
|
|
|
2019-09-29 22:28:03 +00:00
|
|
|
static lfs_stag_t lfs_dir_find(lfs_t *lfs, lfs_mdir_t *dir,
|
2018-10-04 19:49:34 +00:00
|
|
|
const char **path, uint16_t *id) {
|
2018-09-15 03:02:39 +00:00
|
|
|
// we reduce path to a single name if we can find it
|
|
|
|
const char *name = *path;
|
2018-12-29 13:53:12 +00:00
|
|
|
if (id) {
|
|
|
|
*id = 0x3ff;
|
|
|
|
}
|
2018-09-15 03:02:39 +00:00
|
|
|
|
|
|
|
// default to root dir
|
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t tag = LFS_MKTAG(LFS_TYPE_DIR, 0x3ff, 0);
|
2018-10-04 19:49:34 +00:00
|
|
|
dir->tail[0] = lfs->root[0];
|
|
|
|
dir->tail[1] = lfs->root[1];
|
2018-09-15 03:02:39 +00:00
|
|
|
|
|
|
|
while (true) {
|
|
|
|
nextname:
|
|
|
|
// skip slashes
|
|
|
|
name += strspn(name, "/");
|
|
|
|
lfs_size_t namelen = strcspn(name, "/");
|
|
|
|
|
|
|
|
// skip '.' and root '..'
|
|
|
|
if ((namelen == 1 && memcmp(name, ".", 1) == 0) ||
|
|
|
|
(namelen == 2 && memcmp(name, "..", 2) == 0)) {
|
|
|
|
name += namelen;
|
|
|
|
goto nextname;
|
|
|
|
}
|
|
|
|
|
|
|
|
// skip if matched by '..' in name
|
|
|
|
const char *suffix = name + namelen;
|
|
|
|
lfs_size_t sufflen;
|
|
|
|
int depth = 1;
|
|
|
|
while (true) {
|
|
|
|
suffix += strspn(suffix, "/");
|
|
|
|
sufflen = strcspn(suffix, "/");
|
|
|
|
if (sufflen == 0) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (sufflen == 2 && memcmp(suffix, "..", 2) == 0) {
|
|
|
|
depth -= 1;
|
|
|
|
if (depth == 0) {
|
|
|
|
name = suffix + sufflen;
|
|
|
|
goto nextname;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
depth += 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
suffix += sufflen;
|
|
|
|
}
|
|
|
|
|
|
|
|
// found path
|
|
|
|
if (name[0] == '\0') {
|
|
|
|
return tag;
|
|
|
|
}
|
|
|
|
|
2018-10-04 19:49:34 +00:00
|
|
|
// update what we've found so far
|
|
|
|
*path = name;
|
2018-09-15 03:02:39 +00:00
|
|
|
|
|
|
|
// only continue if we hit a directory
|
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type3(tag) != LFS_TYPE_DIR) {
|
2018-09-15 03:02:39 +00:00
|
|
|
return LFS_ERR_NOTDIR;
|
|
|
|
}
|
|
|
|
|
|
|
|
// grab the entry data
|
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_id(tag) != 0x3ff) {
|
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, dir, LFS_MKTAG(0x700, 0x3ff, 0),
|
2018-10-04 19:49:34 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, lfs_tag_id(tag), 8), dir->tail);
|
2018-09-15 03:02:39 +00:00
|
|
|
if (res < 0) {
|
|
|
|
return res;
|
|
|
|
}
|
2018-10-04 19:49:34 +00:00
|
|
|
lfs_pair_fromle32(dir->tail);
|
2018-09-15 03:02:39 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// find entry matching name
|
2018-10-04 19:49:34 +00:00
|
|
|
while (true) {
|
|
|
|
tag = lfs_dir_fetchmatch(lfs, dir, dir->tail,
|
2018-12-29 13:53:12 +00:00
|
|
|
LFS_MKTAG(0x780, 0, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_NAME, 0, namelen),
|
|
|
|
// are we last name?
|
|
|
|
(strchr(name, '/') == NULL) ? id : NULL,
|
2018-10-04 19:49:34 +00:00
|
|
|
lfs_dir_find_match, &(struct lfs_dir_find_match){
|
|
|
|
lfs, name, namelen});
|
|
|
|
if (tag < 0) {
|
|
|
|
return tag;
|
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
if (tag) {
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow, is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
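The ordering requirement above boils down to comparing file names and finding the insertion position during the directory scan. The following is a minimal sketch of that idea over an in-memory list of names; `name_cmp` and `find_insert_pos` are hypothetical helpers, not littlefs functions, and the real implementation works on metadata tags rather than C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Compare two length-delimited names lexicographically (hypothetical helper).
static int name_cmp(const char *a, size_t alen, const char *b, size_t blen) {
    size_t n = alen < blen ? alen : blen;
    int res = memcmp(a, b, n);
    if (res != 0) {
        return res;
    }
    // equal prefix: the shorter name sorts first
    return (alen > blen) - (alen < blen);
}

// Linear scan over a sorted list, mirroring the "we already scan the
// entire directory block" argument. Returns the index of the first entry
// that is >= name; *found is set to 1 if that entry matches exactly.
static size_t find_insert_pos(const char *const *names, size_t count,
        const char *name, int *found) {
    assert(name != NULL && found != NULL);
    assert(names != NULL || count == 0);
    size_t namelen = strlen(name);
    *found = 0;
    for (size_t i = 0; i < count; i++) {
        int cmp = name_cmp(names[i], strlen(names[i]), name, namelen);
        if (cmp >= 0) {
            *found = (cmp == 0);
            return i;
        }
    }
    return count;
}
```

A new file would be inserted at the returned index, which keeps the list in ascending order without a separate sorting pass.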
2018-10-04 19:49:34 +00:00
                break;
            }
2018-12-29 13:53:12 +00:00
            if (!dir->split) {
2018-10-04 19:49:34 +00:00
                return LFS_ERR_NOENT;
            }
2018-09-15 03:02:39 +00:00
        }

        // to next name
        name += namelen;
    }
}
2018-09-11 03:07:59 +00:00
// commit logic
struct lfs_commit {
    lfs_block_t block;
    lfs_off_t off;
    lfs_tag_t ptag;
    uint32_t crc;

    lfs_off_t begin;
    lfs_off_t end;
};
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we could fit during a compact. This gave us an O(n^3)
compact and an O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attr in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3),
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and an O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
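The binary search for the split point mentioned in the second hiccup can be sketched as follows. This is only an illustration under stated assumptions: `dry_run_size` is a hypothetical stand-in for a dry-run traversal (here it just sums per-id sizes from an array), and `find_split` assumes the estimated size never shrinks as more ids are included.

```c
#include <assert.h>
#include <stddef.h>

// Stand-in for a dry-run traversal: total compacted size of ids
// [begin, end). In this sketch the per-id sizes simply come from an array.
static size_t dry_run_size(const size_t *id_sizes, size_t begin, size_t end) {
    size_t total = 0;
    for (size_t id = begin; id < end; id++) {
        total += id_sizes[id];
    }
    return total;
}

// Binary search for the largest split in [begin, end] such that the ids
// [begin, split) fit within budget bytes.
static size_t find_split(const size_t *id_sizes, size_t begin, size_t end,
        size_t budget) {
    assert(begin <= end);
    size_t lo = begin; // ids [begin, lo) are known to fit
    size_t hi = end;   // no candidate beyond hi remains
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2; // round up so lo always advances
        if (dry_run_size(id_sizes, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

Each probe costs one dry run, so with an O(n^2) traversal the search performs O(log n) probes, which is consistent with the O(n^2 log n) split cost quoted above.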
2018-12-27 02:27:34 +00:00
static int lfs_dir_commitprog(lfs_t *lfs, struct lfs_commit *commit,
2018-09-11 03:07:59 +00:00
        const void *buffer, lfs_size_t size) {
    int err = lfs_bd_prog(lfs,
            &lfs->pcache, &lfs->rcache, false,
2018-12-27 02:27:34 +00:00
            commit->block, commit->off,
            (const uint8_t*)buffer, size);
2018-09-11 03:07:59 +00:00
    if (err) {
        return err;
    }

    commit->crc = lfs_crc(commit->crc, buffer, size);
    commit->off += size;
    return 0;
}
2018-12-27 02:27:34 +00:00
static int lfs_dir_commitattr(lfs_t *lfs, struct lfs_commit *commit,
2018-09-11 03:07:59 +00:00
        lfs_tag_t tag, const void *buffer) {
    // check if we fit
    lfs_size_t dsize = lfs_tag_dsize(tag);
    if (commit->off + dsize > commit->end) {
        return LFS_ERR_NOSPC;
    }
2018-05-26 00:04:01 +00:00

2018-09-11 03:07:59 +00:00
    // write out tag
Tweaked tag endianness to catch power-loss after <1 word is written
There was an interesting subtlety with the existing layout of tags that
could become a problem in the future. Basically, littlefs avoids writing to
any region of storage it is not absolutely sure has been erased
beforehand. This is a part of limiting the number of assumptions about
storage. It's possible a storage technology can't support writes without
erases in a way that is undetectable at write time (Maybe changing a bit
without an erase decreases the longevity of the information stored on
the bit).
But the existing layout had a very tiny corner case where this wasn't
true. Consider the location of the valid bit in the tag struct:
[1|---  31  ---]
 ^--- valid bit
The responsibility of this bit is to indicate if an attempt has been
made to write the following commit. If it is not set (the specific value
is dependent on a previous read and identified by the preceding commit),
the assumption is that it is safe to write to the next region because it
has been erased previously. If it is set, we check if the next commit is
valid, if it isn't (because of CRC failure, likely due to power-loss), we
discard the commit. But because an attempt has been made to write to
that storage, we must then do a compaction to move to the other block in
the metadata-pair.
This plan looks good on paper, but what does it look like on storage?
The problem is that words in littlefs are little-endian. So on
storage the tag actually looks like this:
[- 8 -|- 8 -|- 8 -|1|- 7 -]
                   ^-- valid bit
This means that we don't actually set the valid bit before writing the
tag! We write the lower bytes first. If we lose power, we may have
written 3 bytes without this fact being detectable.
We could restructure the tag structure to store the valid bit lower,
however because none of the fields are 7 bits, this would make the
extraction more costly, and we then lose the ability to check this
valid bit with a sign comparison.
The simple solution is to just store the tag in big-endian. A small
benefit is that this will actually have a negative code cost on
big-endian machines.
This mixture of endiannesses is frustrating, however it is a pragmatic
solution with only a 20-byte code size cost.
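To make the byte-order argument concrete, the sketch below serializes a tag in both orders and shows where the valid bit (bit 31) ends up. The helper names `tag_to_le` and `tag_to_be` are hypothetical; littlefs itself uses its own conversion routines such as `lfs_tobe32`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Serialize a 32-bit tag in little-endian order: out[0] is written first.
static void tag_to_le(uint32_t tag, uint8_t out[4]) {
    assert(out != NULL);
    out[0] = (uint8_t)(tag >> 0);
    out[1] = (uint8_t)(tag >> 8);
    out[2] = (uint8_t)(tag >> 16);
    out[3] = (uint8_t)(tag >> 24); // valid bit lands in the last byte
}

// Serialize a 32-bit tag in big-endian order: out[0] is written first.
static void tag_to_be(uint32_t tag, uint8_t out[4]) {
    assert(out != NULL);
    out[0] = (uint8_t)(tag >> 24); // valid bit lands in the first byte
    out[1] = (uint8_t)(tag >> 16);
    out[2] = (uint8_t)(tag >> 8);
    out[3] = (uint8_t)(tag >> 0);
}
```

With only the valid bit set, the little-endian bytes are `00 00 00 80` while the big-endian bytes are `80 00 00 00`, so in big-endian order the very first byte programmed already records that a write was attempted.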
2018-10-22 21:42:30 +00:00
    lfs_tag_t ntag = lfs_tobe32((tag & 0x7fffffff) ^ commit->ptag);
2018-12-27 02:27:34 +00:00
    int err = lfs_dir_commitprog(lfs, commit, &ntag, sizeof(ntag));
2018-09-11 03:07:59 +00:00
    if (err) {
        return err;
    }
2018-05-26 00:04:01 +00:00

2018-09-11 03:07:59 +00:00
    if (!(tag & 0x80000000)) {
        // from memory
2018-12-27 02:27:34 +00:00
        err = lfs_dir_commitprog(lfs, commit, buffer, dsize-sizeof(tag));
2018-09-11 03:07:59 +00:00
        if (err) {
            return err;
        }
    } else {
        // from disk
        const struct lfs_diskoff *disk = buffer;
        for (lfs_off_t i = 0; i < dsize-sizeof(tag); i++) {
            // rely on caching to make this efficient
            uint8_t dat;
            err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back them.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read-only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
                    NULL, &lfs->rcache, dsize-sizeof(tag)-i,
2018-09-11 03:07:59 +00:00
                    disk->block, disk->off+i, &dat, 1);
2018-07-12 23:11:18 +00:00
            if (err) {
                return err;
            }
2018-05-26 00:04:01 +00:00
2018-12-27 02:27:34 +00:00
            err = lfs_dir_commitprog(lfs, commit, &dat, 1);
2018-09-11 03:07:59 +00:00
            if (err) {
                return err;
2018-09-09 14:01:06 +00:00
            }
        }
2018-05-28 14:17:44 +00:00
    }

2018-09-11 03:07:59 +00:00
    commit->ptag = tag & 0x7fffffff;
2018-05-28 07:08:16 +00:00
    return 0;
}
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we can fit during a compact, giving us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history; this updates the attr based on later dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
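The binary search over dry-run traversals from the second hiccup might look roughly like the sketch below. All names are hypothetical, and it assumes the dry-run size never shrinks as more ids are included; `toy_sum_size` is only a stand-in for a real dry-run traversal.

```c
#include <stddef.h>

// Hypothetical dry-run: bytes needed to compact ids [begin, end).
typedef size_t (*toy_size_fn)(const void *ctx, int begin, int end);

// Find the largest end in [begin, limit] such that ids [begin, end) fit
// in budget bytes. Assumes size(ctx, begin, end) is non-decreasing in end
// and that the empty range always fits. Uses O(log n) dry-runs.
static int toy_split_point(toy_size_fn size, const void *ctx,
        int begin, int limit, size_t budget) {
    int lo = begin;  // [begin, lo) is known to fit
    int hi = limit;
    while (lo < hi) {
        int mid = lo + (hi - lo + 1)/2;
        if (size(ctx, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }

    return lo;
}

// Stand-in dry-run for illustration: ctx is an array of per-id sizes.
static size_t toy_sum_size(const void *ctx, int begin, int end) {
    const size_t *sizes = ctx;
    size_t total = 0;
    for (int i = begin; i < end; i++) {
        total += sizes[i];
    }
    return total;
}
```

With each dry-run costing O(n^2), the O(log n) probes account for the O(n^2 log n) split quoted above.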
2018-12-27 02:27:34 +00:00
|
|
|
static int lfs_dir_commitcrc(lfs_t *lfs, struct lfs_commit *commit) {
|
2020-02-10 04:43:20 +00:00
|
|
|
const lfs_off_t off1 = commit->off;
|
|
|
|
const uint32_t crc1 = commit->crc;
|
2018-07-13 20:04:31 +00:00
|
|
|
// align to program units
|
2020-02-10 04:43:20 +00:00
|
|
|
const lfs_off_t end = lfs_alignup(off1 + 2*sizeof(uint32_t),
|
2018-07-13 20:04:31 +00:00
|
|
|
lfs->cfg->prog_size);
|
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
// create crc tags to fill up remainder of commit, note that
|
2020-02-10 04:43:20 +00:00
|
|
|
// padding is not crced, which lets fetches skip padding but
|
2019-07-24 19:24:29 +00:00
|
|
|
// makes committing a bit more complicated
|
|
|
|
while (commit->off < end) {
|
|
|
|
lfs_off_t off = commit->off + sizeof(lfs_tag_t);
|
|
|
|
lfs_off_t noff = lfs_min(end - off, 0x3fe) + off;
|
|
|
|
if (noff < end) {
|
|
|
|
noff = lfs_min(noff, end - 2*sizeof(uint32_t));
|
|
|
|
}
|
2018-07-13 20:04:31 +00:00
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
// read erased state from next program unit
|
2020-02-10 04:43:20 +00:00
|
|
|
lfs_tag_t tag = 0xffffffff;
|
2019-07-24 19:24:29 +00:00
|
|
|
int err = lfs_bd_read(lfs,
|
|
|
|
NULL, &lfs->rcache, sizeof(tag),
|
|
|
|
commit->block, noff, &tag, sizeof(tag));
|
|
|
|
if (err && err != LFS_ERR_CORRUPT) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-07-13 20:04:31 +00:00
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
// build crc tag
|
|
|
|
bool reset = ~lfs_frombe32(tag) >> 31;
|
|
|
|
tag = LFS_MKTAG(LFS_TYPE_CRC + reset, 0x3ff, noff - off);
|
2018-07-13 20:04:31 +00:00
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
// write out crc
|
|
|
|
uint32_t footer[2];
|
|
|
|
footer[0] = lfs_tobe32(tag ^ commit->ptag);
|
|
|
|
commit->crc = lfs_crc(commit->crc, &footer[0], sizeof(footer[0]));
|
|
|
|
footer[1] = lfs_tole32(commit->crc);
|
|
|
|
err = lfs_bd_prog(lfs,
|
|
|
|
&lfs->pcache, &lfs->rcache, false,
|
|
|
|
commit->block, commit->off, &footer, sizeof(footer));
|
2018-08-20 19:47:52 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
commit->off += sizeof(tag)+lfs_tag_size(tag);
|
2019-09-09 03:53:50 +00:00
|
|
|
commit->ptag = tag ^ ((lfs_tag_t)reset << 31);
|
2020-02-10 04:43:20 +00:00
|
|
|
commit->crc = 0xffffffff; // reset crc for next "commit"
|
2018-08-20 19:47:52 +00:00
|
|
|
}
|
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
// flush buffers
|
|
|
|
int err = lfs_bd_sync(lfs, &lfs->pcache, &lfs->rcache, false);
|
2018-07-13 20:04:31 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
// successful commit, check checksums to make sure
|
|
|
|
lfs_off_t off = commit->begin;
|
2020-02-10 04:43:20 +00:00
|
|
|
lfs_off_t noff = off1 + sizeof(uint32_t);
|
2019-07-24 19:24:29 +00:00
|
|
|
while (off < end) {
|
2020-02-10 04:43:20 +00:00
|
|
|
uint32_t crc = 0xffffffff;
|
2019-07-24 19:24:29 +00:00
|
|
|
for (lfs_off_t i = off; i < noff+sizeof(uint32_t); i++) {
|
2020-02-10 04:43:20 +00:00
|
|
|
// check against written crc, may catch blocks that
|
|
|
|
// become readonly and match our commit size exactly
|
|
|
|
if (i == off1 && crc != crc1) {
|
|
|
|
return LFS_ERR_CORRUPT;
|
|
|
|
}
|
|
|
|
|
2019-07-24 19:24:29 +00:00
|
|
|
// leave it up to caching to make this efficient
|
|
|
|
uint8_t dat;
|
|
|
|
err = lfs_bd_read(lfs,
|
|
|
|
NULL, &lfs->rcache, noff+sizeof(uint32_t)-i,
|
|
|
|
commit->block, i, &dat, 1);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
crc = lfs_crc(crc, &dat, 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
// detected write error?
|
|
|
|
if (crc != 0) {
|
|
|
|
return LFS_ERR_CORRUPT;
|
|
|
|
}
|
|
|
|
|
|
|
|
// skip padding
|
|
|
|
off = lfs_min(end - noff, 0x3fe) + noff;
|
|
|
|
if (off < end) {
|
|
|
|
off = lfs_min(off, end - 2*sizeof(uint32_t));
|
|
|
|
}
|
|
|
|
noff = off + sizeof(uint32_t);
|
2018-07-13 20:04:31 +00:00
|
|
|
}
|
|
|
|
|
2018-07-02 03:29:42 +00:00
|
|
|
return 0;
|
|
|
|
}
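The `crc != 0` check in the verification loop above relies on a general CRC property: if a checksum has no final xor, appending it (little-endian for a reflected CRC) to the data and checksumming the whole region again gives zero. Below is a standalone sketch of that property. The bitwise `toy_crc` is an assumption standing in for `lfs_crc`, using the reflected CRC-32 polynomial 0xedb88320; it is not the library's own implementation.

```c
#include <stddef.h>
#include <stdint.h>

// Bitwise reflected CRC-32 with no final xor (stand-in for lfs_crc).
static uint32_t toy_crc(uint32_t crc, const void *buffer, size_t size) {
    const uint8_t *data = buffer;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            crc = (crc >> 1) ^ (0xedb88320 & (0u - (crc & 1)));
        }
    }
    return crc;
}

// Checksum buf[0..len), append the result in little-endian order, then
// re-checksum all len+4 bytes. buf must have room for len+4 bytes.
// Returns the residue, which is 0 when the region reads back intact.
static uint32_t toy_crc_residue(uint8_t *buf, size_t len) {
    uint32_t crc = toy_crc(0xffffffff, buf, len);
    for (int i = 0; i < 4; i++) {
        buf[len+i] = (uint8_t)(crc >> (8*i));
    }
    return toy_crc(0xffffffff, buf, len+4);
}
```

Any single flipped bit in the region leaves a non-zero residue, which is the case the loop reports as LFS_ERR_CORRUPT.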
|
|
|
|
|
2018-08-01 15:24:59 +00:00
|
|
|
static int lfs_dir_alloc(lfs_t *lfs, lfs_mdir_t *dir) {
|
Added root entry and expanding superblocks
Expanding superblocks has been on my wishlist for a while. The basic
idea is that instead of maintaining fixed-offset blocks {0, 1} that point to
the root directory (1 pointer), we maintain a dynamically sized
linked-list of superblocks that point to the actual root. If the number
of writes to the root exceeds some value, we increase the size of the
superblock linked-list.
This can leverage existing metadata-pair operations. The revision count for
metadata-pairs provides some knowledge on how much wear we've put on the
superblock, and the threaded linked-list can also be reused for this
purpose. This means superblock expansion is both optional and cheap to
implement.
Expanding superblocks helps both extremely small and extremely large filesystems
(extreme being relative of course). On the small end, we can actually
collapse the superblock into the root directory and drop the hard requirement
of 4-blocks for the superblock. On the large end, our superblock will
now last longer than the rest of the filesystem. Each time we expand,
the number of cycles until the superblock dies is increased by a power.
Before we were stuck with this layout:
level  cycles  limit    layout
1      E^2     390 MiB  s0 -> root
Now we expand every time a fixed offset is exceeded:
level  cycles  limit    layout
0      E       4 KiB    s0+root
1      E^2     390 MiB  s0 -> root
2      E^3     37 TiB   s0 -> s1 -> root
3      E^4     3.6 EiB  s0 -> s1 -> s2 -> root
...
Where the cycles are the number of cycles before death, and the limit is
the worst-case filesystem size at which early superblock death becomes a
concern (all writes go to root, using this formula: E^|s| = E*B, E = erase
cycles = 100000, B = block count, assuming 4096 byte blocks).
Note we can also store copies of the superblock entry on the expanded
superblocks. This may help filesystem recovery tools in the future.
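The limit column can be reproduced from the formula. Reading |s| as level + 1 (an interpretation of the table, not something the message states outright), E^|s| = E*B gives B = E^level blocks. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

// Block count B at which a superblock chain of the given level wears out
// no earlier than the rest of the disk: E^(level+1) = E*B, so B = E^level.
static uint64_t toy_limit_blocks(unsigned level, uint64_t erase_cycles) {
    uint64_t blocks = 1;
    for (unsigned i = 0; i < level; i++) {
        blocks *= erase_cycles;
    }
    return blocks;
}

// The same limit in bytes. With E = 100000 and 4096 byte blocks this
// fits in 64 bits up to level 3.
static uint64_t toy_limit_bytes(unsigned level, uint64_t erase_cycles,
        uint64_t block_size) {
    return toy_limit_blocks(level, erase_cycles) * block_size;
}
```

For level 1 this gives 100000 * 4096 = 409600000 bytes, or about 390 MiB, which matches the table.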
2018-08-06 18:30:51 +00:00
|
|
|
// allocate pair of dir blocks (backwards, so we write block 1 first)
|
2018-05-19 23:25:47 +00:00
|
|
|
for (int i = 0; i < 2; i++) {
|
|
|
|
int err = lfs_alloc(lfs, &dir->pair[(i+1)%2]);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However, it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
2020-01-29 07:45:19 +00:00
|
|
|
// zero for reproducibility in case initial block is unreadable
|
|
|
|
dir->rev = 0;
|
|
|
|
|
2018-05-19 23:25:47 +00:00
|
|
|
// rather than clobbering one of the blocks we just pretend
|
|
|
|
// the revision may be valid
|
2018-08-20 19:47:52 +00:00
|
|
|
int err = lfs_bd_read(lfs,
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
|
|
|
NULL, &lfs->rcache, sizeof(dir->rev),
|
2018-08-20 19:47:52 +00:00
|
|
|
dir->pair[0], 0, &dir->rev, sizeof(dir->rev));
|
2018-05-19 23:25:47 +00:00
|
|
|
dir->rev = lfs_fromle32(dir->rev);
|
2018-08-05 01:33:09 +00:00
|
|
|
if (err && err != LFS_ERR_CORRUPT) {
|
2018-05-19 23:25:47 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-01-30 03:53:56 +00:00
|
|
|
// make sure we don't immediately evict
|
|
|
|
dir->rev += dir->rev & 1;
|
|
|
|
|
2018-05-19 23:25:47 +00:00
|
|
|
// set defaults
|
|
|
|
dir->off = sizeof(dir->rev);
|
2020-02-10 04:43:20 +00:00
|
|
|
dir->etag = 0xffffffff;
|
2018-05-19 23:25:47 +00:00
|
|
|
dir->count = 0;
|
2019-08-03 14:17:47 +00:00
|
|
|
dir->tail[0] = LFS_BLOCK_NULL;
|
|
|
|
dir->tail[1] = LFS_BLOCK_NULL;
|
2018-05-21 05:56:20 +00:00
|
|
|
dir->erased = false;
|
2018-08-01 15:24:59 +00:00
|
|
|
dir->split = false;
|
2018-05-19 23:25:47 +00:00
|
|
|
|
|
|
|
// don't write out yet, let caller take care of that
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
static int lfs_dir_drop(lfs_t *lfs, lfs_mdir_t *dir, lfs_mdir_t *tail) {
|
2018-09-12 06:34:03 +00:00
|
|
|
// steal state
|
2019-01-04 23:23:36 +00:00
|
|
|
int err = lfs_dir_getgstate(lfs, tail, &lfs->gdelta);
|
2018-09-15 03:02:39 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
2018-09-12 06:34:03 +00:00
|
|
|
}
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
// steal tail
|
|
|
|
lfs_pair_tole32(tail->tail);
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, dir, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(LFS_TYPE_TAIL + tail->split, 0x3ff, 8), tail->tail}));
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_fromle32(tail->tail);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
2018-09-12 06:34:03 +00:00
|
|
|
}
|
|
|
|
|
2018-12-27 02:27:34 +00:00
|
|
|
static int lfs_dir_split(lfs_t *lfs,
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
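The field layout in the diagram above can be sketched with plain shifts and masks. The `toy_` helpers below are hypothetical names for illustration and simply follow the 1/11/10/10 bit split described in this message; they are not presented as the library's own macros.

```c
#include <stdint.h>

typedef uint32_t toy_tag_t;

// Pack an 11-bit type, 10-bit id, and 10-bit length into one tag.
// Bit 31 (the valid bit) is left clear.
static toy_tag_t toy_mktag(uint16_t type, uint16_t id, uint16_t size) {
    return ((toy_tag_t)(type & 0x7ff) << 20)
            | ((toy_tag_t)(id & 0x3ff) << 10)
            | (toy_tag_t)(size & 0x3ff);
}

// Full 11-bit type (type3).
static uint16_t toy_tag_type3(toy_tag_t tag) {
    return (tag >> 20) & 0x7ff;
}

// Upper 3 bits of the type field (type1).
static uint8_t toy_tag_type1(toy_tag_t tag) {
    return (tag >> 28) & 0x7;
}

// Lower 8 bits of the type field (chunk info).
static uint8_t toy_tag_chunk(toy_tag_t tag) {
    return (tag >> 20) & 0xff;
}

// 10-bit file id.
static uint16_t toy_tag_id(toy_tag_t tag) {
    return (tag >> 10) & 0x3ff;
}

// 10-bit entry length.
static uint16_t toy_tag_size(toy_tag_t tag) {
    return tag & 0x3ff;
}
```

For example, packing an illustrative type of 0x601 with id 0x3ff and length 8 yields 0x601ffc08, which splits back into type1 = 6 and chunk = 1.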
2018-12-29 13:53:12 +00:00
|
|
|
lfs_mdir_t *dir, const struct lfs_mattr *attrs, int attrcount,
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we can fit during a compact, giving us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic didn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we couldn't actually figure out our compacted size until we
compacted. This worked ok except for the fact that splits would always have a
failed compact. This means we wasted an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history; this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
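The two traversal steps above can be sketched on a toy attr log. This is a simplified illustration, not lfs_dir_traverse itself: struct attr, log_traverse, count_cb and sum_cb are hypothetical names, a delete is modeled as a flag that kills every earlier attr with the same id, and the id shifting and move recursion that the real function deals with are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical, simplified attr; real tags carry more state than this.
struct attr {
    int id;         // which file the attr belongs to
    int type;       // what kind of attr it is
    bool is_delete; // true if this entry deletes every earlier attr of `id`
    int value;
};

typedef int (*attr_cb)(void *ctx, const struct attr *a);

// Report the surviving state of each attr in a log of n entries.
// The nested scan makes this O(n^2) in the number of entries.
static int log_traverse(const struct attr *log, size_t n,
        attr_cb cb, void *ctx) {
    assert(cb != NULL);
    for (size_t i = 0; i < n; i++) {
        if (log[i].is_delete) {
            continue; // deletes only affect other attrs
        }

        // scan the rest of the log for this attr's history
        bool live = true;
        for (size_t j = i + 1; j < n && live; j++) {
            bool same_id = log[j].id == log[i].id;
            if (same_id && (log[j].is_delete
                    || log[j].type == log[i].type)) {
                live = false; // deleted or overwritten later, exit early
            }
        }

        if (live) {
            int err = cb(ctx, &log[i]);
            if (err) {
                return err;
            }
        }
    }
    return 0;
}

// Dry-run style callback: counts surviving attrs without writing anything.
static int count_cb(void *ctx, const struct attr *a) {
    (void)a;
    *(size_t *)ctx += 1;
    return 0;
}

// Second callback, to show the same traversal serving a different purpose.
static int sum_cb(void *ctx, const struct attr *a) {
    *(int *)ctx += a->value;
    return 0;
}
```

Because the traversal only reads the log and hands results to a callback, the same function can drive either a size estimate or a real commit, which is the property the message relies on for dry runs.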
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
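The binary search over dry runs mentioned in the second hiccup could look roughly like the sketch below. It is not taken from lfs.c: dry_run_size and find_split are hypothetical names, and the dry run is faked with a per-id size table where the real code would run a traversal. It assumes the compacted size never shrinks as more ids are included, which is what makes a binary search valid, and each probe costs one O(n^2) traversal, hence the O(n^2 log n) split.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical dry run: compacted size of the ids in [begin, end).
// A real implementation would traverse the directory instead.
static size_t dry_run_size(const size_t *id_sizes, size_t begin, size_t end) {
    size_t size = 0;
    for (size_t id = begin; id < end; id++) {
        size += id_sizes[id];
    }
    return size;
}

// Find the largest split in [begin+1, end] such that ids [begin, split)
// fit within limit. At least one id is always kept, even if it alone
// does not fit.
static size_t find_split(const size_t *id_sizes,
        size_t begin, size_t end, size_t limit) {
    assert(begin < end);
    size_t lo = begin + 1;
    size_t hi = end;
    while (lo < hi) {
        // round up so the loop always makes progress
        size_t mid = lo + (hi - lo + 1) / 2;
        if (dry_run_size(id_sizes, begin, mid) <= limit) {
            lo = mid;     // fits, try to keep more ids
        } else {
            hi = mid - 1; // too big, keep fewer ids
        }
    }
    return lo;
}
```

With six ids of 40 bytes each and a 100-byte limit, the search settles on a split of 2: two ids (80 bytes) fit and three (120 bytes) do not.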
2018-12-27 02:27:34 +00:00
        lfs_mdir_t *source, uint16_t split, uint16_t end) {
    // create tail directory
Fixed lfs_dir_fetchmatch not understanding overwritten tags
Sometimes a small, single-line code change hides a complicated
story. This is one of those times.
If you look at this diff, you may note that this is a case of
lfs_dir_fetchmatch not correctly handling a tag that invalidates a
callback used to search for some condition, in this case a search for a
parent, which is invalidated by a later dir tag overwriting the
previous dir pair.
But how can this happen? Dir-pair-tags are only overwritten during
relocations (when a block goes bad or exceeds the block_cycles config
option for dynamic wear-leveling). Other dir operations create new
directory entries. And the only lfs_dir_fetchmatch condition that relies
on overwrites (as opposed to proper deletes) is when we need to find a
directory's parent, an operation that only occurs during a _different_
relocation. And a false _positive_ can only happen if we don't have a
parent. Which is really unlikely when we search for directory parents!
This bug and minimal test case were found by Matthew Renzelmann. In an
unfortunate series of events, first a file creation causes a directory
split to occur. This creates a new, orphaned metadata-pair containing
our new file. However, the revision count on this metadata-pair
indicates the pair is due for relocation as a part of wear-leveling.
Normally, this is fine: even though this metadata-pair has no parent,
lfs_dir_find should return ENOENT and continue without error.
However, here we get hit by our fetchmatch bug. A previous, unrelated
relocation overwrites a pair which just happens to contain the block
allocated for a new metadata-pair. When we search for a parent,
lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair
and incorrectly tells our orphan it's found its parent.
As you can imagine, the orphan's disappointment must be immense.
So an unfortunately timed dir split triggers a relocation which
incorrectly finds a previously written parent that has been outdated
by another relocation.
As a solution we can outdate our found tag if it is overwritten by
an exact match during lfs_dir_fetchmatch.
As a part of this I started adding a new set of tests: tests/test_relocations,
for aggressive relocation tests. This is already being appended to by
another PR. I suspect relocations are relatively under-tested and are
becoming more important due to recent improvements in wear-leveling.
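The idea behind the fix, outdating a found tag when a later entry overwrites it with an exact tag match, can be sketched on a toy commit log. This is an assumption-laden illustration rather than the lfs_dir_fetchmatch code: struct entry, find_parent and NO_TAG are hypothetical, and pairs are compared in order only, which is simpler than a real pair comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TAG UINT32_MAX // hypothetical "nothing found" marker

// Hypothetical log entry: a tag naming a slot, and the pair it points to.
struct entry {
    uint32_t tag;
    uint32_t pair[2];
};

// Scan a log in commit order and return the tag that currently points at
// `pair`, or NO_TAG if no up-to-date entry does.
static uint32_t find_parent(const struct entry *log, size_t n,
        const uint32_t pair[2]) {
    assert(log != NULL || n == 0);
    uint32_t found = NO_TAG;
    for (size_t i = 0; i < n; i++) {
        bool match = log[i].pair[0] == pair[0]
                && log[i].pair[1] == pair[1];
        if (match) {
            found = log[i].tag;
        } else if (log[i].tag == found) {
            // the slot we found was overwritten with a different pair,
            // so the earlier match is outdated
            found = NO_TAG;
        }
    }
    return found;
}
```

Without the else branch, a search for an old pair would keep returning the slot that once pointed at it, which is the false positive described in this message.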
2019-11-26 07:21:42 +00:00
    lfs_alloc_ack(lfs);
2018-12-27 02:27:34 +00:00
    lfs_mdir_t tail;
    int err = lfs_dir_alloc(lfs, &tail);
    if (err) {
        return err;
    }

    tail.split = dir->split;
    tail.tail[0] = dir->tail[0];
    tail.tail[1] = dir->tail[1];
2018-12-29 13:53:12 +00:00
    err = lfs_dir_compact(lfs, &tail, attrs, attrcount, source, split, end);
2018-12-27 02:27:34 +00:00
    if (err) {
        return err;
    }

    dir->tail[0] = tail.pair[0];
    dir->tail[1] = tail.pair[1];
    dir->split = true;

    // update root if needed
    if (lfs_pair_cmp(dir->pair, lfs->root) == 0 && split == 0) {
        lfs->root[0] = tail.pair[0];
        lfs->root[1] = tail.pair[1];
    }

    return 0;
}

static int lfs_dir_commit_size(void *p, lfs_tag_t tag, const void *buffer) {
    lfs_size_t *size = p;
    (void)buffer;

    *size += lfs_tag_dsize(tag);
    return 0;
}

struct lfs_dir_commit_commit {
    lfs_t *lfs;
    struct lfs_commit *commit;
};

static int lfs_dir_commit_commit(void *p, lfs_tag_t tag, const void *buffer) {
    struct lfs_dir_commit_commit *commit = p;
    return lfs_dir_commitattr(commit->lfs, commit->commit, tag, buffer);
}
2018-07-13 20:04:31 +00:00
static int lfs_dir_compact(lfs_t *lfs,
2018-12-29 13:53:12 +00:00
        lfs_mdir_t *dir, const struct lfs_mattr *attrs, int attrcount,
2018-05-29 06:11:26 +00:00
        lfs_mdir_t *source, uint16_t begin, uint16_t end) {
2018-05-28 07:08:16 +00:00
    // save some state in case block is bad
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However, it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
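The first fix in the list above, not letting a failed revision read fall back to uninitialized RAM, can be sketched as follows. This is a minimal illustration and not the code in lfs.c: read_fn, read_revision and the three mock readers are hypothetical, the ERR_* constants are local stand-ins for the block-device error codes, and byte order is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Local stand-ins for block-device error codes (assumed values).
enum { ERR_OK = 0, ERR_IO = -5, ERR_CORRUPT = -84 };

// Hypothetical block-device read: fills buffer, returns an error code.
typedef int (*read_fn)(void *buffer, size_t size);

// Read a revision count, treating a "corrupt" read of an untouched block
// as revision 0 so that repeated runs behave identically.
static int read_revision(read_fn read, uint32_t *rev_out) {
    assert(rev_out != NULL);
    uint32_t rev = 0; // zero-initialized, never stack garbage
    int err = read(&rev, sizeof(rev));
    if (err && err != ERR_CORRUPT) {
        return err; // real I/O errors still propagate
    }
    if (err == ERR_CORRUPT) {
        rev = 0; // ignore whatever the failed read left behind
    }
    *rev_out = rev;
    return 0;
}

// Mock readers used to exercise the three cases.
static int read_ok(void *buffer, size_t size) {
    uint32_t v = 42;
    assert(size == sizeof(v));
    memcpy(buffer, &v, size);
    return ERR_OK;
}

static int read_corrupt(void *buffer, size_t size) {
    memset(buffer, 0xa5, size); // leave junk behind, like a failed ECC read
    return ERR_CORRUPT;
}

static int read_io(void *buffer, size_t size) {
    (void)buffer;
    (void)size;
    return ERR_IO;
}
```

The point of the zero initialization is determinism: two identical runs now start from the same revision count, so they wear-level at the same times.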
2020-01-29 07:45:19 +00:00
    const lfs_block_t oldpair[2] = {dir->pair[0], dir->pair[1]};
2018-05-28 07:08:16 +00:00
    bool relocated = false;
2020-01-29 07:45:19 +00:00
    bool tired = false;
2018-05-28 07:08:16 +00:00
2019-01-31 20:54:47 +00:00
    // should we split?
    while (end - begin > 1) {
2018-12-27 02:27:34 +00:00
        // find size
        lfs_size_t size = 0;
2018-12-29 13:53:12 +00:00
        int err = lfs_dir_traverse(lfs,
2020-02-10 04:43:20 +00:00
                source, 0, 0xffffffff, attrs, attrcount,
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
is just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
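As a rough illustration of the layout described above, the fields can be packed and unpacked with plain shifts and masks. This is only a sketch: the helper names (`tag_make`, `tag_type1`, and so on) are hypothetical and are not the macros used in lfs.c, and the meaning of the top bit is left uninterpreted. The 256KiB figure is consistent with an 8-bit chunk field (2^8 chunks) of roughly 2^10 bytes each.

```c
#include <stdint.h>

// Hypothetical helpers, named for illustration only.
// Layout, high to low: 1 valid bit | 11-bit type | 10-bit id | 10-bit length.
typedef uint32_t tag_t;

static inline tag_t tag_make(uint16_t type, uint16_t id, uint16_t len) {
    return ((tag_t)(type & 0x7ff) << 20)
         | ((tag_t)(id   & 0x3ff) << 10)
         |  (tag_t)(len  & 0x3ff);
}

// top bit of the tag, returned as-is
static inline unsigned tag_valid(tag_t t) { return t >> 31; }
// full 11-bit type field (type3)
static inline unsigned tag_type3(tag_t t) { return (t >> 20) & 0x7ff; }
// upper 3 bits of the type field (type1)
static inline unsigned tag_type1(tag_t t) { return (t >> 28) & 0x7; }
// lower 8 bits of the type field (chunk info)
static inline unsigned tag_chunk(tag_t t) { return (t >> 20) & 0xff; }
// 10-bit file id
static inline unsigned tag_id(tag_t t)    { return (t >> 10) & 0x3ff; }
// 10-bit entry length
static inline unsigned tag_len(tag_t t)   { return t & 0x3ff; }
```

For example, `tag_make(0x401, 5, 12)` yields `0x4010140c`, which splits back into type1 4, chunk 1, id 5, and length 12.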
2018-12-29 13:53:12 +00:00
                LFS_MKTAG(0x400, 0x3ff, 0),
                LFS_MKTAG(LFS_TYPE_NAME, 0, 0),
                begin, end, -begin,
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we could fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history. This will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3),
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
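The two-step traversal described above can be sketched on a toy log. This is a simplified model, not the real lfs_dir_traverse: the `struct attr` fields and the `traverse`/`add_size` names are made up for illustration, and id shifting, moves, and on-disk tags are not modeled. An attr survives only if no later entry with the same id either deletes the id or overwrites the same type. Passing a callback that only accumulates sizes gives the dry-run behaviour mentioned above.

```c
#include <stddef.h>
#include <stdbool.h>

// Simplified, hypothetical model of one metadata log entry.
struct attr {
    int id;          // file id the attr is bound to
    int type;        // attr type
    bool is_delete;  // true if this entry deletes every attr of `id`
    int value;       // payload stand-in
};

// Example dry-run callback: only accumulates a size estimate.
static void add_size(const struct attr *a, void *data) {
    (void)a;
    *(size_t *)data += sizeof(struct attr);
}

// Calls cb (if non-NULL) for each attr that survives the log's history.
// Returns the number of surviving attrs. O(n^2) in the log length.
static size_t traverse(const struct attr *entries, size_t n,
        void (*cb)(const struct attr *a, void *data), void *data) {
    size_t live = 0;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].is_delete) {
            continue;  // deletes are never copied over themselves
        }

        // scan the rest of the log for this attr's history
        bool dead = false;
        for (size_t j = i+1; j < n; j++) {
            if (entries[j].id != entries[i].id) {
                continue;
            }
            if (entries[j].is_delete || entries[j].type == entries[i].type) {
                dead = true;  // deleted or superseded later on, exit early
                break;
            }
        }

        if (!dead) {
            if (cb) {
                cb(&entries[i], data);
            }
            live++;
        }
    }
    return live;
}
```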
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
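The binary search mentioned in the second hiccup could look roughly like the sketch below. It assumes a hypothetical dry-run callback (`size_fn`) that reports the compacted size of an id range and that this size never shrinks as the range grows; `find_split` and `fixed_size` are illustrative names only. Each probe is one O(n^2) dry-run and there are O(log n) probes, which is where the O(n^2 log n) split bound comes from.

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical dry-run: returns the compacted size of ids [begin, end).
typedef size_t (*size_fn)(uint16_t begin, uint16_t end, void *ctx);

// Toy dry-run used for the example: every id costs *ctx bytes.
static size_t fixed_size(uint16_t begin, uint16_t end, void *ctx) {
    return (size_t)(end - begin) * *(size_t *)ctx;
}

// Finds the largest mid in [begin, end] such that ids [begin, mid) fit
// within limit. Returns begin if not even one id fits.
static uint16_t find_split(uint16_t begin, uint16_t end, size_t limit,
        size_fn dry_run, void *ctx) {
    uint16_t lo = begin;  // [begin, lo) is known to fit
    uint16_t hi = end;    // remaining candidates are (lo, hi]
    while (lo < hi) {
        // round up so that lo always advances when mid fits
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (dry_run(begin, mid, ctx) <= limit) {
            lo = mid;
        } else {
            hi = (uint16_t)(mid - 1);
        }
    }
    return lo;
}
```

With 10 ids of 100 bytes each and a 450-byte limit, `find_split(0, 10, 450, fixed_size, &per)` settles on 4.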
2018-12-27 02:27:34 +00:00
                lfs_dir_commit_size, &size);
        if (err) {
            return err;
        }
Modified lfs_dir_compact to avoid redundant erases during split
The commit machine in littlefs has three stages: commit, compact, and
then split. First we try to append our commit to the metadata log. If
that fails, we try to compact the metadata log to remove duplicates and make
room for the commit. If that still fails, we split the metadata into two
metadata-pairs and try again. Each stage is less efficient but also less
frequent.
However, in the case that we're filling up a directory with new files,
such as the bootstrap process in setting up a new system, we must pass
through all three stages rather quickly in order to get enough
metadata-pairs to hold all of our files. This means we'll compact,
split, and then need to compact again. This creates more erases than is
needed in the optimal case, which can be a big cost on disks with an
expensive erase operation.
In theory, we can actually avoid this redundant erase by reusing the
data we wrote out in the first attempt to compact. In practice, this
trick is very complicated to pull off.
1. We may need to cache a half-completed program while we write out the
new metadata-pair. We need to write out the second pair first in
order to get our new tail before we complete our first metadata-pair.
This requires two pcaches, which we don't have.
The solution here is to just drop our cache and reconstruct what it
would have been. This needs to be perfect down to the byte level
because we don't have knowledge of where our cache lines are.
2. We may have written out entries that are then moved to the new
metadata-pair.
The solution here isn't pretty but it works: we just add a delete
tag for any entry that was moved over.
In the end the solution ends up a bit hacky, with different layers poked
through the commit logic in order to manage writes at the byte level
from where we manage splits. But it works fairly well and saves erases.
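To make the three-stage fallback and the redundant erase concrete, here is a toy model. Everything in it is an assumption for illustration: the `struct mdir` fields, the byte counts, and the `ERR_NOSPC` stand-in do not come from lfs.c. It models the straightforward sequence (append, then compact, then split and compact again), where every compact costs one erase whether or not it succeeds. Filling a nearly full directory therefore pays two erases, which is the cost this change tries to avoid.

```c
#include <stddef.h>

enum { ERR_NOSPC = -28 };  // stand-in error code for "out of space"

// Toy model of one metadata-pair, sizes in bytes (all names hypothetical).
struct mdir {
    size_t live;      // bytes still referenced
    size_t stale;     // bytes superseded by later commits
    size_t cap;       // usable bytes in the block
    unsigned erases;  // number of block erases so far
};

// Stage 1: append the commit to the log if there is room.
static int try_append(struct mdir *d, size_t n) {
    if (d->live + d->stale + n > d->cap) {
        return ERR_NOSPC;
    }
    d->live += n;
    return 0;
}

// Stage 2: rewrite only the live data plus the commit into a fresh block.
// The erase is paid even when the result turns out not to fit.
static int try_compact(struct mdir *d, size_t n) {
    d->erases += 1;
    d->stale = 0;
    if (d->live + n > d->cap) {
        return ERR_NOSPC;
    }
    d->live += n;
    return 0;
}

// Stage 3: move half of the live data to a new metadata-pair.
static void split(struct mdir *d) {
    d->live /= 2;
}

static int commit(struct mdir *d, size_t n) {
    int err = try_append(d, n);
    if (err != ERR_NOSPC) {
        return err;
    }
    err = try_compact(d, n);
    if (err != ERR_NOSPC) {
        return err;
    }
    split(d);
    return try_compact(d, n);  // try again after the split
}
```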
2018-08-21 02:45:11 +00:00
2019-01-04 23:23:36 +00:00
        // space is complicated, we need room for tail, crc, gstate,
2018-08-21 02:45:11 +00:00
        // cleanup delete, and we cap at half a block to give room
2019-01-31 20:54:47 +00:00
        // for metadata updates.
2019-04-09 23:56:53 +00:00
        if (end - begin < 0xff &&
                size <= lfs_min(lfs->cfg->block_size - 36,
                    lfs_alignup(lfs->cfg->block_size/2,
                        lfs->cfg->prog_size))) {
2018-12-27 02:27:34 +00:00
            break;
        }
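The size cap in the condition above can be worked through in isolation. The sketch below restates it with stand-alone helpers (`align_up`, `min_u32`, and `compact_cap` are hypothetical names, standing in for lfs_alignup and lfs_min, whose exact definitions are not shown here). The constant 36 is taken directly from the code and is not broken down further.

```c
#include <stdint.h>

// Rounds a up to the next multiple of alignment (alignment must be nonzero).
static uint32_t align_up(uint32_t a, uint32_t alignment) {
    return ((a + alignment - 1) / alignment) * alignment;
}

static uint32_t min_u32(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

// Largest compacted size accepted without splitting: half a block rounded
// up to the program size, but never more than block_size - 36.
static uint32_t compact_cap(uint32_t block_size, uint32_t prog_size) {
    return min_u32(block_size - 36, align_up(block_size/2, prog_size));
}
```

With a 4096-byte block and a 16-byte program size the cap is 2048. With a 128-byte block and a 128-byte program size, rounding half a block up gives 128, so the `block_size - 36` term takes over and the cap is 92.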
2018-08-21 02:45:11 +00:00
2018-12-27 02:27:34 +00:00
        // can't fit, need to split, we should really be finding the
        // largest size that fits with a small binary search, but right now
        // it's not worth the code size
        uint16_t split = (end - begin) / 2;
2018-12-29 13:53:12 +00:00
        err = lfs_dir_split(lfs, dir, attrs, attrcount,
                source, begin+split, end);
2018-12-27 02:27:34 +00:00
        if (err) {
            // if we fail to split, we may be able to overcompact, unless
            // we're too big for even the full block, in which case our
            // only option is to error
2019-01-04 23:23:36 +00:00
            if (err == LFS_ERR_NOSPC && size <= lfs->cfg->block_size - 36) {
2018-12-27 02:27:34 +00:00
                break;
            }
            return err;
        }

        end = begin + split;
    }

    // increment revision count
2020-02-09 16:02:41 +00:00
    dir->rev += 1;
    // If our revision count == n * block_cycles, we should force a relocation,
    // this is how littlefs wear-levels at the metadata-pair level. Note that we
    // actually use (block_cycles+1)|1, this is to avoid two corner cases:
    // 1. block_cycles = 1, which would prevent relocations from terminating
    // 2. block_cycles = 2n, which, due to aliasing, would only ever relocate
    //    one metadata block in the pair, effectively making this useless
2019-07-17 22:05:20 +00:00
    if (lfs->cfg->block_cycles > 0 &&
2020-02-09 16:02:41 +00:00
            (dir->rev % ((lfs->cfg->block_cycles+1)|1) == 0)) {
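The effect of `(block_cycles+1)|1` is easiest to see with a few concrete values. The sketch below isolates the expression in two hypothetical helpers (`relocation_period` and `should_relocate` are not names from lfs.c). Adding one and then setting the low bit always gives an odd period of at least 3 for any positive block_cycles, which is what sidesteps the two corner cases listed in the comment.

```c
#include <stdint.h>
#include <stdbool.h>

// Period, in revisions, between forced relocations.
// Always odd, and at least 3 when block_cycles >= 1.
static uint32_t relocation_period(int32_t block_cycles) {
    return ((uint32_t)block_cycles + 1) | 1;
}

// Mirrors the condition above: non-positive block_cycles never relocates.
static bool should_relocate(uint32_t rev, int32_t block_cycles) {
    return block_cycles > 0
            && rev % relocation_period(block_cycles) == 0;
}
```

For example, block_cycles of 1 and 2 both give a period of 3, 4 gives 5, 5 gives 7, and 100 gives 101.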
2018-12-27 02:27:34 +00:00
        if (lfs_pair_cmp(dir->pair, (const lfs_block_t[2]){0, 1}) == 0) {
            // oh no! we're writing too much to the superblock,
            // should we expand?
            lfs_ssize_t res = lfs_fs_size(lfs);
            if (res < 0) {
                return res;
            }

            // do we have extra space? littlefs can't reclaim this space
            // by itself, so expand cautiously
            if ((lfs_size_t)res < lfs->cfg->block_count/2) {
2020-02-09 16:02:41 +00:00
                LFS_DEBUG("Expanding superblock at rev %"PRIu32, dir->rev);
2018-12-29 13:53:12 +00:00
                int err = lfs_dir_split(lfs, dir, attrs, attrcount,
                        source, begin, end);
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we could fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3),
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
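The dry-run idea for splits can be sketched as a binary search over the end id. This is a generic sketch, not the code in lfs_dir_compact: `dry_run_fn`, `find_split`, and the toy size function are hypothetical names, and it assumes the dry-run size never shrinks as more ids are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical dry-run callback: reports how many bytes ids [begin, end)
// would take after compaction, without writing anything.
typedef uint32_t (*dry_run_fn)(void *ctx, uint16_t begin, uint16_t end);

// Returns the largest end in [begin, max_end] whose dry-run size fits in
// budget, or begin if not even one id fits. Assumes size is monotonic.
static uint16_t find_split(dry_run_fn size, void *ctx,
        uint16_t begin, uint16_t max_end, uint32_t budget) {
    assert(size != NULL && begin <= max_end);
    uint16_t lo = begin;
    uint16_t hi = max_end;
    while (lo < hi) {
        // round up so the range always shrinks
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (size(ctx, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = (uint16_t)(mid - 1);
        }
    }
    return lo;
}

// Toy stand-in for a real traversal: every id costs the same number of
// bytes, passed in through ctx.
static uint32_t toy_uniform_size(void *ctx, uint16_t begin, uint16_t end) {
    uint32_t per_id = *(const uint32_t *)ctx;
    return (uint32_t)(end - begin) * per_id;
}
```

Each probe costs one dry-run traversal, so with an O(n^2) traversal the search takes O(n^2 log n), in line with the split cost quoted above.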
2018-12-27 02:27:34 +00:00
|
|
|
if (err && err != LFS_ERR_NOSPC) {
|
|
|
|
return err;
|
2018-05-28 07:08:16 +00:00
|
|
|
}
|
2018-12-27 02:27:34 +00:00
|
|
|
|
|
|
|
// welp, we tried, if we ran out of space there's not much
|
|
|
|
// we can do, we'll error later if we've become frozen
|
|
|
|
if (!err) {
|
|
|
|
end = begin;
|
|
|
|
}
|
|
|
|
}
|
2019-08-07 21:58:13 +00:00
|
|
|
#ifdef LFS_MIGRATE
|
2019-11-26 19:59:45 +00:00
|
|
|
} else if (lfs->lfs1) {
|
|
|
|
// do not proactively relocate blocks during migrations, this
|
|
|
|
// can cause a number of failure states such as: clobbering the
|
|
|
|
// v1 superblock if we relocate root, and invalidating directory
|
|
|
|
// pointers if we relocate the head of a directory. On top of
|
|
|
|
// this, relocations increase the overall complexity of
|
|
|
|
// lfs_migrate, which is already a delicate operation.
|
Fixed issue where lfs_migrate would relocate root and corrupt superblock
Found during testing, the issue was with lfs_migrate in combination with
wear leveling.
Normally, we can expect lfs_migrate to be able to respect the user-configured
block_cycles. It already has allocation information on which blocks are
used by both v1 and v2, so it should be safe to relocate blocks as
needed.
However, this fell apart when root was relocated. If lfs_migrate found a
root that needed migration, it would happily relocate the root. This
would normally be fine, except relocating the root has a side effect of
needing to update the superblock, which, during migration, is in a
delicate state of containing both v1's and v2's superblocks in the same
metadata pair. If the superblock ends up needing to compact, this would
clobber the v1 superblock and corrupt the filesystem during migration.
The best fix I could come up with is to specifically disallow migrating the
root directory during migration. Fortunately this is behind the
LFS_MIGRATE macro, so the code cost for this check is not normally paid.
2019-07-29 06:34:23 +00:00
|
|
|
#endif
|
2018-12-27 02:27:34 +00:00
|
|
|
} else {
|
|
|
|
// we're writing too much, time to relocate
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However, it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This can happen if the previous tag was a valid commit and we
lost power, leaving a partially written tag at the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
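The shape of fix 1 can be sketched as follows: zero the revision before reading, then treat a corrupt read as "nothing usable on disk" rather than falling back on whatever was in RAM. This is a minimal sketch under assumptions, not the littlefs implementation: `read_fn`, `rev_seed`, and the toy readers are hypothetical, the error values mirror LFS_ERR_IO and LFS_ERR_CORRUPT, and endianness conversion is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Values mirror LFS_ERR_IO and LFS_ERR_CORRUPT; the names are local.
enum { ERR_IO = -5, ERR_CORRUPT = -84 };

// Hypothetical block-device read callback
typedef int (*read_fn)(void *ctx, uint32_t block, void *buf, uint32_t size);

// Seed a revision count from a block that may never have been written.
// On a corrupt read the result is a deterministic 0, never stale RAM.
static int rev_seed(read_fn read, void *ctx, uint32_t block, uint32_t *rev) {
    assert(read != NULL && rev != NULL);
    *rev = 0; // deterministic starting point
    uint32_t disk = 0;
    int err = read(ctx, block, &disk, sizeof(disk));
    if (err == 0) {
        *rev = disk;
        return 0;
    }
    if (err == ERR_CORRUPT) {
        return 0; // ignore what's on disk, keep the zero
    }
    return err; // any other error is still reported
}

// Toy device: every read fails ECC, like an untouched block might
static int toy_read_corrupt(void *ctx, uint32_t block, void *buf,
        uint32_t size) {
    (void)ctx; (void)block; (void)buf; (void)size;
    return ERR_CORRUPT;
}

// Toy device: returns the uint32_t that ctx points at
static int toy_read_ok(void *ctx, uint32_t block, void *buf, uint32_t size) {
    (void)block;
    memcpy(buf, ctx, size);
    return 0;
}
```

With the zero seed, two runs that start from the same disk image pick the same revision and therefore wear-level at the same time.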
2020-01-29 07:45:19 +00:00
|
|
|
tired = true;
|
2018-12-27 02:27:34 +00:00
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// begin loop to commit compaction to blocks until a compact sticks
|
|
|
|
while (true) {
|
2019-04-09 23:37:53 +00:00
|
|
|
{
|
2018-12-27 02:27:34 +00:00
|
|
|
// setup commit state
|
|
|
|
struct lfs_commit commit = {
|
|
|
|
.block = dir->pair[1],
|
|
|
|
.off = 0,
|
2020-02-10 04:43:20 +00:00
|
|
|
.ptag = 0xffffffff,
|
|
|
|
.crc = 0xffffffff,
|
2018-12-27 02:27:34 +00:00
|
|
|
|
|
|
|
.begin = 0,
|
|
|
|
.end = lfs->cfg->block_size - 8,
|
|
|
|
};
|
|
|
|
|
Modified lfs_dir_compact to avoid redundant erases during split
The commit machine in littlefs has three stages: commit, compact, and
then split. First we try to append our commit to the metadata log, if
that fails we try to compact the metadata log to remove duplicates and make
room for the commit, if that still fails we split the metadata into two
metadata-pairs and try again. Each stage is less efficient but also less
frequent.
However, in the case that we're filling up a directory with new files,
such as the bootstrap process in setting up a new system, we must pass
through all three stages rather quickly in order to get enough
metadata-pairs to hold all of our files. This means we'll compact,
split, and then need to compact again. This creates more erases than are
needed in the optimal case, which can be a big cost on disks with an
expensive erase operation.
In theory, we can actually avoid this redundant erase by reusing the
data we wrote out in the first attempt to compact. In practice, this
trick is very complicated to pull off.
1. We may need to cache a half-completed program while we write out the
new metadata-pair. We need to write out the second pair first in
order to get our new tail before we complete our first metadata-pair.
This requires two pcaches, which we don't have.
The solution here is to just drop our cache and reconstruct what it
would have been. This needs to be perfect down to the byte level
because we don't have knowledge of where our cache lines are.
2. We may have written out entries that are then moved to the new
metadata-pair.
The solution here isn't pretty but it works, we just add a delete
tag for any entry that was moved over.
In the end the solution ends up a bit hacky, with different layers poked
through the commit logic in order to manage writes at the byte level
from where we manage splits. But it works fairly well and saves erases.
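The three-stage fallback described at the top of this message (commit, then compact, then split) can be sketched as a chain that only moves on when a stage runs out of space. This is a simplified illustration and not how lfs_dir_commit is actually structured: `stage_fn`, `run_stages`, and the toy stages are hypothetical, and the error value mirrors LFS_ERR_NOSPC.

```c
#include <assert.h>
#include <stddef.h>

// Value mirrors LFS_ERR_NOSPC; the name is local to this sketch.
enum { ERR_NOSPC = -28 };

// Hypothetical stage: returns 0 on success or a negative error code
typedef int (*stage_fn)(void *ctx);

// Try each stage in order (for example append, compact, split), falling
// through to the next, more expensive stage only when the previous one
// ran out of space. Any other result stops the chain immediately.
static int run_stages(const stage_fn *stages, size_t count, void *ctx) {
    assert(stages != NULL || count == 0);
    int err = ERR_NOSPC;
    for (size_t i = 0; i < count && err == ERR_NOSPC; i++) {
        err = stages[i](ctx);
    }
    return err;
}

// Toy stages that count how often they are called through ctx
static int toy_full(void *ctx) {
    (*(int *)ctx)++;
    return ERR_NOSPC;
}

static int toy_ok(void *ctx) {
    (*(int *)ctx)++;
    return 0;
}
```

The redundant erase discussed above comes from the last stage having to redo work the middle stage already wrote out, which a plain chain like this one does nothing to avoid.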
2018-08-21 02:45:11 +00:00
|
|
|
// erase block to write to
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
    .--------.
  ->| parent |-.
    | gstate | |
  .-|   a    |-'
  | '--------'
  |     X <- child is orphaned
  | .--------.
  '>| child  |->
    | gstate |
    |   a    |
    '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left over from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
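The xor-of-deltas idea from item 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the littlefs implementation: `sketch_dir` and `sketch_collect_gstate` are hypothetical names, and the gstate is reduced to a single 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a metadata-pair; not the
 * littlefs structure. */
struct sketch_dir {
    uint32_t gdelta;          /* this directory's gstate delta */
    struct sketch_dir *next;  /* next directory in the threaded list */
};

/* The gstate is the xor of every delta reachable along the threaded
 * linked-list, independent of the directory tree. */
uint32_t sketch_collect_gstate(const struct sketch_dir *head) {
    uint32_t gstate = 0;
    for (const struct sketch_dir *d = head; d != NULL; d = d->next) {
        gstate ^= d->gdelta;
    }
    return gstate;
}
```

If the child's delta is xored into the parent while the child is still reachable on the list, the two copies cancel and the collected gstate is wrong until the child is unlinked. That is why the delta is handed over only at the step that finalizes the removal.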
2020-01-22 04:18:19 +00:00
|
|
|
int err = lfs_bd_erase(lfs, dir->pair[1]);
|
Modified lfs_dir_compact to avoid redundant erases during split
The commit machine in littlefs has three stages: commit, compact, and
then split. First we try to append our commit to the metadata log, if
that fails we try to compact the metadata log to remove duplicates and make
room for the commit, if that still fails we split the metadata into two
metadata-pairs and try again. Each stage is less efficient but also less
frequent.
However, in the case that we're filling up a directory with new files,
such as the bootstrap process in setting up a new system, we must pass
through all three stages rather quickly in order to get enough
metadata-pairs to hold all of our files. This means we'll compact,
split, and then need to compact again. This creates more erases than is
needed in the optimal case, which can be a big cost on disks with an
expensive erase operation.
In theory, we can actually avoid this redundant erase by reusing the
data we wrote out in the first attempt to compact. In practice, this
trick is very complicated to pull off.
1. We may need to cache a half-completed program while we write out the
new metadata-pair. We need to write out the second pair first in
order to get our new tail before we complete our first metadata-pair.
This requires two pcaches, which we don't have.
The solution here is to just drop our cache and reconstruct what it
would have been. This needs to be perfect down to the byte level
because we don't have knowledge of where our cache lines are.
2. We may have written out entries that are then moved to the new
metadata-pair.
The solution here isn't pretty but it works: we just add a delete
tag for any entry that was moved over.
In the end the solution ends up a bit hacky, with different layers poked
through the commit logic in order to manage writes at the byte level
from where we manage splits. But it works fairly well and saves erases.
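The three-stage fallback described at the top of this message can be sketched as a simple decision function. This is an illustration only: littlefs discovers which stage is needed by attempting each one and falling through on failure, whereas this hypothetical `sketch_pick_stage` assumes the sizes are already known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; the three stages come from the commit message. */
enum sketch_stage { SKETCH_APPEND, SKETCH_COMPACT, SKETCH_SPLIT };

/* capacity:    usable bytes in one metadata block
 * log_used:    bytes already occupied by the metadata log
 * live_size:   bytes the log would need after removing duplicates
 * commit_size: bytes the new commit needs */
enum sketch_stage sketch_pick_stage(size_t capacity, size_t log_used,
        size_t live_size, size_t commit_size) {
    if (log_used + commit_size <= capacity) {
        return SKETCH_APPEND;   /* cheapest: append to the log */
    }
    if (live_size + commit_size <= capacity) {
        return SKETCH_COMPACT;  /* costs an erase, frees duplicates */
    }
    return SKETCH_SPLIT;        /* most expensive: two metadata-pairs */
}
```

Filling a directory with new files keeps `live_size` close to `log_used`, so the compact stage frees little and the split stage is reached quickly, which is the case this commit targets.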
2018-08-21 02:45:11 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
2018-05-28 07:08:16 +00:00
|
|
|
}
|
|
|
|
|
2018-10-21 02:02:25 +00:00
|
|
|
// write out header
|
2020-02-09 16:02:41 +00:00
|
|
|
dir->rev = lfs_tole32(dir->rev);
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allowed us to know
exactly how many ids we could fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic didn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first, depending on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we couldn't actually figure out our compacted size until we
compacted. This worked ok except for the fact that splits would always have a
failed compact. This meant we wasted an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3),
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
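The second hiccup, using a dry-run traversal to binary search for the split point, can be sketched as below. This is a sketch under assumptions: `sketch_dry_run_size` stands in for a dry-run traversal by summing hypothetical per-id sizes, and `sketch_find_split` is an illustrative name, not a littlefs function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dry run: total size of ids [begin, end), writing nothing.
 * The real dry run would walk the directory's attrs instead. */
size_t sketch_dry_run_size(const size_t *sizes, size_t begin, size_t end) {
    size_t total = 0;
    for (size_t i = begin; i < end; i++) {
        total += sizes[i];
    }
    return total;
}

/* Binary search for the largest end such that ids [begin, end) fit in
 * budget. Relies on the dry-run size growing monotonically with end. */
size_t sketch_find_split(const size_t *sizes, size_t begin, size_t count,
        size_t budget) {
    size_t lo = begin;  /* ids [begin, lo) are known to fit */
    size_t hi = count;  /* no end larger than hi is considered */
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;  /* round up so mid > lo */
        if (sketch_dry_run_size(sizes, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

Each probe costs one dry run, so with an O(n^2) traversal the search adds the log n factor quoted for splits.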
2018-12-27 02:27:34 +00:00
|
|
|
err = lfs_dir_commitprog(lfs, &commit,
|
2020-02-09 16:02:41 +00:00
|
|
|
&dir->rev, sizeof(dir->rev));
|
|
|
|
dir->rev = lfs_fromle32(dir->rev);
|
2018-10-21 02:02:25 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
Added building blocks for dynamic wear-leveling
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a device's lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices. Rather than causing
corruption during writes, wear reduces a device's "sticking power",
causing bits to flip over time. This means for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
What we end up with is a uniform distribution of wear on storage. The
wear is not perfect: if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying, however we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
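The two tweaks above can be sketched as follows. This is a minimal sketch with hypothetical names. The commit message only says "a simple hash of the CRCs", so the xor used here is an assumption, as is treating a block_cycles of 0 as "never evict".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tweak 1: evict a metadata-pair when its revision count is a multiple
 * of block_cycles. Treating 0 as "disabled" is an assumption. */
int sketch_should_evict(uint32_t rev, uint32_t block_cycles) {
    return block_cycles > 0 && rev % block_cycles == 0;
}

/* Tweak 2: derive an allocator seed from the metadata-pair CRCs seen
 * while mounting. Xor stands in for the "simple hash" (assumption). */
uint32_t sketch_seed_from_crcs(const uint32_t *crcs, size_t count) {
    uint32_t seed = 0;
    for (size_t i = 0; i < count; i++) {
        seed ^= crcs[i];
    }
    return seed;
}

/* Pick the block where the allocator starts scanning after mount. */
uint32_t sketch_alloc_start(uint32_t seed, uint32_t block_count) {
    return block_count > 0 ? seed % block_count : 0;
}
```

Because the CRCs change whenever metadata is rewritten, the starting block differs from one boot to the next without storing any allocator state.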
2018-08-08 21:34:56 +00:00
|
|
|
}
|
2018-05-28 07:08:16 +00:00
|
|
|
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
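The bit layout above can be sketched with a few shifts and masks. The `sketch_` names are hypothetical and this is an illustration of the described 1/11/10/10 layout, not a copy of the littlefs helpers. The valid bit is returned raw because the message does not state its polarity.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sketch_tag_t;

/* Pack an 11-bit type, 10-bit id, and 10-bit length into one tag.
 * Bit 31 (the valid bit) is left clear. */
#define SKETCH_MKTAG(type, id, size) \
    (((sketch_tag_t)(type) << 20) | ((sketch_tag_t)(id) << 10) | \
     (sketch_tag_t)(size))

/* raw valid bit, polarity deliberately not interpreted */
uint32_t sketch_tag_validbit(sketch_tag_t tag) { return tag >> 31; }
/* full 11-bit type (type3) */
uint16_t sketch_tag_type3(sketch_tag_t tag) {
    return (uint16_t)((tag >> 20) & 0x7ff);
}
/* top 3 bits of the type, kept in place (type1) */
uint16_t sketch_tag_type1(sketch_tag_t tag) {
    return (uint16_t)((tag >> 20) & 0x700);
}
/* low 8 bits of the type: chunk info */
uint8_t sketch_tag_chunk(sketch_tag_t tag) {
    return (uint8_t)((tag >> 20) & 0xff);
}
/* 10-bit file id */
uint16_t sketch_tag_id(sketch_tag_t tag) {
    return (uint16_t)((tag >> 10) & 0x3ff);
}
/* 10-bit entry length */
uint16_t sketch_tag_size(sketch_tag_t tag) {
    return (uint16_t)(tag & 0x3ff);
}
```

With this layout, a value such as `SKETCH_MKTAG(0x400, 0x3ff, 0)` sets only the top type bit and every id bit, which is the shape a mask needs to match on type1 and id while ignoring the chunk and length fields.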
2018-12-29 13:53:12 +00:00
|
|
|
// traverse the directory, this time writing out all unique tags
|
|
|
|
err = lfs_dir_traverse(lfs,
|
2020-02-10 04:43:20 +00:00
|
|
|
source, 0, 0xffffffff, attrs, attrcount,
|
2018-12-29 13:53:12 +00:00
|
|
|
LFS_MKTAG(0x400, 0x3ff, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_NAME, 0, 0),
|
|
|
|
begin, end, -begin,
|
2018-12-27 02:27:34 +00:00
|
|
|
lfs_dir_commit_commit, &(struct lfs_dir_commit_commit){
|
|
|
|
lfs, &commit});
|
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
2018-10-21 02:02:25 +00:00
|
|
|
}
|
2018-12-27 02:27:34 +00:00
|
|
|
return err;
|
2018-10-21 02:02:25 +00:00
|
|
|
}
|
2018-05-28 07:08:16 +00:00
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
// commit tail, which may be new after last size check
|
|
|
|
if (!lfs_pair_isnull(dir->tail)) {
|
|
|
|
lfs_pair_tole32(dir->tail);
|
|
|
|
err = lfs_dir_commitattr(lfs, &commit,
|
|
|
|
LFS_MKTAG(LFS_TYPE_TAIL + dir->split, 0x3ff, 8),
|
|
|
|
dir->tail);
|
|
|
|
lfs_pair_fromle32(dir->tail);
|
2018-10-21 02:02:25 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
2018-08-21 02:45:11 +00:00
|
|
|
}
|
2019-01-04 23:23:36 +00:00
|
|
|
}
|
2018-12-27 02:27:34 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
// bring over gstate?
|
|
|
|
lfs_gstate_t delta = {0};
|
|
|
|
if (!relocated) {
|
|
|
|
lfs_gstate_xor(&delta, &lfs->gdisk);
|
|
|
|
lfs_gstate_xor(&delta, &lfs->gstate);
|
|
|
|
}
|
|
|
|
lfs_gstate_xor(&delta, &lfs->gdelta);
|
|
|
|
delta.tag &= ~LFS_MKTAG(0, 0, 0x3ff);
|
|
|
|
|
|
|
|
err = lfs_dir_getgstate(lfs, dir, &delta);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!lfs_gstate_iszero(&delta)) {
|
|
|
|
lfs_gstate_tole32(&delta);
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debugability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effect. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact. Giving us a O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase which could very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has much fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history; this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however, moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
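The dry-run idea from the second hiccup can be sketched as follows. This is only an illustration under stated assumptions: `demo_dryrun_size` and `demo_find_split` are hypothetical names, the dry-run is modeled as a sum over a table of per-id sizes rather than a real lfs_dir_traverse call, and the search looks for the largest prefix of ids that fits a byte budget, which may differ from the split point littlefs actually picks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical dry-run: returns the compacted size of ids [begin, end)
// without writing anything. Here it just sums a table of per-id sizes.
static size_t demo_dryrun_size(const uint16_t *sizes,
        uint16_t begin, uint16_t end) {
    size_t total = 0;
    for (uint16_t id = begin; id < end; id++) {
        total += sizes[id];
    }
    return total;
}

// Binary search for the largest split in [begin, end] such that ids
// [begin, split) fit in budget bytes. This relies on the dry-run size
// being non-decreasing as the range grows.
static uint16_t demo_find_split(const uint16_t *sizes,
        uint16_t begin, uint16_t end, size_t budget) {
    assert(begin <= end);
    uint16_t lo = begin; // [begin, lo) is known to fit (the empty range fits)
    uint16_t hi = end;   // no split past hi is considered
    while (lo < hi) {
        // round up so that lo always advances when mid fits
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (demo_dryrun_size(sizes, begin, mid) <= budget) {
            lo = mid;
        } else {
            hi = (uint16_t)(mid - 1);
        }
    }
    return lo;
}
```

Each probe costs one dry-run traversal, so with an O(n^2) traversal the search needs O(log n) probes, which matches the O(n^2 log n) split bound quoted above.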
2018-12-27 02:27:34 +00:00
|
|
|
err = lfs_dir_commitattr(lfs, &commit,
|
2019-01-04 23:23:36 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_MOVESTATE, 0x3ff,
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
.--------.
->| parent |-.
| gstate | |
.-| a |-'
| '--------'
| X <- child is orphaned
| .--------.
'>| child |->
| gstate |
| a |
'--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
|
|
|
sizeof(delta)), &delta);
|
2018-10-21 02:02:25 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
Introduced xored-globals logic to fix fundamental problem with moves
This was a big roadblock for a while: with the new feature of inlined
files, the existing move logic was fundamentally flawed.
To pull off atomic moves between two different metadata-pairs, littlefs
uses a simple, if a bit clumsy trick.
1. Marks entry as "moving"
2. Copies entry to new metadata-pair
3. Deletes old entry
If power is lost before the move operation is completed, we will find the
"moving" tag. This means there may or may not be an incomplete move on
the filesystem. In this case, we simply search for the moved entry, if
we find it, we remove the old entry, otherwise we just remove the
"moving" tag.
This worked perfectly, until we introduced inlined files. See, unlike
the existing directory and ctz entries, inlined files have no guarantee
they are unique. There is nothing we can search for that will allow us
to find a moved file unless we assign entries globally-unique ids. (note
that moves are fundamentally rename operations, so searching for names
does not make sense).
---
Solving this problem required completely restructuring how littlefs
handled moves and pulling out a really old idea that had been left on the
cutting room floor back when littlefs was going through many
designs: xored-globals.
The problem xored-globals solves is the need to maintain some global state
via commits to these distributed, independent metadata-pairs. The idea
is that we can use some sort of symmetric operation, such as xor, to
introduce deltas of the global state that can be committed atomically
along with any other info to these metadata-pairs.
This means that to figure out our global state, we xor together the global
delta stored in every metadata-pair.
Which means any commit can update the global state atomically, opening
up a whole new set of atomic possibilities.
There are a couple of downsides. These globals may end up with deltas on
every single metadata-pair, effectively duplicating the data for each
block. Additionally, these globals need to have multiple copies in RAM.
This means the globals need to be a bounded size and very small, since even
small globals will have a large footprint.
---
On top of xored-globals, it's trivial to fix our move logic. Here we've
added an indirect delete tag which allows us to atomically specify a
delete of any entry on the filesystem.
Our move operation is now:
1. Copy entry to new metadata-pair and atomically xor globals to
indirectly delete our original entry.
2. Delete the original entry and xor globals to remove the indirect
delete.
Extra exciting is that this now takes our relatively clumsy move
operation into a sexy guaranteed O(1) move operation with no searching
necessary (though we do need to xor globals during mount).
Also reintroduced entry struct, now with a specific purpose to describe
the metadata-pair + id combo needed by indirect deletes to locate an
entry.
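The xored-globals idea above can be illustrated with a small standalone sketch. Everything here is hypothetical: `demo_gstate_t` is a stand-in with an assumed three-word layout, not the real lfs_gstate_t, and the helper names are invented for the example. It only shows that xoring every per-pair delta yields the global state, and that a single pair can change that state by committing one extra delta.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a small global state (not the real lfs_gstate_t)
typedef struct {
    uint32_t words[3];
} demo_gstate_t;

// xor b into a; xor is its own inverse, so applying a delta twice cancels it
static void demo_gstate_xor(demo_gstate_t *a, const demo_gstate_t *b) {
    for (size_t i = 0; i < 3; i++) {
        a->words[i] ^= b->words[i];
    }
}

// Rebuild the global state by xoring the delta stored in every metadata-pair
static demo_gstate_t demo_gstate_collect(const demo_gstate_t *deltas,
        size_t count) {
    demo_gstate_t g = {{0, 0, 0}};
    for (size_t i = 0; i < count; i++) {
        demo_gstate_xor(&g, &deltas[i]);
    }
    return g;
}

// Delta that one pair must xor into its stored delta to move the global
// state from cur to next
static demo_gstate_t demo_gstate_delta(const demo_gstate_t *cur,
        const demo_gstate_t *next) {
    demo_gstate_t d = *cur;
    demo_gstate_xor(&d, next);
    return d;
}
```

Because the update touches only one pair's delta, it can ride along with whatever else that pair is committing, which is the property the move logic depends on.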
2018-05-29 17:35:23 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
// complete commit with crc
|
2018-12-27 02:27:34 +00:00
|
|
|
err = lfs_dir_commitcrc(lfs, &commit);
|
2018-05-28 07:08:16 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2020-02-09 16:02:41 +00:00
|
|
|
// successful compaction, swap dir pair to indicate most recent
|
2019-07-24 19:24:29 +00:00
|
|
|
LFS_ASSERT(commit.off % lfs->cfg->prog_size == 0);
|
2020-02-09 16:02:41 +00:00
|
|
|
lfs_pair_swap(dir->pair);
|
|
|
|
dir->count = end - begin;
|
|
|
|
dir->off = commit.off;
|
|
|
|
dir->etag = commit.ptag;
|
Fixed broken wear-leveling when block_cycles = 2n-1
This was an interesting issue found during a GitHub discussion with
rmollway and thrasher8390.
Blocks in the metadata-pair are relocated every "block_cycles", or, more
mathy, when rev % block_cycles == 0 as long as rev += 1 every block write.
But there's a problem: rev isn't += 1 every block write. There are two
blocks in a metadata-pair, so looking at it from each block's
perspective, rev += 2 every block write.
This leads to a sort of aliasing issue, where, if block_cycles is
divisible by 2, one block in the metadata-pair is always relocated, and
the other block is _never_ relocated, causing a complete failure of
block-level wear-leveling.
Fortunately, because of a previous workaround to avoid block_cycles = 1
(since this will cause the relocation algorithm to never terminate), the
actual math is rev % (block_cycles+1) == 0. This means the bug only
shows its head in the much less likely case where block_cycles is a
multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for
some n.
To work around this we can bitwise or the block_cycles+1 modulus with 1 to
force it to never be a multiple of 2.
(Maybe we should do this during initialization? But then block_cycles
would need to be mutable.)
---
There are a few unrelated changes mixed into this commit that shouldn't be
there since I added this as part of a branch of bug fixes I'm putting
together rather hastily, so unfortunately this is not easily cherry-pickable.
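The aliasing described in this commit message can be reproduced with a tiny simulation. This is a sketch under assumptions: the helper names are hypothetical, commits are modeled as strictly alternating between the two blocks of a pair (block index = rev % 2), and the workaround is modeled as oring the block_cycles+1 modulus with 1.

```c
#include <assert.h>
#include <stdint.h>

// Simulates `writes` commits to a metadata-pair. The pair's revision count
// increments once per commit and commits alternate between the two blocks,
// so each block only ever sees every other revision.
// evictions[i] receives how many times block i hit the eviction check.
static void demo_count_evictions(uint32_t modulus, uint32_t writes,
        uint32_t evictions[2]) {
    assert(modulus > 0);
    evictions[0] = 0;
    evictions[1] = 0;
    for (uint32_t rev = 1; rev <= writes; rev++) {
        if (rev % modulus == 0) {
            evictions[rev % 2] += 1;
        }
    }
}

// Assumed form of the workaround: force the modulus to be odd
static uint32_t demo_eviction_modulus(uint32_t block_cycles) {
    return (block_cycles + 1) | 1;
}
```

With block_cycles = 3 the unpatched modulus is 4, so every eviction lands on an even revision and therefore always on the same block. The patched modulus is 5, and evictions alternate between the two blocks.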
2020-02-09 14:52:20 +00:00
|
|
|
// update gstate
|
|
|
|
lfs->gdelta = (lfs_gstate_t){0};
|
|
|
|
if (!relocated) {
|
|
|
|
lfs->gdisk = lfs->gstate;
|
|
|
|
}
|
2018-05-28 07:08:16 +00:00
|
|
|
}
|
|
|
|
break;
|
|
|
|
|
|
|
|
relocate:
|
Added building blocks for dynamic wear-leveling
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a device's lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices. Rather than causing
corruption during writes, wear reduces a device's "sticking power",
causing bits to flip over time. This means for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
What we end up with is a uniform distribution of wear on storage. The
wear is not perfect, if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying, however we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
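The second step above (seeding the allocator from the filesystem's own contents) could look roughly like this. It is only a sketch: the function names are hypothetical, and folding the CRCs together with xor is an assumed choice of "simple hash", not something the message specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Fold the CRCs seen while scanning metadata-pairs during mount into a seed.
// xor is an assumed stand-in for the "simple hash" mentioned above.
static uint32_t demo_seed_from_crcs(const uint32_t *crcs, size_t count) {
    uint32_t seed = 0;
    for (size_t i = 0; i < count; i++) {
        seed ^= crcs[i];
    }
    return seed;
}

// Pick the block where the allocator starts scanning for free blocks
static uint32_t demo_alloc_start(uint32_t seed, uint32_t block_count) {
    assert(block_count > 0);
    return seed % block_count;
}
```

Since every commit changes some metadata-pair's CRC, the seed differs from boot to boot without any state being stored for it.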
2018-08-08 21:34:56 +00:00
|
|
|
// commit was corrupted, drop caches and prepare to relocate block
|
2018-05-28 07:08:16 +00:00
|
|
|
relocated = true;
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_cache_drop(lfs, &lfs->pcache);
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
2020-01-29 07:45:19 +00:00
|
|
|
if (!tired) {
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Bad block at %"PRIx32, dir->pair[1]);
|
2018-08-08 21:34:56 +00:00
|
|
|
}
|
2018-05-28 07:08:16 +00:00
|
|
|
|
|
|
|
// can't relocate superblock, filesystem is now frozen
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducability issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducability issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring whats on disk if the read
fails, but end up using unitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducability, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count was different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occured because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
2020-01-29 07:45:19 +00:00
|
|
|
if (lfs_pair_cmp(dir->pair, (const lfs_block_t[2]){0, 1}) == 0) {
|
|
|
|
LFS_WARN("Superblock %"PRIx32" has become unwritable", dir->pair[1]);
|
2018-08-13 19:20:40 +00:00
|
|
|
return LFS_ERR_NOSPC;
|
2018-05-28 07:08:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// relocate half of pair
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
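The binary search over a dry-run traversal mentioned in the second hiccup can be sketched roughly as follows. This is only an illustration under stated assumptions, not littlefs's actual split code: `dry_run_size` and `find_split` are hypothetical names, and the dry run is faked by summing a per-id size array where the real cost would come from a size-only traversal of the directory.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-in for a dry-run traversal: returns how many bytes
// ids [begin, end) would occupy after compaction. Here it just sums a
// per-id size table (an assumption made for illustration).
static size_t dry_run_size(const size_t *sizes, size_t begin, size_t end) {
    size_t total = 0;
    for (size_t i = begin; i < end; i++) {
        total += sizes[i];
    }
    return total;
}

// Binary search for the smallest split such that ids [0, split) take up
// at least half of the total compacted size. The prefix size never
// decreases as split grows, which is what makes binary search valid.
static size_t find_split(const size_t *sizes, size_t count) {
    assert(sizes != NULL || count == 0);
    size_t half = dry_run_size(sizes, 0, count) / 2;
    size_t lo = 0;
    size_t hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (dry_run_size(sizes, 0, mid) < half) {
            lo = mid + 1;   // first part still too small, move right
        } else {
            hi = mid;       // first part already covers half, move left
        }
    }
    return lo;
}
```

Each probe costs one dry run and there are O(log n) probes, so with an O(n^2) traversal per dry run this shape matches the O(n^2 log n) split quoted above.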
2018-12-27 02:27:34 +00:00
|
|
|
int err = lfs_alloc(lfs, &dir->pair[1]);
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
2020-01-29 07:45:19 +00:00
|
|
|
if (err && (err != LFS_ERR_NOSPC || !tired)) {
|
2018-05-28 07:08:16 +00:00
|
|
|
return err;
|
|
|
|
}
|
Restructured block devices again for better test exploitation
Also finished migrating tests with test_relocations and test_exhaustion.
The issue I was running into when migrating these tests was a lack of
flexibility with what you could do with the block devices. It was
possible to hack in some hooks for things like bad blocks and power
loss, but it wasn't clean or easily extendable.
The solution here was to just put all of these test extensions into a
third block device, testbd, that uses the other two example block
devices internally.
testbd has several useful features for testing. Note this makes it a
pretty terrible block device _example_ since these hooks look more
complicated than a block device needs to be.
- testbd can simulate different erase values, supporting 1s, 0s, other byte
patterns, or no erases at all (which can cause surprising bugs). This
actually depends on the simulated erase values in rambd and filebd.
I did try to move this out of rambd/filebd, but it's not possible to
simulate erases in testbd without buffering entire blocks and creating
an excessive amount of extra write operations.
- testbd also helps simulate power-loss by containing a "power cycles"
counter that is decremented every write operation until it calls exit.
This is notably faster than the previous gdb approach, which is
valuable since the reentrant tests tend to take a while to resolve.
- testbd also tracks wear, which can be manually set and read. This is
very useful for testing things like bad block handling, wear leveling,
or even changing the effective size of the block device at runtime.
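The power-cycle counter and wear tracking described above could look something like the sketch below. Everything here is hypothetical (`sim_bd_t`, `sim_bd_tick`, `sim_bd_erase`, and the fixed block count are made up for illustration) and is not testbd's real interface. In particular, the sketch returns a flag where the commit message says the real block device calls exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_BLOCK_COUNT 4

// Hypothetical simulation state, loosely modeled on the description above.
typedef struct sim_bd {
    uint32_t power_cycles;           // writes left before power loss, 0 = disabled
    uint32_t erase_cycles;           // erases before a block goes bad, 0 = never
    uint32_t wear[SIM_BLOCK_COUNT];  // erase count per block
} sim_bd_t;

// Called on every write operation. Returns true when the simulated power
// loss should happen; a real test harness would exit the process here.
static bool sim_bd_tick(sim_bd_t *bd) {
    if (bd->power_cycles > 0) {
        bd->power_cycles -= 1;
        return bd->power_cycles == 0;
    }
    return false;
}

// Records one erase of a block. Returns false once the block has used up
// its erase budget, which is how a bad block can be simulated.
static bool sim_bd_erase(sim_bd_t *bd, uint32_t block) {
    assert(block < SIM_BLOCK_COUNT);
    if (bd->erase_cycles > 0 && bd->wear[block] >= bd->erase_cycles) {
        return false;
    }
    bd->wear[block] += 1;
    return true;
}
```

Because the wear counters are plain fields, a test can preload them to mark blocks as already worn, which is one way to get the manual wear control the message mentions.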
2020-01-16 12:30:40 +00:00
|
|
|
|
2020-01-29 07:45:19 +00:00
|
|
|
tired = false;
|
2018-05-28 07:08:16 +00:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2020-02-09 16:02:41 +00:00
|
|
|
if (relocated) {
|
|
|
|
// update references if we relocated
|
|
|
|
LFS_DEBUG("Relocating %"PRIx32" %"PRIx32" -> %"PRIx32" %"PRIx32,
|
|
|
|
oldpair[0], oldpair[1], dir->pair[0], dir->pair[1]);
|
|
|
|
int err = lfs_fs_relocate(lfs, oldpair, dir->pair);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
2020-01-30 04:05:58 +00:00
|
|
|
|
2018-05-28 07:08:16 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-17 23:31:30 +00:00
|
|
|
static int lfs_dir_commit(lfs_t *lfs, lfs_mdir_t *dir,
|
2019-01-08 14:52:03 +00:00
|
|
|
const struct lfs_mattr *attrs, int attrcount) {
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open files need to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
|
|
|
// check for any inline files that aren't RAM backed and
|
|
|
|
// forcefully evict them, needed for filesystem consistency
|
|
|
|
for (lfs_file_t *f = (lfs_file_t*)lfs->mlist; f; f = f->next) {
|
|
|
|
if (dir != &f->m && lfs_pair_cmp(f->m.pair, dir->pair) == 0 &&
|
|
|
|
f->type == LFS_TYPE_REG && (f->flags & LFS_F_INLINE) &&
|
|
|
|
f->ctz.size > lfs->cfg->cache_size) {
|
2019-07-09 22:51:15 +00:00
|
|
|
int err = lfs_file_outline(lfs, f);
|
2019-01-13 17:08:42 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = lfs_file_flush(lfs, f);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
// calculate changes to the directory
|
2020-02-09 16:02:41 +00:00
|
|
|
lfs_mdir_t olddir = *dir;
|
2020-01-20 23:35:45 +00:00
|
|
|
bool hasdelete = false;
|
2019-01-08 14:52:03 +00:00
|
|
|
for (int i = 0; i < attrcount; i++) {
|
|
|
|
if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_CREATE) {
|
2019-01-04 23:23:36 +00:00
|
|
|
dir->count += 1;
|
2019-01-08 14:52:03 +00:00
|
|
|
} else if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_DELETE) {
|
2018-07-17 23:31:30 +00:00
|
|
|
LFS_ASSERT(dir->count > 0);
|
|
|
|
dir->count -= 1;
|
2020-01-20 23:35:45 +00:00
|
|
|
hasdelete = true;
|
2019-01-08 14:52:03 +00:00
|
|
|
} else if (lfs_tag_type1(attrs[i].tag) == LFS_TYPE_TAIL) {
|
|
|
|
dir->tail[0] = ((lfs_block_t*)attrs[i].buffer)[0];
|
|
|
|
dir->tail[1] = ((lfs_block_t*)attrs[i].buffer)[1];
|
|
|
|
dir->split = (lfs_tag_chunk(attrs[i].tag) & 1);
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_fromle32(dir->tail);
|
2018-10-21 16:25:48 +00:00
|
|
|
}
|
|
|
|
}
|
2018-07-17 23:31:30 +00:00
|
|
|
|
2018-10-21 16:25:48 +00:00
|
|
|
// should we actually drop the directory block?
|
2020-01-20 23:35:45 +00:00
|
|
|
if (hasdelete && dir->count == 0) {
|
2018-10-21 16:25:48 +00:00
|
|
|
lfs_mdir_t pdir;
|
|
|
|
int err = lfs_fs_pred(lfs, dir->pair, &pdir);
|
|
|
|
if (err && err != LFS_ERR_NOENT) {
|
Fixed broken wear-leveling when block_cycles = 2n-1
This was an interesting issue found during a GitHub discussion with
rmollway and thrasher8390.
Blocks in the metadata-pair are relocated every "block_cycles", or, more
mathy, when rev % block_cycles == 0 as long as rev += 1 every block write.
But there's a problem, rev isn't += 1 every block write. There are two
blocks in a metadata-pair, so looking at it from each block's
perspective, rev += 2 every block write.
This leads to a sort of aliasing issue, where, if block_cycles is
divisible by 2, one block in the metadata-pair is always relocated, and
the other block is _never_ relocated, causing a complete failure of
block-level wear-leveling.
Fortunately, because of a previous workaround to avoid block_cycles = 1
(since this will cause the relocation algorithm to never terminate), the
actual math is rev % (block_cycles+1) == 0. This means the bug only
shows its head in the much less likely case where block_cycles is a
multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for
some n.
To work around this we can bitwise-or the modulus (block_cycles+1) with 1 to
force it to never be a multiple of 2.
(Maybe we should do this during initialization? But then block_cycles
would need to be mutable.)
---
There are a few unrelated changes mixed into this commit that shouldn't be
there since I added this as part of a branch of bug fixes I'm putting
together rather hastily, so unfortunately this is not easily cherry-pickable.
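The aliasing described in this message is easy to reproduce with a few lines of arithmetic. The sketch below is a hypothetical model, not code from lfs.c: `count_relocations` is a made-up helper that steps a block's revision by 2 and counts how often `rev % modulus == 0` holds. It assumes the workaround amounts to forcing the modulus odd.

```c
#include <assert.h>
#include <stdint.h>

// Counts how often one block of a metadata pair hits the relocation
// condition rev % modulus == 0. Each block only sees every second
// revision, so rev advances by 2 starting from the block's first revision.
static unsigned count_relocations(uint32_t first_rev, uint32_t modulus,
        unsigned writes) {
    assert(modulus > 0);
    unsigned hits = 0;
    uint32_t rev = first_rev;
    for (unsigned i = 0; i < writes; i++) {
        if (rev % modulus == 0) {
            hits += 1;
        }
        rev += 2;
    }
    return hits;
}
```

With block_cycles = 5 the modulus is 6, so `count_relocations(1, 6, 30)` stays at zero because a block that only sees odd revisions never lands on a multiple of 6, while the block seeing even revisions absorbs every relocation. With an odd modulus such as `6 | 1`, both blocks hit the condition.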
2020-02-09 14:52:20 +00:00
|
|
|
*dir = olddir;
|
2018-10-21 16:25:48 +00:00
|
|
|
return err;
|
2018-07-17 23:31:30 +00:00
|
|
|
}
|
2018-09-09 14:01:06 +00:00
|
|
|
|
2018-10-21 16:25:48 +00:00
|
|
|
if (err != LFS_ERR_NOENT && pdir.split) {
|
2020-02-09 16:02:41 +00:00
|
|
|
err = lfs_dir_drop(lfs, &pdir, dir);
|
|
|
|
if (err) {
|
|
|
|
*dir = olddir;
|
|
|
|
return err;
|
|
|
|
}
|
2018-10-21 16:25:48 +00:00
|
|
|
}
|
2018-07-17 23:31:30 +00:00
|
|
|
}
|
|
|
|
|
2019-04-09 23:56:53 +00:00
|
|
|
if (dir->erased || dir->count >= 0xff) {
|
2018-07-17 23:31:30 +00:00
|
|
|
// try to commit
|
2018-05-29 05:50:47 +00:00
|
|
|
struct lfs_commit commit = {
|
|
|
|
.block = dir->pair[0],
|
|
|
|
.off = dir->off,
|
|
|
|
.ptag = dir->etag,
|
2020-02-10 04:43:20 +00:00
|
|
|
.crc = 0xffffffff,
|
2018-07-13 20:04:31 +00:00
|
|
|
|
|
|
|
.begin = dir->off,
|
|
|
|
.end = lfs->cfg->block_size - 8,
|
2018-05-29 05:50:47 +00:00
|
|
|
};
|
|
|
|
|
2018-12-27 02:27:34 +00:00
|
|
|
// traverse attrs that need to be written out
|
|
|
|
lfs_pair_tole32(dir->tail);
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
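The field widths in the tag layout above (1 valid bit, 11 type bits, 10 id bits, 10 length bits, with the type splitting into a 3-bit type1 and an 8-bit chunk) translate into shifts and masks along these lines. The helper names are hypothetical and return plain field values; the real helpers in lfs.c may differ, for example by masking without shifting.

```c
#include <assert.h>
#include <stdint.h>

// Packs the three fields below the valid bit into a 32-bit tag.
// Hypothetical helper; field widths are taken from the layout above.
static uint32_t tag_make(uint16_t type3, uint16_t id, uint16_t length) {
    assert(type3 <= 0x7ff && id <= 0x3ff && length <= 0x3ff);
    return ((uint32_t)type3 << 20) | ((uint32_t)id << 10) | (uint32_t)length;
}

// Top bit of the tag.
static unsigned tag_valid_bit(uint32_t tag) { return (tag >> 31) & 0x1; }

// Full 11-bit type field.
static unsigned tag_type3(uint32_t tag) { return (tag >> 20) & 0x7ff; }

// Upper 3 bits of the type field.
static unsigned tag_type1(uint32_t tag) { return (tag >> 28) & 0x7; }

// Lower 8 bits of the type field.
static unsigned tag_chunk(uint32_t tag) { return (tag >> 20) & 0xff; }

// 10-bit file id.
static unsigned tag_id(uint32_t tag) { return (tag >> 10) & 0x3ff; }

// 10-bit entry length.
static unsigned tag_length(uint32_t tag) { return tag & 0x3ff; }
```

For example, a tag built with `tag_make(0x401, 5, 12)` decodes to type1 `0x4`, chunk `0x01`, id 5, and length 12.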
2018-12-29 13:53:12 +00:00
|
|
|
int err = lfs_dir_traverse(lfs,
|
2020-01-20 23:35:45 +00:00
|
|
|
dir, dir->off, dir->etag, attrs, attrcount,
|
2018-12-29 13:53:12 +00:00
|
|
|
0, 0, 0, 0, 0,
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact, giving us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase, which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attrs in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
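The two steps above can be sketched on a toy append-only log. This is a simplified model, not `lfs_dir_traverse` itself: the `struct attr` layout, the `ATTR_SET`/`ATTR_DELETE` kinds, and the `tally_cb` callback are all assumptions made for illustration, and the sketch ignores the id renumbering and move recursion that the real function has to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Toy entry kinds and layout (assumptions, not littlefs structures). */
enum { ATTR_SET, ATTR_DELETE };
struct attr { int kind; int id; int type; int value; };

typedef int (*attr_cb)(void *ctx, const struct attr *a);

/* Visit the surviving state of every attr in an append-only log.
 * For each entry, scan the rest of the log for its history: a later
 * entry with the same id and type supersedes it, and a later delete
 * of its id drops it. This is the O(n^2) shape described above. */
static int log_traverse(const struct attr *entries, size_t n,
        attr_cb cb, void *ctx) {
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (entries[i].kind != ATTR_SET) {
            continue;
        }
        int live = 1;
        for (size_t j = i + 1; j < n && live; j++) {
            if (entries[j].id != entries[i].id) {
                continue;
            }
            if (entries[j].kind == ATTR_DELETE
                    || entries[j].type == entries[i].type) {
                live = 0; /* exit early, this entry is stale */
            }
        }
        if (live) {
            int err = cb(ctx, &entries[i]);
            if (err) {
                return err;
            }
        }
    }
    return 0;
}

/* A dry-run style callback: it only accumulates, it writes nothing. */
struct tally { size_t count; int sum; };
static int tally_cb(void *ctx, const struct attr *a) {
    struct tally *t = ctx;
    t->count += 1;
    t->sum += a->value;
    return 0;
}
```

Because the traversal only reports results through a callback, the same loop can drive either a real write or a size estimate, which is what makes the dry run possible.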
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order-sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
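The binary search for the split point might look roughly like the following. This is a sketch under stated assumptions: `dry_run_size` is a hypothetical stand-in for a dry-run traversal (here it just sums per-id sizes), and `find_split` and `limit` are illustrative names, not littlefs functions or config fields.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a dry-run traversal: estimated compacted size of the
 * ids in [begin, end). The real estimate would come from traversing
 * the directory; summing an array is an assumption for illustration. */
static size_t dry_run_size(const size_t *id_sizes, size_t begin, size_t end) {
    size_t total = 0;
    for (size_t i = begin; i < end; i++) {
        total += id_sizes[i];
    }
    return total;
}

/* Largest end such that ids [begin, end) fit within limit bytes.
 * Each probe costs one dry run, so the search adds a log n factor. */
static size_t find_split(const size_t *id_sizes, size_t begin, size_t count,
        size_t limit) {
    assert(begin <= count);
    size_t lo = begin; /* invariant: [begin, lo) fits */
    size_t hi = count; /* invariant: nothing past hi can be the answer */
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;
        if (dry_run_size(id_sizes, begin, mid) <= limit) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

The search relies on the estimate being monotonic in the number of ids, which holds here because sizes are never negative.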
2018-12-27 02:27:34 +00:00
|
|
|
lfs_dir_commit_commit, &(struct lfs_dir_commit_commit){
|
|
|
|
lfs, &commit});
|
|
|
|
lfs_pair_fromle32(dir->tail);
|
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_NOSPC || err == LFS_ERR_CORRUPT) {
|
|
|
|
goto compact;
|
2018-05-29 05:50:47 +00:00
|
|
|
}
|
Fixed broken wear-leveling when block_cycles = 2n-1
This was an interesting issue found during a GitHub discussion with
rmollway and thrasher8390.
Blocks in the metadata-pair are relocated every "block_cycles", or, more
mathy, when rev % block_cycles == 0 as long as rev += 1 every block write.
But there's a problem, rev isn't += 1 every block write. There are two
blocks in a metadata-pair, so looking at it from each block's
perspective, rev += 2 every block write.
This leads to a sort of aliasing issue, where, if block_cycles is
divisible by 2, one block in the metadata-pair is always relocated, and
the other block is _never_ relocated, causing a complete failure of
block-level wear-leveling.
Fortunately, because of a previous workaround to avoid block_cycles = 1
(since this will cause the relocation algorithm to never terminate), the
actual math is rev % (block_cycles+1) == 0. This means the bug only
shows its head in the much less likely case where block_cycles is a
multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for
some n.
To work around this we can bitwise or our block_cycles with 1 to force it
to never be a multiple of 2.
(Maybe we should do this during initialization? But then block_cycles
would need to be mutable.)
---
There are a few unrelated changes mixed into this commit that shouldn't be
there since I added this as part of a branch of bug fixes I'm putting
together rather hastily, so unfortunately this is not easily cherry-pickable.
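The aliasing described above can be reproduced with a small counting model. This is a toy, not the littlefs relocation code: it assumes rev increases by one per write to the pair and that the block being written alternates with the parity of rev. It also assumes the bitwise or is applied to the modulus `block_cycles+1`, which is one reading of the fix. `relocation_counts` and `odd_modulus` are hypothetical names.

```c
#include <assert.h>

/* Count how often each block of a two-block pair is chosen for
 * relocation over a number of writes, under the toy model that the
 * block written at revision rev is rev % 2 and that relocation
 * triggers whenever rev % modulus == 0. */
static void relocation_counts(unsigned modulus, unsigned writes,
        unsigned counts[2]) {
    assert(modulus > 0);
    counts[0] = 0;
    counts[1] = 0;
    for (unsigned rev = 1; rev <= writes; rev++) {
        if (rev % modulus == 0) {
            counts[rev % 2] += 1;
        }
    }
}

/* Assumed form of the workaround: force the modulus to be odd. */
static unsigned odd_modulus(unsigned block_cycles) {
    return (block_cycles + 1) | 1;
}
```

With block_cycles = 99 the unfixed modulus is 100, so every relocation lands on an even rev and only one block ever moves. With the odd modulus 101, consecutive relocations alternate between the two blocks.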
2020-02-09 14:52:20 +00:00
|
|
|
*dir = olddir;
|
2018-12-27 02:27:34 +00:00
|
|
|
return err;
|
2018-05-29 05:50:47 +00:00
|
|
|
}
|
2018-05-28 07:08:16 +00:00
|
|
|
|
2018-09-12 06:34:03 +00:00
|
|
|
// commit any global diffs if we have any
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
       .--------.
     ->| parent |-.
       | gstate | |
     .-| a      |-'
     | '--------'
     |     X <- child is orphaned
     | .--------.
     '>| child  |->
       | gstate |
       | a      |
       '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
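The duplicated-delta problem in item 4 follows directly from gstate being an xor of deltas: a delta seen twice cancels itself. The sketch below is a toy model with an assumed layout (`struct gstate` with a tag and a block pair) and hypothetical helper names; it is not the littlefs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy gstate: the field layout is an assumption for illustration. */
struct gstate { uint32_t tag; uint32_t pair[2]; };

/* a ^= b, field by field. */
static void gstate_xor(struct gstate *a, const struct gstate *b) {
    a->tag ^= b->tag;
    a->pair[0] ^= b->pair[0];
    a->pair[1] ^= b->pair[1];
}

static bool gstate_iszero(const struct gstate *a) {
    return a->tag == 0 && a->pair[0] == 0 && a->pair[1] == 0;
}

/* Global state as the xor of every delta found while walking the
 * threaded list, modelled here as a flat array of deltas. */
static struct gstate gstate_collect(const struct gstate *deltas, size_t n) {
    assert(deltas != NULL || n == 0);
    struct gstate total = {0, {0, 0}};
    for (size_t i = 0; i < n; i++) {
        gstate_xor(&total, &deltas[i]);
    }
    return total;
}
```

If the parent has already absorbed the child's delta while the orphaned child is still reachable in the list, the walk xors the child's delta in twice and it drops out of the total, which is the corruption the commit message describes.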
2020-01-22 04:18:19 +00:00
|
|
|
lfs_gstate_t delta = {0};
|
|
|
|
lfs_gstate_xor(&delta, &lfs->gstate);
|
|
|
|
lfs_gstate_xor(&delta, &lfs->gdisk);
|
|
|
|
lfs_gstate_xor(&delta, &lfs->gdelta);
|
|
|
|
delta.tag &= ~LFS_MKTAG(0, 0, 0x3ff);
|
|
|
|
if (!lfs_gstate_iszero(&delta)) {
|
|
|
|
err = lfs_dir_getgstate(lfs, dir, &delta);
|
2018-09-15 03:02:39 +00:00
|
|
|
if (err) {
|
2020-02-09 14:52:20 +00:00
|
|
|
*dir = olddir;
|
2018-09-15 03:02:39 +00:00
|
|
|
return err;
|
2018-09-12 06:34:03 +00:00
|
|
|
}
|
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs_gstate_tole32(&delta);
|
2019-01-04 23:23:36 +00:00
|
|
|
err = lfs_dir_commitattr(lfs, &commit,
|
|
|
|
LFS_MKTAG(LFS_TYPE_MOVESTATE, 0x3ff,
|
2020-01-22 04:18:19 +00:00
|
|
|
sizeof(delta)), &delta);
|
2018-09-12 06:34:03 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_NOSPC || err == LFS_ERR_CORRUPT) {
|
|
|
|
goto compact;
|
|
|
|
}
|
2020-02-09 14:52:20 +00:00
|
|
|
*dir = olddir;
|
2018-09-12 06:34:03 +00:00
|
|
|
return err;
|
2018-07-02 03:29:42 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-09-12 06:34:03 +00:00
|
|
|
// finalize commit with the crc
|
2018-12-27 02:27:34 +00:00
|
|
|
err = lfs_dir_commitcrc(lfs, &commit);
|
2018-05-29 05:50:47 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_NOSPC || err == LFS_ERR_CORRUPT) {
|
|
|
|
goto compact;
|
|
|
|
}
|
2020-02-09 14:52:20 +00:00
|
|
|
*dir = olddir;
|
2018-05-29 05:50:47 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-05-28 07:08:16 +00:00
|
|
|
|
2018-07-13 20:04:31 +00:00
|
|
|
// successful commit, update dir
|
2019-07-24 19:24:29 +00:00
|
|
|
LFS_ASSERT(commit.off % lfs->cfg->prog_size == 0);
|
2018-05-29 05:50:47 +00:00
|
|
|
dir->off = commit.off;
|
|
|
|
dir->etag = commit.ptag;
|
2020-01-20 23:35:45 +00:00
|
|
|
// and update gstate
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
     .--------.
   ->| parent |-.
     | gstate | |
   .-|    a   |-'
   | '--------'
   |     X <- child is orphaned
   | .--------.
   '>| child  |->
     | gstate |
     |    a   |
     '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
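The duplication problem in item 4 can be seen in a toy model. This is a minimal sketch, assuming each metadata pair carries a single 32-bit delta and that the global state is the xor of every delta reachable along the threaded linked-list; `struct ex_pair` and `ex_collect_gstate` are hypothetical names for illustration, not littlefs APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a metadata pair on the threaded
// linked-list; `delta` models that pair's stored gstate delta.
struct ex_pair {
    uint32_t delta;
    const struct ex_pair *next;
};

// Collect the global state by xoring every delta on the threaded list.
static uint32_t ex_collect_gstate(const struct ex_pair *head) {
    uint32_t gstate = 0;
    for (const struct ex_pair *p = head; p; p = p->next) {
        gstate ^= p->delta;
    }
    return gstate;
}
```

If the child's delta is xored into the parent while the child is still linked, the walk sees that delta twice and it cancels out. In this model the collected value is only correct again once the child is unlinked, which is why the delta is handed over in the step that finalizes the removal.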
2020-01-22 04:18:19 +00:00
|
|
|
lfs->gdisk = lfs->gstate;
|
|
|
|
lfs->gdelta = (lfs_gstate_t){0};
|
2018-10-21 02:02:25 +00:00
|
|
|
} else {
|
2018-08-05 01:33:09 +00:00
|
|
|
compact:
|
|
|
|
// fall back to compaction
|
|
|
|
lfs_cache_drop(lfs, &lfs->pcache);
|
2018-09-12 06:34:03 +00:00
|
|
|
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
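The field layout drawn above can be expressed as shifts and masks. This is a minimal sketch, assuming the fields are packed from the most significant bit down as valid (1), type3 (11), id (10), length (10), with type3 further split into type1 (3) and chunk (8). The `ex_` helpers are hypothetical names for illustration and return each field as a small right-aligned integer, which may differ from how the real helpers in lfs.c present them.

```c
#include <assert.h>
#include <stdint.h>

// Build a tag from its three main fields; the valid bit is left clear.
static uint32_t ex_mktag(uint32_t type3, uint32_t id, uint32_t length) {
    assert(type3 <= 0x7ff && id <= 0x3ff && length <= 0x3ff);
    return (type3 << 20) | (id << 10) | length;
}

// Field extractors, one per box in the diagram.
static uint32_t ex_tag_valid_bit(uint32_t tag) { return tag >> 31; }
static uint32_t ex_tag_type3(uint32_t tag)     { return (tag >> 20) & 0x7ff; }
static uint32_t ex_tag_type1(uint32_t tag)     { return (tag >> 28) & 0x7; }
static uint32_t ex_tag_chunk(uint32_t tag)     { return (tag >> 20) & 0xff; }
static uint32_t ex_tag_id(uint32_t tag)        { return (tag >> 10) & 0x3ff; }
static uint32_t ex_tag_length(uint32_t tag)    { return tag & 0x3ff; }
```

For example, `ex_mktag(0x201, 5, 8)` splits back into type1 `0x2`, chunk `0x01`, id `5`, and length `8`. The 256KiB figure follows from the same widths: 2^8 chunks of roughly 2^10 bytes each gives about 2^18 bytes.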
2018-12-29 13:53:12 +00:00
|
|
|
int err = lfs_dir_compact(lfs, dir, attrs, attrcount,
|
|
|
|
dir, 0, dir->count);
|
2018-08-05 01:33:09 +00:00
|
|
|
if (err) {
|
Fixed broken wear-leveling when block_cycles = 2n-1
This was an interesting issue found during a GitHub discussion with
rmollway and thrasher8390.
Blocks in the metadata-pair are relocated every "block_cycles", or, more
mathy, when rev % block_cycles == 0 as long as rev += 1 every block write.
But there's a problem: rev isn't += 1 every block write. There are two
blocks in a metadata-pair, so looking at it from each block's
perspective, rev += 2 every block write.
This leads to a sort of aliasing issue, where, if block_cycles is
divisible by 2, one block in the metadata-pair is always relocated, and
the other block is _never_ relocated, causing a complete failure of
block-level wear-leveling.
Fortunately, because of a previous workaround to avoid block_cycles = 1
(since this will cause the relocation algorithm to never terminate), the
actual math is rev % (block_cycles+1) == 0. This means the bug only
shows its head in the much less likely case where block_cycles is a
multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for
some n.
To work around this we can bitwise or our block_cycles with 1 to force it
to never be a multiple of 2.
(Maybe we should do this during initialization? But then block_cycles
would need to be mutable.)
---
There are a few unrelated changes mixed into this commit that shouldn't be
there since I added this as part of a branch of bug fixes I'm putting
together rather hastily, so unfortunately this is not easily cherry-pickable.
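The aliasing described in this message can be reproduced with a small simulation. This is a minimal sketch, assuming the two blocks of a pair are written alternately while a shared revision count goes up by one per write, and assuming the bitwise or is applied to the divisor `block_cycles+1`. `count_relocations` and `odd_modulus` are hypothetical names for illustration, not littlefs functions.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical model of a metadata pair: blocks 0 and 1 are written
// alternately, so each block only ever sees every second revision.
//
// Counts how often each block would be relocated over `writes` writes
// when relocation triggers on rev % modulus == 0.
static void count_relocations(uint32_t modulus, uint32_t writes,
        uint32_t relocations[2]) {
    assert(modulus > 0);
    relocations[0] = 0;
    relocations[1] = 0;
    for (uint32_t rev = 1; rev <= writes; rev++) {
        uint32_t block = rev % 2; // which half of the pair is written
        if (rev % modulus == 0) {
            relocations[block] += 1;
        }
    }
}

// The workaround sketched here: force the divisor to be odd.
static uint32_t odd_modulus(uint32_t block_cycles) {
    return (block_cycles + 1) | 1;
}
```

With `block_cycles = 3` the plain divisor is 4, and over 100 writes every relocation lands on the same block. With `odd_modulus(3)`, which is 5, the relocations alternate between the two blocks.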
2020-02-09 14:52:20 +00:00
|
|
|
*dir = olddir;
|
2018-08-05 01:33:09 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-07-17 23:31:30 +00:00
|
|
|
}
|
2018-05-29 05:50:47 +00:00
|
|
|
|
2020-01-20 23:35:45 +00:00
|
|
|
// this complicated bit of logic is for fixing up any active
|
|
|
|
// metadata-pairs that we may have affected
|
|
|
|
//
|
|
|
|
// note we have to make two passes since the mdir passed to
|
|
|
|
// lfs_dir_commit could also be in this list, and even then
|
|
|
|
// we need to copy the pair so they don't get clobbered if we refetch
|
|
|
|
// our mdir.
|
2018-09-11 03:07:59 +00:00
|
|
|
for (struct lfs_mlist *d = lfs->mlist; d; d = d->next) {
|
2020-02-09 16:02:41 +00:00
|
|
|
if (&d->m != dir && lfs_pair_cmp(d->m.pair, olddir.pair) == 0) {
|
2020-01-20 23:35:45 +00:00
|
|
|
d->m = *dir;
|
|
|
|
for (int i = 0; i < attrcount; i++) {
|
|
|
|
if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_DELETE &&
|
|
|
|
d->id == lfs_tag_id(attrs[i].tag)) {
|
|
|
|
d->m.pair[0] = LFS_BLOCK_NULL;
|
|
|
|
d->m.pair[1] = LFS_BLOCK_NULL;
|
|
|
|
} else if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_DELETE &&
|
|
|
|
d->id > lfs_tag_id(attrs[i].tag)) {
|
|
|
|
d->id -= 1;
|
|
|
|
if (d->type == LFS_TYPE_DIR) {
|
|
|
|
((lfs_dir_t*)d)->pos -= 1;
|
|
|
|
}
|
|
|
|
} else if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_CREATE &&
|
|
|
|
d->id >= lfs_tag_id(attrs[i].tag)) {
|
|
|
|
d->id += 1;
|
|
|
|
if (d->type == LFS_TYPE_DIR) {
|
|
|
|
((lfs_dir_t*)d)->pos += 1;
|
|
|
|
}
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow, is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us a O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
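Keeping a directory ordered comes down to finding the insertion position while scanning the existing names. This is a minimal sketch, assuming names are compared bytewise with `strcmp` and held in a plain in-memory array; `ex_find_insert_id` is a hypothetical helper for illustration, not how lfs.c stores or walks entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Given names already in ascending order, scan linearly for the id at
// which `name` should be inserted. Sets *exists to 1 when an entry with
// the same name is already present, in which case its id is returned.
static size_t ex_find_insert_id(const char *const *names, size_t count,
        const char *name, int *exists) {
    assert(names != NULL || count == 0);
    *exists = 0;
    for (size_t id = 0; id < count; id++) {
        int cmp = strcmp(name, names[id]);
        if (cmp == 0) {
            *exists = 1;
            return id;
        } else if (cmp < 0) {
            // first entry that sorts after `name`: insert before it
            return id;
        }
    }
    // `name` sorts after every existing entry
    return count;
}
```

Because the scan already visits every entry to check for an existing name, tracking the insertion point adds only one comparison outcome per entry, which matches the message's point about low overhead.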
2018-10-04 19:49:34 +00:00
|
|
|
}
|
2018-07-13 20:04:31 +00:00
|
|
|
}
|
2020-01-20 23:35:45 +00:00
|
|
|
}
|
|
|
|
}
|
2018-07-13 20:04:31 +00:00
|
|
|
|
2020-01-20 23:35:45 +00:00
|
|
|
for (struct lfs_mlist *d = lfs->mlist; d; d = d->next) {
|
2020-02-09 16:02:41 +00:00
|
|
|
if (lfs_pair_cmp(d->m.pair, olddir.pair) == 0) {
|
2018-08-01 15:24:59 +00:00
|
|
|
while (d->id >= d->m.count && d->m.split) {
|
|
|
|
// we split and id is on tail now
|
|
|
|
d->id -= d->m.count;
|
|
|
|
int err = lfs_dir_fetch(lfs, &d->m, d->m.tail);
|
2018-07-13 20:04:31 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2017-04-18 03:27:06 +00:00
|
|
|
/// Top level directory operations ///
|
2018-05-21 05:56:20 +00:00
|
|
|
int lfs_mkdir(lfs_t *lfs, const char *path) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir(%p, \"%s\")", (void*)lfs, path);
|
2018-05-21 05:56:20 +00:00
|
|
|
// deorphan if we haven't yet, needed at most once after poweron
|
2018-08-01 10:52:48 +00:00
|
|
|
int err = lfs_fs_forceconsistency(lfs);
|
2018-07-31 13:07:36 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", err);
|
2018-07-31 13:07:36 +00:00
|
|
|
return err;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
struct lfs_mlist cwd;
|
|
|
|
cwd.next = lfs->mlist;
|
2018-10-04 19:49:34 +00:00
|
|
|
uint16_t id;
|
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_dir_find(lfs, &cwd.m, &path, &id);
|
2018-12-29 13:53:12 +00:00
|
|
|
if (!(err == LFS_ERR_NOENT && id != 0x3ff)) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", (err < 0) ? err : LFS_ERR_EXIST);
|
2018-10-04 19:49:34 +00:00
|
|
|
return (err < 0) ? err : LFS_ERR_EXIST;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// check that name fits
|
|
|
|
lfs_size_t nlen = strlen(path);
|
2018-08-05 01:10:08 +00:00
|
|
|
if (nlen > lfs->name_max) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", LFS_ERR_NAMETOOLONG);
|
2018-05-21 05:56:20 +00:00
|
|
|
return LFS_ERR_NAMETOOLONG;
|
|
|
|
}
|
|
|
|
|
|
|
|
// build up new directory
|
|
|
|
lfs_alloc_ack(lfs);
|
2018-05-29 06:11:26 +00:00
|
|
|
lfs_mdir_t dir;
|
2018-08-01 15:24:59 +00:00
|
|
|
err = lfs_dir_alloc(lfs, &dir);
|
2018-05-21 05:56:20 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", err);
|
2018-05-21 05:56:20 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-10-05 23:22:33 +00:00
|
|
|
// find end of list
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs_mdir_t pred = cwd.m;
|
2018-10-05 23:22:33 +00:00
|
|
|
while (pred.split) {
|
|
|
|
err = lfs_dir_fetch(lfs, &pred, pred.tail);
|
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", err);
|
2018-10-05 23:22:33 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// setup dir
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_tole32(pred.tail);
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, &dir, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), pred.tail}));
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_fromle32(pred.tail);
|
2018-05-21 05:56:20 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", err);
|
2018-05-21 05:56:20 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-10-05 23:22:33 +00:00
|
|
|
// current block end of list?
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
    .--------.
  ->| parent |-.
    | gstate | |
  .-|   a    |-'
  | '--------'
  |     X <- child is orphaned
  | .--------.
  '>| child  |->
    | gstate |
    |   a    |
    '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left over from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
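
The gstate bookkeeping described in item 4 can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `sketch_mdir`, `sketch_gstate`, and `sketch_drop_next` are hypothetical names, the delta is reduced to a single 32-bit word, and nothing here is the littlefs on-disk format. It shows why the child's delta has to go to its predecessor on the threaded list: the xor over the list stays the same only if the delta is handed over in the same step that unlinks the child.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical model: each metadata pair on the threaded list carries one
// 32-bit gstate delta (the real gstate is larger than a single word).
struct sketch_mdir {
    uint32_t gdelta;
    struct sketch_mdir *tail;   // next pair on the threaded linked-list
};

// The global state is the xor of every delta reachable along the list.
static uint32_t sketch_gstate(const struct sketch_mdir *head) {
    uint32_t g = 0;
    for (const struct sketch_mdir *d = head; d != NULL; d = d->tail) {
        g ^= d->gdelta;
    }
    return g;
}

// Unlink pred->tail and hand its delta to pred, so the xor over the
// remaining list is unchanged.
static void sketch_drop_next(struct sketch_mdir *pred) {
    struct sketch_mdir *child = pred->tail;
    assert(child != NULL);
    pred->gdelta ^= child->gdelta;
    pred->tail = child->tail;
}
```

Xoring the child's delta into any directory that is not the predecessor would leave a window in which the child is still reachable on the list and its delta is counted twice, which is the duplication the message describes.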
2020-01-22 04:18:19 +00:00
|
|
|
if (cwd.m.split) {
|
2018-10-05 23:22:33 +00:00
|
|
|
// update tails, this creates a desync
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, +1);
|
2020-01-22 04:18:19 +00:00
|
|
|
|
|
|
|
// it's possible our predecessor has to be relocated, and if
|
|
|
|
// our parent is our predecessor's predecessor, this could have
|
|
|
|
// caused our parent to go out of date, fortunately we can hook
|
|
|
|
// ourselves into littlefs to catch this
|
|
|
|
cwd.type = 0;
|
|
|
|
cwd.id = 0;
|
|
|
|
lfs->mlist = &cwd;
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_tole32(dir.pair);
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, &pred, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), dir.pair}));
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_fromle32(dir.pair);
|
2018-10-05 23:22:33 +00:00
|
|
|
if (err) {
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->mlist = cwd.next;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", err);
|
2018-10-05 23:22:33 +00:00
|
|
|
return err;
|
|
|
|
}
|
2020-01-22 04:18:19 +00:00
|
|
|
|
|
|
|
lfs->mlist = cwd.next;
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, -1);
|
2018-10-05 23:22:33 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// now insert into our parent block
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_pair_tole32(dir.pair);
|
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_dir_commit(lfs, &cwd.m, LFS_MKATTRS(
|
2019-02-12 06:11:01 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_CREATE, id, 0), NULL},
|
2019-01-08 14:52:03 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_DIR, id, nlen), path},
|
|
|
|
{LFS_MKTAG(LFS_TYPE_DIRSTRUCT, id, 8), dir.pair},
|
2020-01-22 04:18:19 +00:00
|
|
|
{LFS_MKTAG_IF(!cwd.m.split,
|
2020-01-20 23:35:45 +00:00
|
|
|
LFS_TYPE_SOFTTAIL, 0x3ff, 8), dir.pair}));
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_pair_fromle32(dir.pair);
|
2018-07-13 00:07:56 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", err);
|
2018-07-13 00:07:56 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mkdir -> %d", 0);
|
2018-05-21 05:56:20 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
int lfs_dir_open(lfs_t *lfs, lfs_dir_t *dir, const char *path) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_open(%p, %p, \"%s\")", (void*)lfs, (void*)dir, path);
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us a O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
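
The "compare file names to find our position" step above can be sketched as a single linear pass. This is an illustration under assumptions, not the littlefs implementation: `sketch_find_slot` is a hypothetical helper, names are modeled as an in-memory array of C strings, and ordering is plain `strcmp` byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Scan a sorted list of names once. Returns the index where `name` already
// lives (*found = 1) or where it would have to be inserted to keep the list
// in ascending order (*found = 0).
static size_t sketch_find_slot(const char *const *names, size_t count,
        const char *name, int *found) {
    assert(found != NULL);
    *found = 0;
    for (size_t i = 0; i < count; i++) {
        int cmp = strcmp(names[i], name);
        if (cmp == 0) {
            *found = 1;
            return i;
        }
        if (cmp > 0) {
            // first entry that sorts after `name`: insert in front of it
            return i;
        }
    }
    return count;   // `name` sorts after every existing entry
}
```

Because a lookup already has to walk the entries, finding the insertion point rides along with the same scan, which is why the message expects little extra overhead.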
2018-10-04 19:49:34 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_find(lfs, &dir->m, &path, NULL);
|
2018-07-13 01:22:06 +00:00
|
|
|
if (tag < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_dir_open -> %"PRId32, tag);
|
2018-07-13 01:22:06 +00:00
|
|
|
return tag;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
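
The 1|11|10|10 layout above can be sketched with a few shifts and masks. The helper names below (`sketch_mktag` and friends) are hypothetical and exist only for this illustration; they follow the field widths in the diagram and leave the valid bit clear. Returning type1 in place (as the top 3 bits of the 11-bit type) rather than shifted down is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

// Pack an 11-bit type, 10-bit id and 10-bit length into one 32-bit tag.
// Bit 31 (the valid bit) is left as zero here.
static uint32_t sketch_mktag(uint32_t type, uint32_t id, uint32_t len) {
    assert(type <= 0x7ff && id <= 0x3ff && len <= 0x3ff);
    return (type << 20) | (id << 10) | len;
}

// Full 11-bit type (type3).
static uint32_t sketch_tag_type3(uint32_t tag) { return (tag >> 20) & 0x7ff; }

// Top 3 bits of the type (type1), kept in place within the 11-bit field.
static uint32_t sketch_tag_type1(uint32_t tag) { return (tag >> 20) & 0x700; }

// Low 8 bits of the type: the chunk info.
static uint32_t sketch_tag_chunk(uint32_t tag) { return (tag >> 20) & 0x0ff; }

// 10-bit file id.
static uint32_t sketch_tag_id(uint32_t tag) { return (tag >> 10) & 0x3ff; }

// 10-bit entry length.
static uint32_t sketch_tag_len(uint32_t tag) { return tag & 0x3ff; }
```

With this split, a mask such as type `0x700` with id `0x3ff` selects "any tag of this type1 for this file id", ignoring the chunk bits and the length.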
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type3(tag) != LFS_TYPE_DIR) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_open -> %d", LFS_ERR_NOTDIR);
|
2018-07-09 17:51:31 +00:00
|
|
|
return LFS_ERR_NOTDIR;
|
|
|
|
}
|
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
lfs_block_t pair[2];
|
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_id(tag) == 0x3ff) {
|
2018-05-26 18:50:06 +00:00
|
|
|
// handle root dir separately
|
2018-07-13 01:43:55 +00:00
|
|
|
pair[0] = lfs->root[0];
|
|
|
|
pair[1] = lfs->root[1];
|
2018-05-26 18:50:06 +00:00
|
|
|
} else {
|
|
|
|
// get dir pair from parent
|
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, &dir->m, LFS_MKTAG(0x700, 0x3ff, 0),
|
2018-08-05 04:57:43 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, lfs_tag_id(tag), 8), pair);
|
2018-07-13 01:43:55 +00:00
|
|
|
if (res < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_dir_open -> %"PRId32, res);
|
2018-07-13 01:43:55 +00:00
|
|
|
return res;
|
2018-05-26 18:50:06 +00:00
|
|
|
}
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_pair_fromle32(pair);
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
// fetch first pair
|
2018-07-13 01:43:55 +00:00
|
|
|
int err = lfs_dir_fetch(lfs, &dir->m, pair);
|
2018-05-21 05:56:20 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_open -> %d", err);
|
2018-05-21 05:56:20 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-05-29 05:50:47 +00:00
|
|
|
// setup entry
|
2018-05-29 06:11:26 +00:00
|
|
|
dir->head[0] = dir->m.pair[0];
|
|
|
|
dir->head[1] = dir->m.pair[1];
|
2018-05-21 05:56:20 +00:00
|
|
|
dir->id = 0;
|
2018-05-29 05:50:47 +00:00
|
|
|
dir->pos = 0;
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2018-08-01 15:24:59 +00:00
|
|
|
// add to list of mdirs
|
|
|
|
dir->type = LFS_TYPE_DIR;
|
|
|
|
dir->next = (lfs_dir_t*)lfs->mlist;
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs->mlist = (struct lfs_mlist*)dir;
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_open -> %d", 0);
|
2018-05-21 05:56:20 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
int lfs_dir_close(lfs_t *lfs, lfs_dir_t *dir) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_close(%p, %p)", (void*)lfs, (void*)dir);
|
2018-08-01 15:24:59 +00:00
|
|
|
// remove from list of mdirs
|
2018-09-11 03:07:59 +00:00
|
|
|
for (struct lfs_mlist **p = &lfs->mlist; *p; p = &(*p)->next) {
|
|
|
|
if (*p == (struct lfs_mlist*)dir) {
|
2018-08-01 15:24:59 +00:00
|
|
|
*p = (*p)->next;
|
2018-05-21 05:56:20 +00:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_close -> %d", 0);
|
2018-05-21 05:56:20 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
int lfs_dir_read(lfs_t *lfs, lfs_dir_t *dir, struct lfs_info *info) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_read(%p, %p, %p)",
|
|
|
|
(void*)lfs, (void*)dir, (void*)info);
|
2018-05-21 05:56:20 +00:00
|
|
|
memset(info, 0, sizeof(*info));
|
|
|
|
|
|
|
|
// special offset for '.' and '..'
|
|
|
|
if (dir->pos == 0) {
|
|
|
|
info->type = LFS_TYPE_DIR;
|
|
|
|
strcpy(info->name, ".");
|
|
|
|
dir->pos += 1;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_read -> %d", true);
|
|
|
|
return true;
|
2018-05-21 05:56:20 +00:00
|
|
|
} else if (dir->pos == 1) {
|
|
|
|
info->type = LFS_TYPE_DIR;
|
|
|
|
strcpy(info->name, "..");
|
|
|
|
dir->pos += 1;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_read -> %d", true);
|
|
|
|
return true;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
while (true) {
|
2018-05-29 06:11:26 +00:00
|
|
|
if (dir->id == dir->m.count) {
|
|
|
|
if (!dir->m.split) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_read -> %d", false);
|
2018-05-21 05:56:20 +00:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2018-05-29 06:11:26 +00:00
|
|
|
int err = lfs_dir_fetch(lfs, &dir->m, dir->m.tail);
|
2018-05-21 05:56:20 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_read -> %d", err);
|
2018-05-21 05:56:20 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
dir->id = 0;
|
|
|
|
}
|
|
|
|
|
2018-05-29 06:11:26 +00:00
|
|
|
int err = lfs_dir_getinfo(lfs, &dir->m, dir->id, info);
|
2018-05-26 18:50:06 +00:00
|
|
|
if (err && err != LFS_ERR_NOENT) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_read -> %d", err);
|
2018-05-21 05:56:20 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
dir->id += 1;
|
2018-05-26 18:50:06 +00:00
|
|
|
if (err != LFS_ERR_NOENT) {
|
|
|
|
break;
|
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
dir->pos += 1;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_read -> %d", true);
|
2018-05-21 05:56:20 +00:00
|
|
|
return true;
|
|
|
|
}
|
2017-03-13 00:41:08 +00:00
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
int lfs_dir_seek(lfs_t *lfs, lfs_dir_t *dir, lfs_off_t off) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_seek(%p, %p, %"PRIu32")",
|
|
|
|
(void*)lfs, (void*)dir, off);
|
2018-05-26 00:04:01 +00:00
|
|
|
// simply walk from head dir
|
2018-05-26 18:50:06 +00:00
|
|
|
int err = lfs_dir_rewind(lfs, dir);
|
2017-03-25 21:20:31 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_seek -> %d", err);
|
2017-03-25 21:20:31 +00:00
|
|
|
return err;
|
2017-04-23 04:11:13 +00:00
|
|
|
}
|
|
|
|
|
2018-05-26 00:04:01 +00:00
|
|
|
// first two for ./..
|
2018-05-28 14:17:44 +00:00
|
|
|
dir->pos = lfs_min(2, off);
|
|
|
|
off -= dir->pos;
|
2017-03-25 21:20:31 +00:00
|
|
|
|
2019-11-27 17:27:30 +00:00
|
|
|
// skip superblock entry
|
|
|
|
dir->id = (off > 0 && lfs_pair_cmp(dir->head, lfs->root) == 0);
|
|
|
|
|
|
|
|
while (off > 0) {
|
|
|
|
int diff = lfs_min(dir->m.count - dir->id, off);
|
|
|
|
dir->id += diff;
|
|
|
|
dir->pos += diff;
|
|
|
|
off -= diff;
|
2018-05-28 14:17:44 +00:00
|
|
|
|
2018-05-29 06:11:26 +00:00
|
|
|
if (dir->id == dir->m.count) {
|
|
|
|
if (!dir->m.split) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_seek -> %d", LFS_ERR_INVAL);
|
2018-05-26 00:04:01 +00:00
|
|
|
return LFS_ERR_INVAL;
|
|
|
|
}
|
2017-04-15 16:26:37 +00:00
|
|
|
|
2018-08-05 01:33:09 +00:00
|
|
|
err = lfs_dir_fetch(lfs, &dir->m, dir->m.tail);
|
2018-05-26 00:04:01 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_seek -> %d", err);
|
2018-05-26 00:04:01 +00:00
|
|
|
return err;
|
|
|
|
}
|
2019-11-27 17:27:30 +00:00
|
|
|
|
|
|
|
dir->id = 0;
|
2018-05-26 00:04:01 +00:00
|
|
|
}
|
2017-04-23 04:11:13 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_seek -> %d", 0);
|
2017-04-23 04:11:13 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
lfs_soff_t lfs_dir_tell(lfs_t *lfs, lfs_dir_t *dir) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_tell(%p, %p)", (void*)lfs, (void*)dir);
|
2018-02-04 19:10:07 +00:00
|
|
|
(void)lfs;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_tell -> %"PRId32, dir->pos);
|
2017-04-23 04:11:13 +00:00
|
|
|
return dir->pos;
|
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
int lfs_dir_rewind(lfs_t *lfs, lfs_dir_t *dir) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_rewind(%p, %p)", (void*)lfs, (void*)dir);
|
2017-04-23 04:11:13 +00:00
|
|
|
// reload the head dir
|
2018-05-29 06:11:26 +00:00
|
|
|
int err = lfs_dir_fetch(lfs, &dir->m, dir->head);
|
2017-04-23 04:11:13 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_rewind -> %d", err);
|
2017-04-23 04:11:13 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-05-26 00:04:01 +00:00
|
|
|
dir->id = 0;
|
2018-05-29 05:50:47 +00:00
|
|
|
dir->pos = 0;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_dir_rewind -> %d", 0);
|
2017-04-23 04:11:13 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-03-25 21:20:31 +00:00
|
|
|
|
2017-04-30 16:19:37 +00:00
|
|
|
/// File index list operations ///
|
2018-08-05 04:57:43 +00:00
|
|
|
static int lfs_ctz_index(lfs_t *lfs, lfs_off_t *off) {
|
2017-10-17 00:08:47 +00:00
|
|
|
lfs_off_t size = *off;
|
2017-10-18 05:33:59 +00:00
|
|
|
lfs_off_t b = lfs->cfg->block_size - 2*4;
|
|
|
|
lfs_off_t i = size / b;
|
2017-10-17 00:08:47 +00:00
|
|
|
if (i == 0) {
|
|
|
|
return 0;
|
2017-04-23 00:48:31 +00:00
|
|
|
}
|
|
|
|
|
2017-10-18 05:33:59 +00:00
|
|
|
i = (size - 4*(lfs_popc(i-1)+2)) / b;
|
|
|
|
*off = size - b*i - 4*lfs_popc(i);
|
|
|
|
return i;
|
2017-04-23 00:48:31 +00:00
|
|
|
}
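A standalone sketch of the index calculation in `lfs_ctz_index` above, useful for checking the arithmetic by hand. It assumes 4-byte block pointers and takes the block size as a plain parameter instead of reading `lfs->cfg`; `ctz_index` and `popcount32` are illustrative names, not part of littlefs.

```c
#include <stdint.h>

// Portable population count (number of set bits).
static uint32_t popcount32(uint32_t x) {
    uint32_t n = 0;
    while (x) {
        x &= x - 1; // clear the lowest set bit
        n++;
    }
    return n;
}

// Map a file offset to (block index, offset within that block).
// On return, *off holds the offset inside the block, including the
// space taken by that block's skip-list pointers.
static uint32_t ctz_index(uint32_t block_size, uint32_t *off) {
    uint32_t size = *off;
    uint32_t b = block_size - 2*4;
    uint32_t i = size / b;
    if (i == 0) {
        return 0;
    }

    i = (size - 4*(popcount32(i-1)+2)) / b;
    *off = size - b*i - 4*popcount32(i);
    return i;
}
```

With 512-byte blocks, file offset 512 lands in block 1 at offset 4, just past that block's single pointer, and file offset 1020 lands in block 2 at offset 8, past its two pointers.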
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static int lfs_ctz_find(lfs_t *lfs,
|
|
|
|
const lfs_cache_t *pcache, lfs_cache_t *rcache,
|
2017-04-30 16:19:37 +00:00
|
|
|
lfs_block_t head, lfs_size_t size,
|
2017-04-23 00:48:31 +00:00
|
|
|
lfs_size_t pos, lfs_block_t *block, lfs_off_t *off) {
|
|
|
|
if (size == 0) {
|
2019-08-03 14:17:47 +00:00
|
|
|
*block = LFS_BLOCK_NULL;
|
2017-04-23 00:48:31 +00:00
|
|
|
*off = 0;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_off_t current = lfs_ctz_index(lfs, &(lfs_off_t){size-1});
|
|
|
|
lfs_off_t target = lfs_ctz_index(lfs, &pos);
|
2017-04-23 00:48:31 +00:00
|
|
|
|
2017-04-23 02:42:22 +00:00
|
|
|
while (current > target) {
|
2017-04-23 00:48:31 +00:00
|
|
|
lfs_size_t skip = lfs_min(
|
|
|
|
lfs_npw2(current-target+1) - 1,
|
2017-10-10 23:48:24 +00:00
|
|
|
lfs_ctz(current));
|
2017-04-23 00:48:31 +00:00
|
|
|
|
Revisited caching rules to optimize bus transactions
The littlefs driver has always had this really weird quirk: larger cache
sizes can significantly harm performance. This has probably been one of
the most surprising pieces of configuring and optimizing littlefs.
The reason is that littlefs's caches are kinda dumb (this is somewhat
intentional, as dumb caches take up much less code space than smart
caches). When littlefs needs to read data, it will load the entire cache
line. This means that even when we only need a small 4 byte piece of
data, we may need to read a full 512 byte cache. And since
microcontrollers may be reading from storage over relatively slow bus
protocols, the time to send data over the bus may dominate other
operations.
Now that we have separate configuration options for "cache_size" and
"read_size", we can start making littlefs's caches a bit smarter. They
aren't going to be perfect, because code size is still a priority, but
there are some small improvements we can do:
1. Program caches write to prog_size aligned units, but eagerly cache as
much as possible. There's no downside to using the full cache in
program operations.
2. Add a hint parameter to cached reads. This internal API allows callers
to tell the cache how much data they expect to need. This avoids
excess bus traffic, and now we can even bypass the cache if the
caller provides enough of a buffer.
We can still fall back to reading full cache-lines in the cases where
we don't know how much data we need by providing the block size as
the hint. We do this for directory fetches and for file reads.
This has immediate improvements for both metadata-log traversal and CTZ
skip-list traversal, since these both only need to read 4-byte pointers
and can always bypass the cache, allowing reuse elsewhere.
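The effect of the hint described above can be sketched with a small model of how much data a cached read would pull over the bus. This is a simplified illustration, not the actual `lfs_bd_read` logic: the `geometry_t` struct, `cache_fill_size`, and `can_bypass_cache` are hypothetical names, and the exact conditions littlefs applies may differ.

```c
#include <stdint.h>
#include <stdbool.h>

// Hypothetical subset of the configuration relevant to reads.
typedef struct {
    uint32_t read_size;  // minimum read granularity
    uint32_t cache_size; // size of the read cache
    uint32_t block_size; // size of an erase block
} geometry_t;

static uint32_t min_u32(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

static uint32_t align_down(uint32_t a, uint32_t alignment) {
    return a - (a % alignment);
}

static uint32_t align_up(uint32_t a, uint32_t alignment) {
    return align_down(a + alignment - 1, alignment);
}

// Bytes loaded into the cache for a read at `off` with the given hint:
// cover [off, off+hint) rounded to read_size, clamped to the block and
// to the cache size.
static uint32_t cache_fill_size(const geometry_t *g,
        uint32_t off, uint32_t hint) {
    uint32_t start = align_down(off, g->read_size);
    uint32_t end = min_u32(align_up(off + hint, g->read_size),
            g->block_size);
    return min_u32(end - start, g->cache_size);
}

// A read may skip the cache when the caller's buffer already covers the
// hint and the request is aligned to, and at least, one read unit.
static bool can_bypass_cache(const geometry_t *g,
        uint32_t off, uint32_t size, uint32_t hint) {
    return size >= hint
        && off % g->read_size == 0
        && size >= g->read_size;
}
```

Under these assumptions, a 4-byte pointer read with a 4-byte hint on a device with `read_size = 16` and `cache_size = 512` loads 16 bytes, while passing the block size as the hint falls back to filling the full 512-byte cache.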
2018-08-20 19:47:52 +00:00
|
|
|
int err = lfs_bd_read(lfs,
|
|
|
|
pcache, rcache, sizeof(head),
|
|
|
|
head, 4*skip, &head, sizeof(head));
|
2018-02-02 11:58:43 +00:00
|
|
|
head = lfs_fromle32(head);
|
2017-04-23 00:48:31 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-01-29 21:20:12 +00:00
|
|
|
LFS_ASSERT(head >= 2 && head <= lfs->cfg->block_count);
|
2017-04-23 00:48:31 +00:00
|
|
|
current -= 1 << skip;
|
|
|
|
}
|
|
|
|
|
|
|
|
*block = head;
|
|
|
|
*off = pos;
|
|
|
|
return 0;
|
|
|
|
}
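The skip selection in `lfs_ctz_find` above can be sketched without any block device by walking block indices only. The helper names here (`ctz32`, `npw2`, `ctz_hops`) are illustrative stand-ins for `lfs_ctz`, `lfs_npw2`, and the traversal loop; the sketch assumes block `n` stores `ctz(n)+1` pointers, where pointer `k` leads to block `n - 2^k`.

```c
#include <stdint.h>

// Count trailing zero bits; x must be nonzero.
static uint32_t ctz32(uint32_t x) {
    uint32_t n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

// Exponent of the smallest power of two >= x.
static uint32_t npw2(uint32_t x) {
    uint32_t n = 0;
    uint64_t p = 1;
    while (p < x) {
        p <<= 1;
        n++;
    }
    return n;
}

// Number of pointer reads needed to walk from block index `current`
// back to block index `target` (target <= current).
static uint32_t ctz_hops(uint32_t current, uint32_t target) {
    uint32_t hops = 0;
    while (current > target) {
        // largest skip that does not overshoot the target...
        uint32_t want = npw2(current - target + 1) - 1;
        // ...limited by the pointers this block actually has
        uint32_t have = ctz32(current);
        uint32_t skip = want < have ? want : have;
        current -= (uint32_t)1 << skip;
        hops++;
    }
    return hops;
}
```

For example, walking from block 8 to block 0 takes a single hop, while walking from block 7 to block 0 takes three (7 to 6, 6 to 4, 4 to 0).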
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static int lfs_ctz_extend(lfs_t *lfs,
|
|
|
|
lfs_cache_t *pcache, lfs_cache_t *rcache,
|
2017-04-23 00:48:31 +00:00
|
|
|
lfs_block_t head, lfs_size_t size,
|
2017-11-16 21:10:17 +00:00
|
|
|
lfs_block_t *block, lfs_off_t *off) {
|
2017-05-14 17:01:45 +00:00
|
|
|
while (true) {
|
2017-11-16 21:10:17 +00:00
|
|
|
// go ahead and grab a block
|
|
|
|
lfs_block_t nblock;
|
|
|
|
int err = lfs_alloc(lfs, &nblock);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-01-29 21:20:12 +00:00
|
|
|
LFS_ASSERT(nblock >= 2 && nblock <= lfs->cfg->block_count);
|
2017-04-23 00:48:31 +00:00
|
|
|
|
2019-04-09 23:37:53 +00:00
|
|
|
{
|
2017-11-16 21:10:17 +00:00
|
|
|
err = lfs_bd_erase(lfs, nblock);
|
2017-10-17 00:31:56 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
2017-04-23 00:48:31 +00:00
|
|
|
|
2017-10-17 00:31:56 +00:00
|
|
|
if (size == 0) {
|
2017-11-16 21:10:17 +00:00
|
|
|
*block = nblock;
|
2017-10-17 00:31:56 +00:00
|
|
|
*off = 0;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there's still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
2020-01-29 07:45:19 +00:00
|
|
|
lfs_size_t noff = size - 1;
|
|
|
|
lfs_off_t index = lfs_ctz_index(lfs, &noff);
|
|
|
|
noff = noff + 1;
|
2017-04-23 02:42:22 +00:00
|
|
|
|
2017-10-17 00:31:56 +00:00
|
|
|
// just copy out the last block if it is incomplete
|
2020-01-29 07:45:19 +00:00
|
|
|
if (noff != lfs->cfg->block_size) {
|
|
|
|
for (lfs_off_t i = 0; i < noff; i++) {
|
2017-10-17 00:31:56 +00:00
|
|
|
uint8_t data;
|
2018-10-21 02:02:25 +00:00
|
|
|
err = lfs_bd_read(lfs,
|
2020-01-29 07:45:19 +00:00
|
|
|
NULL, rcache, noff-i,
|
2017-10-17 00:31:56 +00:00
|
|
|
head, i, &data, 1);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2017-04-23 02:42:22 +00:00
|
|
|
|
2018-10-21 02:02:25 +00:00
|
|
|
err = lfs_bd_prog(lfs,
|
|
|
|
pcache, rcache, true,
|
2017-11-16 21:10:17 +00:00
|
|
|
nblock, i, &data, 1);
|
2017-10-17 00:31:56 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
|
|
|
|
2017-11-16 21:10:17 +00:00
|
|
|
*block = nblock;
|
2020-01-29 07:45:19 +00:00
|
|
|
*off = noff;
|
2017-10-17 00:31:56 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
// append block
|
|
|
|
index += 1;
|
|
|
|
lfs_size_t skips = lfs_ctz(index) + 1;
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducability issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducability issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This can happen if the previous tag was a valid commit and we
lost power, leaving a partially written tag at the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is still necessary; we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However, this causes
lfs_fs_traverse to duplicate all metadata-pairs; I'm not sure what to
do about this yet.
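Fix 1 above can be illustrated with a small sketch. It is not the code from lfs.c: the callback type `sketch_read_fn`, the stub devices, and the name `SKETCH_ERR_CORRUPT` are assumptions made for illustration, and byte order is ignored. The only point shown is that the revision is zeroed before the read, so an unreadable block yields a deterministic value instead of whatever happened to be in RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical error code standing in for LFS_ERR_CORRUPT.
enum { SKETCH_ERR_CORRUPT = -84 };

// Hypothetical block-device read: returns 0 or a negative error code.
typedef int (*sketch_read_fn)(uint32_t block, void *buf, uint32_t size);

// Stub device whose block holds revision 7.
static int sketch_read_ok(uint32_t block, void *buf, uint32_t size) {
    (void)block;
    uint32_t rev = 7;
    assert(size == sizeof(rev));
    memcpy(buf, &rev, sizeof(rev));
    return 0;
}

// Stub device that rejects reads of untouched blocks (e.g. an ECC failure).
static int sketch_read_corrupt(uint32_t block, void *buf, uint32_t size) {
    (void)block; (void)buf; (void)size;
    return SKETCH_ERR_CORRUPT;
}

// Choose a starting revision. *rev is zeroed first, so a failed read can
// never leave stale RAM contents behind.
static int sketch_start_rev(sketch_read_fn rd, uint32_t block,
        uint32_t *rev) {
    *rev = 0;
    uint32_t ondisk = 0;
    int err = rd(block, &ondisk, sizeof(ondisk));
    if (err && err != SKETCH_ERR_CORRUPT) {
        return err;            // other I/O errors still propagate
    }
    if (!err) {
        *rev = ondisk;         // readable: reuse whatever was on disk
    }
    return 0;
}

// Convenience wrapper: start from a garbage value and report the result.
static uint32_t sketch_rev_from(sketch_read_fn rd) {
    uint32_t rev = 0xdeadbeef;
    int err = sketch_start_rev(rd, 0, &rev);
    assert(err == 0);
    (void)err;
    return rev;
}
```

With the corrupt stub the result is always 0 regardless of the initial garbage value, which is what makes two otherwise identical runs wear-level at the same time.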
2020-01-29 07:45:19 +00:00
|
|
|
lfs_block_t nhead = head;
|
2017-10-17 00:31:56 +00:00
|
|
|
for (lfs_off_t i = 0; i < skips; i++) {
|
2020-01-29 07:45:19 +00:00
|
|
|
nhead = lfs_tole32(nhead);
|
2018-10-21 02:02:25 +00:00
|
|
|
err = lfs_bd_prog(lfs, pcache, rcache, true,
|
2020-01-29 07:45:19 +00:00
|
|
|
nblock, 4*i, &nhead, 4);
|
|
|
|
nhead = lfs_fromle32(nhead);
|
2017-05-14 17:01:45 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-10-17 00:31:56 +00:00
|
|
|
if (i != skips-1) {
|
2018-10-21 02:02:25 +00:00
|
|
|
err = lfs_bd_read(lfs,
|
2020-01-29 07:45:19 +00:00
|
|
|
NULL, rcache, sizeof(nhead),
|
|
|
|
nhead, 4*i, &nhead, sizeof(nhead));
|
|
|
|
nhead = lfs_fromle32(nhead);
|
2017-10-17 00:31:56 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
|
|
|
|
2020-01-29 07:45:19 +00:00
|
|
|
LFS_ASSERT(nhead >= 2 && nhead <= lfs->cfg->block_count);
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
2017-09-17 17:53:18 +00:00
|
|
|
|
2017-11-16 21:10:17 +00:00
|
|
|
*block = nblock;
|
2017-10-17 00:31:56 +00:00
|
|
|
*off = 4*skips;
|
|
|
|
return 0;
|
2017-04-23 02:42:22 +00:00
|
|
|
}
|
|
|
|
|
2017-05-14 17:01:45 +00:00
|
|
|
relocate:
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Bad block at %"PRIx32, nblock);
|
2017-04-23 00:48:31 +00:00
|
|
|
|
2017-05-14 17:01:45 +00:00
|
|
|
// just clear cache and try a new block
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_cache_drop(lfs, pcache);
|
2017-04-23 00:48:31 +00:00
|
|
|
}
|
|
|
|
}
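The append path above writes `lfs_ctz(index) + 1` four-byte pointers at the start of each new block and then sets the data offset to `4*skips`. A minimal sketch of that arithmetic follows; the `sketch_*` helpers are hypothetical stand-ins, not functions from lfs.c, and the trailing-zero count is done with a portable loop rather than whatever `lfs_ctz` uses internally.

```c
#include <assert.h>
#include <stdint.h>

// Count trailing zero bits of a nonzero 32-bit value.
static uint32_t sketch_ctz(uint32_t x) {
    assert(x != 0);
    uint32_t n = 0;
    while ((x & 1) == 0) {
        x >>= 1;
        n += 1;
    }
    return n;
}

// Number of 4-byte skip pointers stored at the start of block `index`
// (index >= 1), mirroring `skips = lfs_ctz(index) + 1`.
static uint32_t sketch_skips(uint32_t index) {
    return sketch_ctz(index) + 1;
}

// Offset of the first data byte in that block, mirroring `*off = 4*skips`.
static uint32_t sketch_data_off(uint32_t index) {
    return 4 * sketch_skips(index);
}
```

For example, block 1 carries one pointer, block 6 carries two, and block 8 carries four, so data in block 8 starts at byte 16.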
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
static int lfs_ctz_traverse(lfs_t *lfs,
|
|
|
|
const lfs_cache_t *pcache, lfs_cache_t *rcache,
|
2017-04-23 00:48:31 +00:00
|
|
|
lfs_block_t head, lfs_size_t size,
|
2018-07-30 19:40:27 +00:00
|
|
|
int (*cb)(void*, lfs_block_t), void *data) {
|
2017-04-23 00:48:31 +00:00
|
|
|
if (size == 0) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_off_t index = lfs_ctz_index(lfs, &(lfs_off_t){size-1});
|
2017-04-23 00:48:31 +00:00
|
|
|
|
|
|
|
while (true) {
|
2018-07-30 19:40:27 +00:00
|
|
|
int err = cb(data, head);
|
2017-04-23 00:48:31 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (index == 0) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-12-27 18:30:01 +00:00
|
|
|
lfs_block_t heads[2];
|
|
|
|
int count = 2 - (index & 1);
|
2018-08-20 19:47:52 +00:00
|
|
|
err = lfs_bd_read(lfs,
|
|
|
|
pcache, rcache, count*sizeof(head),
|
|
|
|
head, 0, &heads, count*sizeof(head));
|
2018-02-02 11:58:43 +00:00
|
|
|
heads[0] = lfs_fromle32(heads[0]);
|
|
|
|
heads[1] = lfs_fromle32(heads[1]);
|
2017-04-23 00:48:31 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2017-12-27 18:30:01 +00:00
|
|
|
for (int i = 0; i < count-1; i++) {
|
2018-07-30 19:40:27 +00:00
|
|
|
err = cb(data, heads[i]);
|
2017-12-27 18:30:01 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
head = heads[count-1];
|
|
|
|
index -= count;
|
2017-04-23 00:48:31 +00:00
|
|
|
}
|
|
|
|
}
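The loop in `lfs_ctz_traverse` reads two pointers at once whenever the current index is even (`count = 2 - (index & 1)`), so it reports two blocks per read where it can. The sketch below models that stepping with block indices in place of on-disk addresses. It assumes pointer j of block n references block n - 2^j, and the `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Result of a simulated walk: blocks reported and pointer reads issued.
typedef struct {
    uint32_t visited;
    uint32_t reads;
} sketch_walk_t;

// Walk a CTZ skip-list backwards from block `index` down to block 0.
static sketch_walk_t sketch_traverse(uint32_t index) {
    sketch_walk_t w = {0, 0};
    uint32_t head = index;          // in this model the address is the index
    while (1) {
        w.visited += 1;             // stands in for cb(data, head)
        if (index == 0) {
            return w;
        }

        // odd blocks are only followed through pointer 0,
        // even blocks through pointers 0 and 1
        int count = 2 - (int)(index & 1);
        uint32_t heads[2] = {0, 0};
        for (int j = 0; j < count; j++) {
            heads[j] = head - (1u << j);   // stands in for the disk read
        }
        w.reads += 1;

        // report every block except the last, which becomes the new head
        for (int i = 0; i < count-1; i++) {
            w.visited += 1;
        }

        head = heads[count-1];
        index -= (uint32_t)count;
    }
}
```

Starting from index 5, this model reports all six blocks while issuing only three reads (at indices 5, 4 and 2).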
|
|
|
|
|
|
|
|
|
2017-04-18 03:27:06 +00:00
|
|
|
/// Top level file operations ///
|
2018-07-29 20:03:23 +00:00
|
|
|
int lfs_file_opencfg(lfs_t *lfs, lfs_file_t *file,
|
|
|
|
const char *path, int flags,
|
|
|
|
const struct lfs_file_config *cfg) {
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_TRACE("lfs_file_opencfg(%p, %p, \"%s\", %x, %p {"
|
2019-05-31 09:40:19 +00:00
|
|
|
".buffer=%p, .attrs=%p, .attr_count=%"PRIu32"})",
|
|
|
|
(void*)lfs, (void*)file, path, flags,
|
|
|
|
(void*)cfg, cfg->buffer, (void*)cfg->attrs, cfg->attr_count);
|
2019-07-29 02:53:13 +00:00
|
|
|
|
2018-05-22 22:43:39 +00:00
|
|
|
// deorphan if we haven't yet, needed at most once after poweron
|
2018-07-31 13:07:36 +00:00
|
|
|
if ((flags & 3) != LFS_O_RDONLY) {
|
2018-08-01 10:52:48 +00:00
|
|
|
int err = lfs_fs_forceconsistency(lfs);
|
2018-05-22 22:43:39 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_opencfg -> %d", err);
|
2018-05-22 22:43:39 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-08-01 15:24:59 +00:00
|
|
|
// setup simple file details
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow is the potential to add B-tree support in the
future in a backwards-compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
|
|
|
int err;
|
2018-08-01 15:24:59 +00:00
|
|
|
file->cfg = cfg;
|
2019-07-21 12:36:40 +00:00
|
|
|
file->flags = flags | LFS_F_OPENED;
|
2018-08-01 15:24:59 +00:00
|
|
|
file->pos = 0;
|
2019-08-03 14:40:10 +00:00
|
|
|
file->off = 0;
|
2018-08-01 15:24:59 +00:00
|
|
|
file->cache.buffer = NULL;
|
|
|
|
|
2018-05-22 22:43:39 +00:00
|
|
|
// allocate entry for file if it doesn't exist
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow, is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us a O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
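As a rough illustration of the ordering requirement, the sketch below finds where a new name belongs in a list that is already in ascending lexicographical order. The `find_insert_pos` helper and the in-memory array of names are assumptions made for this example; littlefs itself scans the tags of a metadata pair rather than an array of strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: scan a sorted list of names and report where
// `name` belongs. Returns the index of the first entry that is not
// less than `name`; *found is set when that entry is an exact match.
static size_t find_insert_pos(const char *const *names, size_t count,
        const char *name, bool *found) {
    assert(names != NULL || count == 0);
    *found = false;
    for (size_t i = 0; i < count; i++) {
        int cmp = strcmp(names[i], name);
        if (cmp >= 0) {
            *found = (cmp == 0);
            return i;
        }
    }
    // every existing name sorts before the new one, so append
    return count;
}
```

The scan is linear on purpose: as the message notes, the whole directory block is scanned anyway, so the comparison adds little on top.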
2018-10-04 19:49:34 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_find(lfs, &file->m, &path, &file->id);
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement.)
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--    11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
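The bit layout in the diagram above can be sketched as a handful of shift-and-mask helpers. The `sketch_` names are hypothetical and exist only for this example; they follow the field widths in the diagram (1 valid bit, 11 type bits, 10 id bits, 10 length bits) and are not presented as the accessors littlefs actually uses.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers for a 32-bit tag laid out, most significant
// bit first, as [1 valid | 11 type | 10 id | 10 length].
static uint32_t sketch_mktag(uint32_t type, uint32_t id, uint32_t size) {
    assert(type <= 0x7ff && id <= 0x3ff && size <= 0x3ff);
    return (type << 20) | (id << 10) | size;
}

// raw value of the top bit
static uint32_t sketch_tag_validbit(uint32_t tag) { return tag >> 31; }
// full 11-bit type (type3)
static uint32_t sketch_tag_type3(uint32_t tag) { return (tag >> 20) & 0x7ff; }
// upper 3 bits of the type (type1)
static uint32_t sketch_tag_type1(uint32_t tag) { return (tag >> 28) & 0x7; }
// lower 8 bits of the type (chunk info)
static uint32_t sketch_tag_chunk(uint32_t tag) { return (tag >> 20) & 0xff; }
// 10-bit file id
static uint32_t sketch_tag_id(uint32_t tag) { return (tag >> 10) & 0x3ff; }
// 10-bit entry length
static uint32_t sketch_tag_size(uint32_t tag) { return tag & 0x3ff; }
```

For instance, packing type `0x201`, id `5`, and length `8` gives `0x20101408`, which splits back into type1 `2` and chunk `0x01`.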
2018-12-29 13:53:12 +00:00
|
|
|
if (tag < 0 && !(tag == LFS_ERR_NOENT && file->id != 0x3ff)) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = tag;
|
|
|
|
goto cleanup;
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
|
|
|
|
2018-08-01 15:24:59 +00:00
|
|
|
// get id, add to list of mdirs to catch update changes
|
|
|
|
file->type = LFS_TYPE_REG;
|
|
|
|
file->next = (lfs_file_t*)lfs->mlist;
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs->mlist = (struct lfs_mlist*)file;
|
2018-08-01 15:24:59 +00:00
|
|
|
|
2018-07-13 01:22:06 +00:00
|
|
|
if (tag == LFS_ERR_NOENT) {
|
2018-05-22 22:43:39 +00:00
|
|
|
if (!(flags & LFS_O_CREAT)) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = LFS_ERR_NOENT;
|
|
|
|
goto cleanup;
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// check that name fits
|
|
|
|
lfs_size_t nlen = strlen(path);
|
2018-08-05 01:10:08 +00:00
|
|
|
if (nlen > lfs->name_max) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = LFS_ERR_NAMETOOLONG;
|
|
|
|
goto cleanup;
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// get next slot and create entry to remember name
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, &file->m, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_CREATE, file->id, 0)},
|
2019-01-08 14:52:03 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_REG, file->id, nlen), path},
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0)}));
|
2018-05-26 18:50:06 +00:00
|
|
|
if (err) {
|
2018-08-01 15:24:59 +00:00
|
|
|
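// only an out-of-space commit suggests the name did not fit; pass other errors through
err = (err == LFS_ERR_NOSPC) ? LFS_ERR_NAMETOOLONG : err;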
|
|
|
|
goto cleanup;
|
2018-05-28 14:17:44 +00:00
|
|
|
}
|
|
|
|
|
2018-08-01 15:24:59 +00:00
|
|
|
tag = LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, 0);
|
2018-07-13 01:43:55 +00:00
|
|
|
} else if (flags & LFS_O_EXCL) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = LFS_ERR_EXIST;
|
|
|
|
goto cleanup;
|
Cleaned up tag encoding, now with clear chunk field
2018-12-29 13:53:12 +00:00
|
|
|
} else if (lfs_tag_type3(tag) != LFS_TYPE_REG) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = LFS_ERR_ISDIR;
|
|
|
|
goto cleanup;
|
2018-07-13 01:43:55 +00:00
|
|
|
} else if (flags & LFS_O_TRUNC) {
|
|
|
|
// truncate if requested
|
2018-08-01 15:24:59 +00:00
|
|
|
tag = LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0);
|
|
|
|
file->flags |= LFS_F_DIRTY;
|
2018-05-26 18:50:06 +00:00
|
|
|
} else {
|
2018-07-13 01:43:55 +00:00
|
|
|
// try to load what's on disk, if it's inlined we'll fix it later
|
Cleaned up tag encoding, now with clear chunk field
2018-12-29 13:53:12 +00:00
|
|
|
tag = lfs_dir_get(lfs, &file->m, LFS_MKTAG(0x700, 0x3ff, 0),
|
2018-08-01 15:24:59 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, file->id, 8), &file->ctz);
|
2018-07-13 01:43:55 +00:00
|
|
|
if (tag < 0) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = tag;
|
|
|
|
goto cleanup;
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_ctz_fromle32(&file->ctz);
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
|
|
|
|
2018-07-30 20:15:05 +00:00
|
|
|
// fetch attrs
|
2019-01-08 14:52:03 +00:00
|
|
|
for (unsigned i = 0; i < file->cfg->attr_count; i++) {
|
2018-07-30 20:15:05 +00:00
|
|
|
if ((file->flags & 3) != LFS_O_WRONLY) {
|
Cleaned up tag encoding, now with clear chunk field
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, &file->m,
|
|
|
|
LFS_MKTAG(0x7ff, 0x3ff, 0),
|
2019-01-08 14:52:03 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_USERATTR + file->cfg->attrs[i].type,
|
|
|
|
file->id, file->cfg->attrs[i].size),
|
|
|
|
file->cfg->attrs[i].buffer);
|
2018-07-30 20:15:05 +00:00
|
|
|
if (res < 0 && res != LFS_ERR_NOENT) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = res;
|
|
|
|
goto cleanup;
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-30 20:15:05 +00:00
|
|
|
if ((file->flags & 3) != LFS_O_RDONLY) {
|
2019-01-08 14:52:03 +00:00
|
|
|
if (file->cfg->attrs[i].size > lfs->attr_max) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = LFS_ERR_NOSPC;
|
|
|
|
goto cleanup;
|
2018-07-30 20:15:05 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
file->flags |= LFS_F_DIRTY;
|
|
|
|
}
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
|
2017-04-30 16:19:37 +00:00
|
|
|
// allocate buffer if needed
|
2018-07-30 20:15:05 +00:00
|
|
|
if (file->cfg->buffer) {
|
2018-07-29 20:03:23 +00:00
|
|
|
file->cache.buffer = file->cfg->buffer;
|
2017-04-30 16:19:37 +00:00
|
|
|
} else {
|
2018-08-04 19:48:27 +00:00
|
|
|
file->cache.buffer = lfs_malloc(lfs->cfg->cache_size);
|
2017-04-30 16:19:37 +00:00
|
|
|
if (!file->cache.buffer) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = LFS_ERR_NOMEM;
|
|
|
|
goto cleanup;
|
2017-04-30 16:19:37 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-06 16:14:30 +00:00
|
|
|
// zero to avoid information leak
|
|
|
|
lfs_cache_zero(lfs, &file->cache);
|
|
|
|
|
Cleaned up tag encoding, now with clear chunk field
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type3(tag) == LFS_TYPE_INLINESTRUCT) {
|
2018-04-03 13:28:09 +00:00
|
|
|
// load inline files
|
2019-08-03 14:17:47 +00:00
|
|
|
file->ctz.head = LFS_BLOCK_INLINE;
|
2018-08-05 04:57:43 +00:00
|
|
|
file->ctz.size = lfs_tag_size(tag);
|
2018-03-17 15:28:14 +00:00
|
|
|
file->flags |= LFS_F_INLINE;
|
2018-07-13 01:43:55 +00:00
|
|
|
file->cache.block = file->ctz.head;
|
2018-03-17 15:28:14 +00:00
|
|
|
file->cache.off = 0;
|
2018-08-04 19:48:27 +00:00
|
|
|
file->cache.size = lfs->cfg->cache_size;
|
2018-07-13 01:43:55 +00:00
|
|
|
|
|
|
|
// don't always read (may be new/trunc file)
|
|
|
|
if (file->ctz.size > 0) {
|
Cleaned up tag encoding, now with clear chunk field
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, &file->m,
|
|
|
|
LFS_MKTAG(0x700, 0x3ff, 0),
|
Added better handling of large program sizes (> 1024)
The issue here is how commits handle padding to the nearest program
size. This is done by exploiting the size field of the LFS_TYPE_CRC
tag that completes the commit. Unfortunately, during development, the
size field shrank in size to make room for more type information,
limiting the size field to 1024.
Normally this isn't a problem, as very rarely do program sizes exceed
1024 bytes. However, using a simulated block device, user earlephilhower
found that exceeding 1024 caused littlefs to crash.
To make this corner case behave in a more user-friendly manner, I've
modified this situation to treat >1024 program sizes as small commits
that don't match the prog size. As a part of this, littlefs also needed
to understand that non-matching commits indicate an "unerased" dir
block, which would be needed for portability (something which notably
lacks testing).
This raises the question of whether the tag size field needs to be
reconsidered, but to change that at this point would need a new major
version.
found by earlephilhower
2019-04-09 21:06:43 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, file->id,
|
|
|
|
lfs_min(file->cache.size, 0x3fe)),
|
2018-07-13 01:43:55 +00:00
|
|
|
file->cache.buffer);
|
|
|
|
if (res < 0) {
|
2018-08-01 15:24:59 +00:00
|
|
|
err = res;
|
|
|
|
goto cleanup;
|
2018-05-26 18:50:06 +00:00
|
|
|
}
|
2018-03-17 15:28:14 +00:00
|
|
|
}
|
2018-04-03 13:28:09 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_opencfg -> %d", 0);
|
2017-04-18 03:27:06 +00:00
|
|
|
return 0;
|
2018-08-01 15:24:59 +00:00
|
|
|
|
|
|
|
cleanup:
|
|
|
|
// clean up lingering resources
|
|
|
|
file->flags |= LFS_F_ERRED;
|
|
|
|
lfs_file_close(lfs, file);
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_opencfg -> %d", err);
|
2018-08-01 15:24:59 +00:00
|
|
|
return err;
|
2017-03-20 03:00:56 +00:00
|
|
|
}
|
|
|
|
|
2018-07-29 20:03:23 +00:00
|
|
|
int lfs_file_open(lfs_t *lfs, lfs_file_t *file,
|
|
|
|
const char *path, int flags) {
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_TRACE("lfs_file_open(%p, %p, \"%s\", %x)",
|
2019-05-31 09:40:19 +00:00
|
|
|
(void*)lfs, (void*)file, path, flags);
|
2018-07-30 20:15:05 +00:00
|
|
|
static const struct lfs_file_config defaults = {0};
|
2019-05-31 09:40:19 +00:00
|
|
|
int err = lfs_file_opencfg(lfs, file, path, flags, &defaults);
|
|
|
|
LFS_TRACE("lfs_file_open -> %d", err);
|
|
|
|
return err;
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
|
2018-05-23 04:57:19 +00:00
|
|
|
int lfs_file_close(lfs_t *lfs, lfs_file_t *file) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_close(%p, %p)", (void*)lfs, (void*)file);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
|
|
|
|
2018-05-23 04:57:19 +00:00
|
|
|
int err = lfs_file_sync(lfs, file);
|
2018-05-22 22:43:39 +00:00
|
|
|
|
2018-08-01 15:24:59 +00:00
|
|
|
// remove from list of mdirs
|
2018-09-11 03:07:59 +00:00
|
|
|
for (struct lfs_mlist **p = &lfs->mlist; *p; p = &(*p)->next) {
|
|
|
|
if (*p == (struct lfs_mlist*)file) {
|
2018-08-01 15:24:59 +00:00
|
|
|
*p = (*p)->next;
|
2018-05-23 04:57:19 +00:00
|
|
|
break;
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-05-23 04:57:19 +00:00
|
|
|
// clean up memory
|
2018-08-11 18:45:13 +00:00
|
|
|
if (!file->cfg->buffer) {
|
2018-05-23 04:57:19 +00:00
|
|
|
lfs_free(file->cache.buffer);
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
|
|
|
|
2019-07-21 09:34:53 +00:00
|
|
|
file->flags &= ~LFS_F_OPENED;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_close -> %d", err);
|
2018-05-23 04:57:19 +00:00
|
|
|
return err;
|
2018-05-22 22:43:39 +00:00
|
|
|
}
|
|
|
|
|
2018-04-11 00:55:17 +00:00
|
|
|
static int lfs_file_relocate(lfs_t *lfs, lfs_file_t *file) {
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
|
|
|
|
2018-07-31 02:12:00 +00:00
|
|
|
while (true) {
|
|
|
|
// just relocate what exists into new block
|
|
|
|
lfs_block_t nblock;
|
|
|
|
int err = lfs_alloc(lfs, &nblock);
|
2017-06-25 22:23:40 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
2017-06-25 19:01:33 +00:00
|
|
|
}
|
|
|
|
|
2018-07-31 02:12:00 +00:00
|
|
|
err = lfs_bd_erase(lfs, nblock);
|
2017-06-25 19:01:33 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-07-31 02:12:00 +00:00
|
|
|
// either read from dirty cache or disk
|
|
|
|
for (lfs_off_t i = 0; i < file->off; i++) {
|
|
|
|
uint8_t data;
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
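The deferred eviction described above can be sketched roughly as follows. Everything here is a simplified assumption for illustration: `struct sketch_file`, its fields, and `sketch_evict_large_inline` are hypothetical stand-ins, and the actual copy of file data to a new block is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified stand-in for an open file handle.
struct sketch_file {
    struct sketch_file *next; // next open file
    uint32_t dir;             // directory holding the file's metadata
    bool is_inline;           // contents stored in the directory itself
    uint32_t size;            // file size in bytes
};

// Before committing to directory `dir`, move every open inline file in
// that directory that is too big for the RAM cache out to its own
// storage. Returns how many files were evicted.
static int sketch_evict_large_inline(struct sketch_file *open_files,
        uint32_t dir, uint32_t cache_size) {
    assert(cache_size > 0);
    int evicted = 0;
    for (struct sketch_file *f = open_files; f; f = f->next) {
        if (f->dir == dir && f->is_inline && f->size > cache_size) {
            // a real filesystem would copy the data to a new block here
            f->is_inline = false;
            evicted++;
        }
    }
    return evicted;
}
```

Inline files that fit in the cache are left alone, since their contents already live in RAM and cannot be corrupted by the commit.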
2019-01-13 17:08:42 +00:00
|
|
|
if (file->flags & LFS_F_INLINE) {
|
|
|
|
err = lfs_dir_getread(lfs, &file->m,
|
|
|
|
// note we evict inline files before they can be dirty
|
|
|
|
NULL, &file->cache, file->off-i,
|
|
|
|
LFS_MKTAG(0xfff, 0x1ff, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0),
|
|
|
|
i, &data, 1);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
err = lfs_bd_read(lfs,
|
|
|
|
&file->cache, &lfs->rcache, file->off-i,
|
|
|
|
file->block, i, &data, 1);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-07-31 02:12:00 +00:00
|
|
|
}
|
2017-06-25 19:01:33 +00:00
|
|
|
|
Revisited caching rules to optimize bus transactions
The littlefs driver has always had this really weird quirk: larger cache
sizes can significantly harm performance. This has probably been one of
the most surprising pieces of configuring and optimizing littlefs.
The reason is that littlefs's caches are kinda dumb (this is somewhat
intentional, as dumb caches take up much less code space than smart
caches). When littlefs needs to read data, it will load the entire cache
line. This means that even when we only need a small 4 byte piece of
data, we may need to read a full 512 byte cache. And since
microcontrollers may be reading from storage over relatively slow bus
protocols, the time to send data over the bus may dominate other
operations.
Now that we have separate configuration options for "cache_size" and
"read_size", we can start making littlefs's caches a bit smarter. They
aren't going to be perfect, because code size is still a priority, but
there are some small improvements we can do:
1. Program caches write to prog_size aligned units, but eagerly cache as
much as possible. There's no downside to using the full cache in
program operations.
2. Add a hint parameter to cached reads. This internal API allows callers
to tell the cache how much data they expect to need. This avoids
excess bus traffic, and now we can even bypass the cache if the
caller provides enough of a buffer.
We can still fall back to reading full cache-lines in the cases where
we don't know how much data we need by providing the block size as
the hint. We do this for directory fetches and for file reads.
This has immediate improvements for both metadata-log traversal and CTZ
skip-list traversal, since these both only need to read 4-byte pointers
and can always bypass the cache, allowing reuse elsewhere.
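One way to picture the hint parameter is as a bound on how much of the block gets pulled into the cache on a miss. The sketch below is an assumption-laden illustration, not the driver's code: `sketch_fetch_window` and its helpers are hypothetical names, and the rule shown (align to `read_size`, stop at the block end, cap at `cache_size`) is one plausible reading of the description above.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t sketch_min(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

static uint32_t sketch_aligndown(uint32_t a, uint32_t align) {
    return a - (a % align);
}

static uint32_t sketch_alignup(uint32_t a, uint32_t align) {
    return sketch_aligndown(a + align - 1, align);
}

// Region of a block to load into the cache.
struct sketch_window {
    uint32_t off;  // start offset, aligned down to read_size
    uint32_t size; // number of bytes to read
};

// Hypothetical: pick the region to fetch for a read at `off`, given a
// caller hint of how many bytes it expects to need.
static struct sketch_window sketch_fetch_window(uint32_t off, uint32_t hint,
        uint32_t read_size, uint32_t cache_size, uint32_t block_size) {
    assert(read_size > 0 && cache_size >= read_size && off < block_size);
    struct sketch_window w;
    w.off = sketch_aligndown(off, read_size);
    // never read past the hinted region or the end of the block
    uint32_t end = sketch_min(
            sketch_alignup(off + hint, read_size), block_size);
    // never read more than the cache can hold
    w.size = sketch_min(end - w.off, cache_size);
    return w;
}
```

With a 16-byte `read_size` and a 512-byte cache, a 4-byte hint at offset 100 fetches only the 16 bytes starting at offset 96, while passing the block size as the hint falls back to filling the whole cache.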
2018-08-20 19:47:52 +00:00
|
|
|
err = lfs_bd_prog(lfs,
|
|
|
|
&lfs->pcache, &lfs->rcache, true,
|
2018-07-31 02:12:00 +00:00
|
|
|
nblock, i, &data, 1);
|
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// copy over new state of file
|
2018-08-04 19:48:27 +00:00
|
|
|
memcpy(file->cache.buffer, lfs->pcache.buffer, lfs->cfg->cache_size);
|
2018-07-31 02:12:00 +00:00
|
|
|
file->cache.block = lfs->pcache.block;
|
|
|
|
file->cache.off = lfs->pcache.off;
|
2018-08-04 19:48:27 +00:00
|
|
|
file->cache.size = lfs->pcache.size;
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_cache_zero(lfs, &lfs->pcache);
|
2018-07-31 02:12:00 +00:00
|
|
|
|
|
|
|
file->block = nblock;
|
Added support for RAM-independent reading of inline files
2019-01-13 17:08:42 +00:00
|
|
|
file->flags |= LFS_F_WRITING;
|
2018-07-31 02:12:00 +00:00
|
|
|
return 0;
|
|
|
|
|
|
|
|
relocate:
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Bad block at %"PRIx32, nblock);
|
2018-08-13 19:08:30 +00:00
|
|
|
|
|
|
|
// just clear cache and try a new block
|
|
|
|
lfs_cache_drop(lfs, &lfs->pcache);
|
2018-07-31 02:12:00 +00:00
|
|
|
}
|
2018-04-11 00:55:17 +00:00
|
|
|
}
|
2018-04-08 21:58:12 +00:00
|
|
|
|
2019-07-09 22:51:15 +00:00
|
|
|
static int lfs_file_outline(lfs_t *lfs, lfs_file_t *file) {
|
|
|
|
file->off = file->pos;
|
|
|
|
lfs_alloc_ack(lfs);
|
|
|
|
int err = lfs_file_relocate(lfs, file);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
file->flags &= ~LFS_F_INLINE;
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-04-11 00:55:17 +00:00
|
|
|
static int lfs_file_flush(lfs_t *lfs, lfs_file_t *file) {
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
|
|
|
|
2019-01-22 22:21:16 +00:00
|
|
|
if (file->flags & LFS_F_READING) {
|
|
|
|
if (!(file->flags & LFS_F_INLINE)) {
|
|
|
|
lfs_cache_drop(lfs, &file->cache);
|
|
|
|
}
|
|
|
|
file->flags &= ~LFS_F_READING;
|
|
|
|
}
|
2017-04-23 02:42:22 +00:00
|
|
|
|
2018-04-11 00:55:17 +00:00
|
|
|
if (file->flags & LFS_F_WRITING) {
|
|
|
|
lfs_off_t pos = file->pos;
|
2018-04-08 21:58:12 +00:00
|
|
|
|
2018-04-11 00:55:17 +00:00
|
|
|
if (!(file->flags & LFS_F_INLINE)) {
|
|
|
|
// copy over anything after current branch
|
|
|
|
lfs_file_t orig = {
|
2018-07-13 01:43:55 +00:00
|
|
|
.ctz.head = file->ctz.head,
|
|
|
|
.ctz.size = file->ctz.size,
|
2019-07-21 12:36:40 +00:00
|
|
|
.flags = LFS_O_RDONLY | LFS_F_OPENED,
|
2018-04-11 00:55:17 +00:00
|
|
|
.pos = file->pos,
|
|
|
|
.cache = lfs->rcache,
|
|
|
|
};
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_cache_drop(lfs, &lfs->rcache);
|
2017-04-24 04:39:50 +00:00
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
while (file->pos < file->ctz.size) {
|
2018-03-19 01:36:48 +00:00
|
|
|
// copy over a byte at a time, leave it up to caching
|
|
|
|
// to make this efficient
|
|
|
|
uint8_t data;
|
|
|
|
lfs_ssize_t res = lfs_file_read(lfs, &orig, &data, 1);
|
|
|
|
if (res < 0) {
|
|
|
|
return res;
|
|
|
|
}
|
2017-04-23 02:42:22 +00:00
|
|
|
|
2018-03-19 01:36:48 +00:00
|
|
|
res = lfs_file_write(lfs, file, &data, 1);
|
|
|
|
if (res < 0) {
|
|
|
|
return res;
|
|
|
|
}
|
2017-04-30 16:19:37 +00:00
|
|
|
|
2018-03-19 01:36:48 +00:00
|
|
|
// keep our reference to the rcache in sync
|
2019-08-03 14:17:47 +00:00
|
|
|
if (lfs->rcache.block != LFS_BLOCK_NULL) {
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_cache_drop(lfs, &orig.cache);
|
|
|
|
lfs_cache_drop(lfs, &lfs->rcache);
|
2018-03-19 01:36:48 +00:00
|
|
|
}
|
2017-04-30 16:19:37 +00:00
|
|
|
}
|
|
|
|
|
2018-03-19 01:36:48 +00:00
|
|
|
// write out what we have
|
|
|
|
while (true) {
|
Added support for RAM-independent reading of inline files
2019-01-13 17:08:42 +00:00
|
|
|
int err = lfs_bd_flush(lfs, &file->cache, &lfs->rcache, true);
|
2018-03-19 01:36:48 +00:00
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
|
|
|
return err;
|
2017-06-25 19:01:33 +00:00
|
|
|
}
|
|
|
|
|
2018-03-19 01:36:48 +00:00
|
|
|
break;
|
2018-07-31 02:12:00 +00:00
|
|
|
|
2017-06-25 19:01:33 +00:00
|
|
|
relocate:
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Bad block at %"PRIx32, file->block);
|
2018-03-19 01:36:48 +00:00
|
|
|
err = lfs_file_relocate(lfs, file);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2017-06-25 19:01:33 +00:00
|
|
|
}
|
2018-03-19 01:36:48 +00:00
|
|
|
} else {
|
2019-05-28 18:55:03 +00:00
|
|
|
file->pos = lfs_max(file->pos, file->ctz.size);
|
2017-04-23 02:42:22 +00:00
|
|
|
}
|
|
|
|
|
2017-04-24 04:39:50 +00:00
|
|
|
// actual file updates
|
2018-07-13 01:43:55 +00:00
|
|
|
file->ctz.head = file->block;
|
|
|
|
file->ctz.size = file->pos;
|
2017-04-30 16:19:37 +00:00
|
|
|
file->flags &= ~LFS_F_WRITING;
|
|
|
|
file->flags |= LFS_F_DIRTY;
|
2017-04-24 04:39:50 +00:00
|
|
|
|
|
|
|
file->pos = pos;
|
|
|
|
}
|
2017-04-23 02:42:22 +00:00
|
|
|
|
2017-04-23 04:11:13 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-05-23 04:57:19 +00:00
|
|
|
int lfs_file_sync(lfs_t *lfs, lfs_file_t *file) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_sync(%p, %p)", (void*)lfs, (void*)file);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
|
|
|
|
2020-02-17 18:00:47 +00:00
|
|
|
if (file->flags & LFS_F_ERRED) {
|
|
|
|
// it's not safe to do anything if our file errored
|
|
|
|
LFS_TRACE("lfs_file_sync -> %d", 0);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2020-02-09 16:02:41 +00:00
|
|
|
int err = lfs_file_flush(lfs, file);
|
|
|
|
if (err) {
|
|
|
|
file->flags |= LFS_F_ERRED;
|
|
|
|
LFS_TRACE("lfs_file_sync -> %d", err);
|
|
|
|
return err;
|
|
|
|
}
|
2018-07-31 02:12:00 +00:00
|
|
|
|
2020-02-09 16:02:41 +00:00
|
|
|
if ((file->flags & LFS_F_DIRTY) &&
|
|
|
|
!lfs_pair_isnull(file->m.pair)) {
|
|
|
|
// update dir entry
|
|
|
|
uint16_t type;
|
|
|
|
const void *buffer;
|
|
|
|
lfs_size_t size;
|
|
|
|
struct lfs_ctz ctz;
|
|
|
|
if (file->flags & LFS_F_INLINE) {
|
|
|
|
// inline the whole file
|
|
|
|
type = LFS_TYPE_INLINESTRUCT;
|
|
|
|
buffer = file->cache.buffer;
|
|
|
|
size = file->ctz.size;
|
|
|
|
} else {
|
|
|
|
// update the ctz reference
|
|
|
|
type = LFS_TYPE_CTZSTRUCT;
|
|
|
|
// copy ctz so alloc will work during a relocate
|
|
|
|
ctz = file->ctz;
|
|
|
|
lfs_ctz_tole32(&ctz);
|
|
|
|
buffer = &ctz;
|
|
|
|
size = sizeof(ctz);
|
2017-04-24 04:39:50 +00:00
|
|
|
}
|
|
|
|
|
2020-02-09 16:02:41 +00:00
|
|
|
// commit file data and attributes
|
|
|
|
err = lfs_dir_commit(lfs, &file->m, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(type, file->id, size), buffer},
|
|
|
|
{LFS_MKTAG(LFS_FROM_USERATTRS, file->id,
|
|
|
|
file->cfg->attr_count), file->cfg->attrs}));
|
2018-07-31 02:12:00 +00:00
|
|
|
if (err) {
|
2019-04-09 22:41:26 +00:00
|
|
|
file->flags |= LFS_F_ERRED;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_sync -> %d", err);
|
2018-07-31 02:12:00 +00:00
|
|
|
return err;
|
|
|
|
}
|
2020-02-09 16:02:41 +00:00
|
|
|
|
|
|
|
file->flags &= ~LFS_F_DIRTY;
|
2018-07-31 02:12:00 +00:00
|
|
|
}
|
2020-02-09 16:02:41 +00:00
|
|
|
|
|
|
|
LFS_TRACE("lfs_file_sync -> %d", 0);
|
|
|
|
return 0;
|
2017-03-20 03:00:56 +00:00
|
|
|
}
|
|
|
|
|
2017-04-24 02:40:03 +00:00
|
|
|
lfs_ssize_t lfs_file_read(lfs_t *lfs, lfs_file_t *file,
|
|
|
|
void *buffer, lfs_size_t size) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_read(%p, %p, %p, %"PRIu32")",
|
|
|
|
(void*)lfs, (void*)file, buffer, size);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
2019-07-24 19:59:48 +00:00
|
|
|
LFS_ASSERT((file->flags & 3) != LFS_O_WRONLY);
|
2017-04-24 02:40:03 +00:00
|
|
|
|
|
|
|
uint8_t *data = buffer;
|
|
|
|
lfs_size_t nsize = size;
|
|
|
|
|
2017-04-30 16:19:37 +00:00
|
|
|
if (file->flags & LFS_F_WRITING) {
|
2017-04-24 04:39:50 +00:00
|
|
|
// flush out any writes
|
|
|
|
int err = lfs_file_flush(lfs, file);
|
|
|
|
if (err) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_read -> %d", err);
|
2017-04-24 04:39:50 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
if (file->pos >= file->ctz.size) {
|
2017-09-17 22:57:12 +00:00
|
|
|
// eof if past end
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_read -> %d", 0);
|
2017-09-17 22:57:12 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
size = lfs_min(size, file->ctz.size - file->pos);
|
2017-04-24 04:39:50 +00:00
|
|
|
nsize = size;
|
|
|
|
|
2017-04-24 02:40:03 +00:00
|
|
|
while (nsize > 0) {
|
|
|
|
// check if we need a new block
|
2017-04-30 16:19:37 +00:00
|
|
|
if (!(file->flags & LFS_F_READING) ||
|
|
|
|
file->off == lfs->cfg->block_size) {
|
2018-03-19 01:36:48 +00:00
|
|
|
if (!(file->flags & LFS_F_INLINE)) {
|
2018-08-05 04:57:43 +00:00
|
|
|
int err = lfs_ctz_find(lfs, NULL, &file->cache,
|
2018-07-13 01:43:55 +00:00
|
|
|
file->ctz.head, file->ctz.size,
|
2018-03-17 15:28:14 +00:00
|
|
|
file->pos, &file->block, &file->off);
|
|
|
|
if (err) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_read -> %d", err);
|
2018-03-17 15:28:14 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-03-19 01:36:48 +00:00
|
|
|
} else {
|
2019-08-03 14:17:47 +00:00
|
|
|
file->block = LFS_BLOCK_INLINE;
|
2018-03-19 01:36:48 +00:00
|
|
|
file->off = file->pos;
|
2017-04-24 02:40:03 +00:00
|
|
|
}
|
2017-04-30 16:19:37 +00:00
|
|
|
|
|
|
|
file->flags |= LFS_F_READING;
|
2017-04-24 02:40:03 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// read as much as we can in current block
|
2017-04-30 16:19:37 +00:00
|
|
|
lfs_size_t diff = lfs_min(nsize, lfs->cfg->block_size - file->off);
|
2019-01-13 17:08:42 +00:00
|
|
|
if (file->flags & LFS_F_INLINE) {
|
|
|
|
int err = lfs_dir_getread(lfs, &file->m,
|
|
|
|
NULL, &file->cache, lfs->cfg->block_size,
|
|
|
|
LFS_MKTAG(0xfff, 0x1ff, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0),
|
|
|
|
file->off, data, diff);
|
|
|
|
if (err) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_read -> %d", err);
|
2019-01-13 17:08:42 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
int err = lfs_bd_read(lfs,
|
|
|
|
NULL, &file->cache, lfs->cfg->block_size,
|
|
|
|
file->block, file->off, data, diff);
|
|
|
|
if (err) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_read -> %d", err);
|
2019-01-13 17:08:42 +00:00
|
|
|
return err;
|
|
|
|
}
|
2017-04-24 02:40:03 +00:00
|
|
|
}
|
|
|
|
|
2017-04-24 04:39:50 +00:00
|
|
|
file->pos += diff;
|
2017-04-30 16:19:37 +00:00
|
|
|
file->off += diff;
|
2017-04-24 02:40:03 +00:00
|
|
|
data += diff;
|
|
|
|
nsize -= diff;
|
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_read -> %"PRId32, size);
|
2017-04-24 02:40:03 +00:00
|
|
|
return size;
|
|
|
|
}
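The read loop above never crosses a block boundary in a single device read: each pass clamps the transfer to `lfs_min(nsize, block_size - file->off)`. The following is a minimal sketch of that chunking, not littlefs code. The names `sketch_min` and `sketch_count_chunks` are hypothetical, and the sketch assumes every block holds `block_size` data bytes, ignoring the skip-list pointers that real CTZ blocks store and that `lfs_ctz_find` accounts for.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper mirroring lfs_min for 32-bit unsigned values.
static uint32_t sketch_min(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

// Counts how many per-block reads a request of `size` bytes needs when it
// starts at in-block offset `off`. Simplification: a new block always
// starts at offset 0, which is not true for real CTZ blocks.
uint32_t sketch_count_chunks(uint32_t block_size, uint32_t off,
        uint32_t size) {
    assert(block_size > 0 && off <= block_size);
    uint32_t chunks = 0;
    uint32_t nsize = size;
    while (nsize > 0) {
        if (off == block_size) {
            // the real code looks up the next block here
            off = 0;
        }
        // read as much as we can in the current block
        uint32_t diff = sketch_min(nsize, block_size - off);
        off += diff;
        nsize -= diff;
        chunks++;
    }
    return chunks;
}
```

With 512-byte blocks, a 1000-byte read starting 500 bytes into a block is split into chunks of 12, 512, and 476 bytes.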
|
|
|
|
|
2017-03-20 03:00:56 +00:00
|
|
|
lfs_ssize_t lfs_file_write(lfs_t *lfs, lfs_file_t *file,
|
|
|
|
const void *buffer, lfs_size_t size) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_write(%p, %p, %p, %"PRIu32")",
|
|
|
|
(void*)lfs, (void*)file, buffer, size);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
2019-07-24 19:59:48 +00:00
|
|
|
LFS_ASSERT((file->flags & 3) != LFS_O_RDONLY);
|
2017-04-23 04:11:13 +00:00
|
|
|
|
2017-03-20 03:00:56 +00:00
|
|
|
const uint8_t *data = buffer;
|
|
|
|
lfs_size_t nsize = size;
|
|
|
|
|
2017-04-30 16:19:37 +00:00
|
|
|
if (file->flags & LFS_F_READING) {
|
2017-04-24 04:39:50 +00:00
|
|
|
// drop any reads
|
|
|
|
int err = lfs_file_flush(lfs, file);
|
|
|
|
if (err) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %d", err);
|
2017-04-24 04:39:50 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
if ((file->flags & LFS_O_APPEND) && file->pos < file->ctz.size) {
|
|
|
|
file->pos = file->ctz.size;
|
2017-04-24 04:39:50 +00:00
|
|
|
}
|
|
|
|
|
2018-10-21 02:02:25 +00:00
|
|
|
if (file->pos + size > lfs->file_max) {
|
|
|
|
// Larger than file limit?
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %d", LFS_ERR_FBIG);
|
2018-10-08 19:12:20 +00:00
|
|
|
return LFS_ERR_FBIG;
|
|
|
|
}
|
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
if (!(file->flags & LFS_F_WRITING) && file->pos > file->ctz.size) {
|
2017-09-17 22:57:12 +00:00
|
|
|
// fill with zeros
|
|
|
|
lfs_off_t pos = file->pos;
|
2018-07-13 01:43:55 +00:00
|
|
|
file->pos = file->ctz.size;
|
2017-09-17 22:57:12 +00:00
|
|
|
|
|
|
|
while (file->pos < pos) {
|
|
|
|
lfs_ssize_t res = lfs_file_write(lfs, file, &(uint8_t){0}, 1);
|
|
|
|
if (res < 0) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %"PRId32, res);
|
2017-09-17 22:57:12 +00:00
|
|
|
return res;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Added disk-backed limits on the name/attrs/inline sizes
Being a portable, microcontroller-scale embedded filesystem, littlefs is
presented with a relatively unique challenge. The amount of RAM
available is on completely different scales from machine to machine, and
what is normally a reasonable RAM assumption may break completely on an
embedded system.
A great example of this is file names. On almost every PC these days, the limit
for a file name is 255 bytes. It's a very convenient limit for a number
of reasons. However, on microcontrollers, allocating 255 bytes of RAM to
do a file search can be unreasonable.
The simplest solution (and one that has existed in littlefs for a
while) is to let this limit be redefined to a smaller value on devices
that need to save RAM. However, this presents an interesting portability
issue. If these devices are plugged into a PC with relatively infinite
RAM, nothing stops the PC from writing files with full 255-byte file
names, which can't be read on the small device.
One solution here is to store this limit on the superblock during format
time. When mounting a disk, the filesystem implementation is responsible for
checking this limit in the superblock. If it's larger than what can be
read, raise an error. If it's smaller, respect the limit on the
superblock and raise an error if the user attempts to exceed it.
In this commit, this strategy is adopted for file names, inline files,
and the size of all attributes, since these could impact the memory
consumption of the filesystem. (Recording the attribute's limit is
iffy, but is the only other arbitrary limit and could be used for disabling
support of custom attributes).
Note! This change makes it very important to configure littlefs
correctly at format time. If littlefs is formatted on a PC without
changing the limits appropriately, it will be rejected by a smaller
device.
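The mount-time check described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the littlefs implementation: `sketch_effective_limit` and `SKETCH_ERR_INVAL` are hypothetical names, and the choice of error code is an assumption. The function compares the limit recorded in the superblock with the limit this build can handle.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical error code for this sketch only.
#define SKETCH_ERR_INVAL (-22)

// Returns the limit to enforce after mounting, or a negative error when the
// disk was formatted with a limit larger than this build can read.
int32_t sketch_effective_limit(uint32_t disk_limit, uint32_t build_limit) {
    assert(build_limit <= INT32_MAX);
    if (disk_limit > build_limit) {
        // the disk may contain names/attrs/inline files we cannot hold in RAM
        return SKETCH_ERR_INVAL;
    }
    // respect the smaller on-disk limit so the disk stays readable
    // on the device that formatted it
    return (int32_t)disk_limit;
}
```

A disk formatted with a 255-byte name limit would be rejected by a build limited to 32 bytes, while a disk formatted with a 32-byte limit mounts on a 255-byte build and keeps enforcing 32.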
2018-04-01 20:36:29 +00:00
|
|
|
if ((file->flags & LFS_F_INLINE) &&
|
2019-01-13 17:08:42 +00:00
|
|
|
lfs_max(file->pos+nsize, file->ctz.size) >
|
2019-05-22 21:18:41 +00:00
|
|
|
lfs_min(0x3fe, lfs_min(
|
2019-01-14 23:53:41 +00:00
|
|
|
lfs->cfg->cache_size, lfs->cfg->block_size/8))) {
|
2018-04-03 13:29:28 +00:00
|
|
|
// inline file doesn't fit anymore
|
2019-07-09 22:51:15 +00:00
|
|
|
int err = lfs_file_outline(lfs, file);
|
2018-03-18 01:32:16 +00:00
|
|
|
if (err) {
|
|
|
|
file->flags |= LFS_F_ERRED;
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %d", err);
|
2018-03-18 01:32:16 +00:00
|
|
|
return err;
|
2018-03-17 15:28:14 +00:00
|
|
|
}
|
2018-03-18 01:32:16 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
while (nsize > 0) {
|
2017-04-23 02:42:22 +00:00
|
|
|
// check if we need a new block
|
2017-04-30 16:19:37 +00:00
|
|
|
if (!(file->flags & LFS_F_WRITING) ||
|
|
|
|
file->off == lfs->cfg->block_size) {
|
2018-03-19 01:36:48 +00:00
|
|
|
if (!(file->flags & LFS_F_INLINE)) {
|
2018-03-17 15:28:14 +00:00
|
|
|
if (!(file->flags & LFS_F_WRITING) && file->pos > 0) {
|
|
|
|
// find out which block we're extending from
|
2018-08-05 04:57:43 +00:00
|
|
|
int err = lfs_ctz_find(lfs, NULL, &file->cache,
|
2018-07-13 01:43:55 +00:00
|
|
|
file->ctz.head, file->ctz.size,
|
2018-03-17 15:28:14 +00:00
|
|
|
file->pos-1, &file->block, &file->off);
|
|
|
|
if (err) {
|
|
|
|
file->flags |= LFS_F_ERRED;
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %d", err);
|
2018-03-17 15:28:14 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
// mark cache as dirty since we may have read data into it
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_cache_zero(lfs, &file->cache);
|
2018-03-17 15:28:14 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// extend file with new blocks
|
|
|
|
lfs_alloc_ack(lfs);
|
2018-08-05 04:57:43 +00:00
|
|
|
int err = lfs_ctz_extend(lfs, &file->cache, &lfs->rcache,
|
2018-03-17 15:28:14 +00:00
|
|
|
file->block, file->pos,
|
|
|
|
&file->block, &file->off);
|
2017-04-23 04:11:13 +00:00
|
|
|
if (err) {
|
2017-11-16 23:25:41 +00:00
|
|
|
file->flags |= LFS_F_ERRED;
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %d", err);
|
2017-04-23 04:11:13 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-03-19 01:36:48 +00:00
|
|
|
} else {
|
2019-08-03 14:17:47 +00:00
|
|
|
file->block = LFS_BLOCK_INLINE;
|
2018-03-19 01:36:48 +00:00
|
|
|
file->off = file->pos;
|
2017-03-20 03:00:56 +00:00
|
|
|
}
|
2017-09-17 17:53:18 +00:00
|
|
|
|
|
|
|
file->flags |= LFS_F_WRITING;
|
2017-03-20 03:00:56 +00:00
|
|
|
}
|
|
|
|
|
2017-04-23 02:42:22 +00:00
|
|
|
// program as much as we can in current block
|
2017-04-30 16:19:37 +00:00
|
|
|
lfs_size_t diff = lfs_min(nsize, lfs->cfg->block_size - file->off);
|
2017-05-14 17:01:45 +00:00
|
|
|
while (true) {
|
Revisited caching rules to optimize bus transactions
The littlefs driver has always had this really weird quirk: larger cache
sizes can significantly harm performance. This has probably been one of
the most surprising pieces of configuring and optimizing littlefs.
The reason is that littlefs's caches are kinda dumb (this is somewhat
intentional, as dumb caches take up much less code space than smart
caches). When littlefs needs to read data, it will load the entire cache
line. This means that even when we only need a small 4 byte piece of
data, we may need to read a full 512 byte cache. And since
microcontrollers may be reading from storage over relatively slow bus
protocols, the time to send data over the bus may dominate other
operations.
Now that we have separate configuration options for "cache_size" and
"read_size", we can start making littlefs's caches a bit smarter. They
aren't going to be perfect, because code size is still a priority, but
there are some small improvements we can do:
1. Program caches write to prog_size aligned units, but eagerly cache as
much as possible. There's no downside to using the full cache in
program operations.
2. Add a hint parameter to cached reads. This internal API allows callers
to tell the cache how much data they expect to need. This avoids
excess bus traffic, and now we can even bypass the cache if the
caller provides enough of a buffer.
We can still fall back to reading full cache-lines in the cases where
we don't know how much data we need by providing the block size as
the hint. We do this for directory fetches and for file reads.
This has immediate improvements for both metadata-log traversal and CTZ
skip-list traversal, since these both only need to read 4-byte pointers
and can always bypass the cache, allowing reuse elsewhere.
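The effect of the hint on bus traffic can be illustrated with a simplified model. This is a sketch under assumptions, not the actual `lfs_bd_read` logic: the helper names are hypothetical, and the model only assumes that a hinted read fetches the hint rounded up to `read_size` and capped at `cache_size`, while an unhinted read fills the whole cache line.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers for this sketch.
static uint32_t sketch_min(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

static uint32_t sketch_alignup(uint32_t a, uint32_t alignment) {
    return ((a + alignment - 1) / alignment) * alignment;
}

// Bytes sent over the bus when the cache always loads a full line.
uint32_t sketch_fetch_full_line(uint32_t cache_size) {
    return cache_size;
}

// Bytes sent over the bus when the caller says how much it expects to need.
uint32_t sketch_fetch_with_hint(uint32_t hint, uint32_t read_size,
        uint32_t cache_size) {
    assert(read_size > 0 && cache_size % read_size == 0);
    return sketch_min(sketch_alignup(hint, read_size), cache_size);
}
```

For the 4-byte pointer read mentioned above, with a 16-byte `read_size` and a 512-byte cache, this model transfers 16 bytes instead of 512.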
2018-08-20 19:47:52 +00:00
|
|
|
int err = lfs_bd_prog(lfs, &file->cache, &lfs->rcache, true,
|
2017-05-14 17:01:45 +00:00
|
|
|
file->block, file->off, data, diff);
|
|
|
|
if (err) {
|
|
|
|
if (err == LFS_ERR_CORRUPT) {
|
|
|
|
goto relocate;
|
|
|
|
}
|
2017-11-16 23:25:41 +00:00
|
|
|
file->flags |= LFS_F_ERRED;
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %d", err);
|
2017-05-14 17:01:45 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
break;
|
|
|
|
relocate:
|
2017-06-25 19:01:33 +00:00
|
|
|
err = lfs_file_relocate(lfs, file);
|
2017-05-14 17:01:45 +00:00
|
|
|
if (err) {
|
2017-11-16 23:25:41 +00:00
|
|
|
file->flags |= LFS_F_ERRED;
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %d", err);
|
2017-05-14 17:01:45 +00:00
|
|
|
return err;
|
|
|
|
}
|
2017-03-20 03:00:56 +00:00
|
|
|
}
|
|
|
|
|
2017-04-24 04:39:50 +00:00
|
|
|
file->pos += diff;
|
2017-04-30 16:19:37 +00:00
|
|
|
file->off += diff;
|
2017-03-20 03:00:56 +00:00
|
|
|
data += diff;
|
|
|
|
nsize -= diff;
|
2017-05-14 17:01:45 +00:00
|
|
|
|
|
|
|
lfs_alloc_ack(lfs);
|
2017-04-23 02:42:22 +00:00
|
|
|
}
|
|
|
|
|
2017-11-16 23:25:41 +00:00
|
|
|
file->flags &= ~LFS_F_ERRED;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_write -> %"PRId32, size);
|
2017-03-20 03:00:56 +00:00
|
|
|
return size;
|
|
|
|
}
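The condition that forces an inline file to be outlined in `lfs_file_write` can be isolated as a pure function. The following is a minimal sketch with hypothetical names (`sketch_inline_limit`, `sketch_must_outline`). It reproduces only the arithmetic visible above, `lfs_min(0x3fe, lfs_min(cache_size, block_size/8))`, and assumes `pos + nsize` does not overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helpers mirroring lfs_min/lfs_max.
static uint32_t sketch_min(uint32_t a, uint32_t b) {
    return a < b ? a : b;
}

static uint32_t sketch_max(uint32_t a, uint32_t b) {
    return a > b ? a : b;
}

// Largest size a file may reach while staying inline.
uint32_t sketch_inline_limit(uint32_t cache_size, uint32_t block_size) {
    assert(block_size > 0);
    return sketch_min(0x3fe, sketch_min(cache_size, block_size/8));
}

// True when writing nsize bytes at pos would push the file past the limit.
bool sketch_must_outline(uint32_t pos, uint32_t nsize, uint32_t cur_size,
        uint32_t cache_size, uint32_t block_size) {
    return sketch_max(pos + nsize, cur_size)
            > sketch_inline_limit(cache_size, block_size);
}
```

With a 512-byte cache and 4096-byte blocks the limit is 512 bytes, so a file can grow to exactly 512 bytes inline and is outlined by the write that would make it 513.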
|
|
|
|
|
2017-04-23 04:11:13 +00:00
|
|
|
lfs_soff_t lfs_file_seek(lfs_t *lfs, lfs_file_t *file,
|
|
|
|
lfs_soff_t off, int whence) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_seek(%p, %p, %"PRId32", %d)",
|
|
|
|
(void*)lfs, (void*)file, off, whence);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
|
|
|
|
2017-04-23 04:11:13 +00:00
|
|
|
// write out everything beforehand, may be noop if rdonly
|
|
|
|
int err = lfs_file_flush(lfs, file);
|
|
|
|
if (err) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_seek -> %d", err);
|
2017-04-23 04:11:13 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-10-08 19:12:20 +00:00
|
|
|
// find new pos
|
2018-10-21 02:02:25 +00:00
|
|
|
lfs_off_t npos = file->pos;
|
2017-04-24 04:39:50 +00:00
|
|
|
if (whence == LFS_SEEK_SET) {
|
2018-10-08 19:12:20 +00:00
|
|
|
npos = off;
|
2017-04-24 04:39:50 +00:00
|
|
|
} else if (whence == LFS_SEEK_CUR) {
|
2018-10-08 19:12:20 +00:00
|
|
|
npos = file->pos + off;
|
2017-04-24 04:39:50 +00:00
|
|
|
} else if (whence == LFS_SEEK_END) {
|
2018-10-21 02:02:25 +00:00
|
|
|
npos = file->ctz.size + off;
|
2018-10-08 19:12:20 +00:00
|
|
|
}
|
2017-09-17 22:57:12 +00:00
|
|
|
|
2019-01-22 22:21:16 +00:00
|
|
|
if (npos > lfs->file_max) {
|
2018-10-08 19:12:20 +00:00
|
|
|
// file position out of range
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_seek -> %d", LFS_ERR_INVAL);
|
2018-10-08 19:12:20 +00:00
|
|
|
return LFS_ERR_INVAL;
|
2017-04-23 04:11:13 +00:00
|
|
|
}
|
|
|
|
|
2018-10-08 19:12:20 +00:00
|
|
|
// update pos
|
|
|
|
file->pos = npos;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_seek -> %"PRId32, npos);
|
2018-10-08 19:12:20 +00:00
|
|
|
return npos;
|
2017-04-23 04:11:13 +00:00
|
|
|
}
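The position arithmetic in `lfs_file_seek` can be sketched in isolation. The sketch assumes `lfs_off_t` is a 32-bit unsigned type and `lfs_soff_t` its signed counterpart, so a seek that would land before the start of the file wraps to a large value and is caught by the same `file_max` check that rejects oversized positions. All names and constants below are hypothetical stand-ins, and the flush step is omitted.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-ins for the whence values and error code.
enum sketch_whence {
    SKETCH_SEEK_SET = 0,
    SKETCH_SEEK_CUR = 1,
    SKETCH_SEEK_END = 2,
};
#define SKETCH_ERR_INVAL (-22)

// Returns the new position, or SKETCH_ERR_INVAL if it is out of range.
int32_t sketch_seek(uint32_t pos, uint32_t size, uint32_t file_max,
        int32_t off, int whence) {
    assert(file_max <= INT32_MAX);
    // unsigned arithmetic: a negative result wraps above file_max
    uint32_t npos = pos;
    if (whence == SKETCH_SEEK_SET) {
        npos = (uint32_t)off;
    } else if (whence == SKETCH_SEEK_CUR) {
        npos = pos + (uint32_t)off;
    } else if (whence == SKETCH_SEEK_END) {
        npos = size + (uint32_t)off;
    }

    if (npos > file_max) {
        // file position out of range
        return SKETCH_ERR_INVAL;
    }
    return (int32_t)npos;
}
```

Seeking 4 bytes back from position 10 yields 6, while seeking 101 bytes back from the end of a 100-byte file is rejected.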
|
|
|
|
|
2018-01-20 23:30:40 +00:00
|
|
|
int lfs_file_truncate(lfs_t *lfs, lfs_file_t *file, lfs_off_t size) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate(%p, %p, %"PRIu32")",
|
|
|
|
(void*)lfs, (void*)file, size);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
2019-07-24 19:59:48 +00:00
|
|
|
LFS_ASSERT((file->flags & 3) != LFS_O_RDONLY);
|
2018-01-20 23:30:40 +00:00
|
|
|
|
2019-02-17 11:39:58 +00:00
|
|
|
if (size > LFS_FILE_MAX) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate -> %d", LFS_ERR_INVAL);
|
2019-02-17 11:39:58 +00:00
|
|
|
return LFS_ERR_INVAL;
|
|
|
|
}
|
|
|
|
|
2019-08-03 16:58:19 +00:00
|
|
|
lfs_off_t pos = file->pos;
|
2018-02-04 19:10:07 +00:00
|
|
|
lfs_off_t oldsize = lfs_file_size(lfs, file);
|
|
|
|
if (size < oldsize) {
|
2018-01-20 23:30:40 +00:00
|
|
|
// need to flush since directly changing metadata
|
|
|
|
int err = lfs_file_flush(lfs, file);
|
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate -> %d", err);
|
2018-01-20 23:30:40 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
// lookup new head in ctz skip list
|
2018-08-05 04:57:43 +00:00
|
|
|
err = lfs_ctz_find(lfs, NULL, &file->cache,
|
2018-07-13 01:43:55 +00:00
|
|
|
file->ctz.head, file->ctz.size,
|
2019-05-22 19:24:05 +00:00
|
|
|
size, &file->block, &file->off);
|
2018-01-20 23:30:40 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate -> %d", err);
|
2018-01-20 23:30:40 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-05-22 19:24:05 +00:00
|
|
|
file->ctz.head = file->block;
|
2018-07-13 01:43:55 +00:00
|
|
|
file->ctz.size = size;
|
2019-05-22 19:24:05 +00:00
|
|
|
file->flags |= LFS_F_DIRTY | LFS_F_READING;
|
2018-02-04 19:10:07 +00:00
|
|
|
} else if (size > oldsize) {
|
2018-01-20 23:30:40 +00:00
|
|
|
// flush+seek if not already at end
|
2018-02-04 19:10:07 +00:00
|
|
|
if (file->pos != oldsize) {
|
2019-10-01 04:20:43 +00:00
|
|
|
lfs_soff_t res = lfs_file_seek(lfs, file, 0, LFS_SEEK_END);
|
|
|
|
if (res < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate -> %"PRId32, res);
|
2019-10-01 04:20:43 +00:00
|
|
|
return (int)res;
|
2018-01-20 23:30:40 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// fill with zeros
|
|
|
|
while (file->pos < size) {
|
|
|
|
lfs_ssize_t res = lfs_file_write(lfs, file, &(uint8_t){0}, 1);
|
|
|
|
if (res < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate -> %"PRId32, res);
|
2019-10-01 04:20:43 +00:00
|
|
|
return (int)res;
|
2018-01-20 23:30:40 +00:00
|
|
|
}
|
|
|
|
}
|
2019-08-03 16:58:19 +00:00
|
|
|
}
|
2018-01-20 23:30:40 +00:00
|
|
|
|
2019-08-03 16:58:19 +00:00
|
|
|
// restore pos
|
2019-10-01 04:20:43 +00:00
|
|
|
lfs_soff_t res = lfs_file_seek(lfs, file, pos, LFS_SEEK_SET);
|
|
|
|
if (res < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate -> %"PRId32, res);
|
2019-10-01 04:20:43 +00:00
|
|
|
return (int)res;
|
2018-01-20 23:30:40 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_truncate -> %d", 0);
|
2018-01-20 23:30:40 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-04-23 04:11:13 +00:00
|
|
|
lfs_soff_t lfs_file_tell(lfs_t *lfs, lfs_file_t *file) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_tell(%p, %p)", (void*)lfs, (void*)file);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
2018-02-04 19:10:07 +00:00
|
|
|
(void)lfs;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_tell -> %"PRId32, file->pos);
|
2017-04-24 04:39:50 +00:00
|
|
|
return file->pos;
|
2017-04-23 04:11:13 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
int lfs_file_rewind(lfs_t *lfs, lfs_file_t *file) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_rewind(%p, %p)", (void*)lfs, (void*)file);
|
2017-04-23 04:11:13 +00:00
|
|
|
lfs_soff_t res = lfs_file_seek(lfs, file, 0, LFS_SEEK_SET);
|
|
|
|
if (res < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_file_rewind -> %"PRId32, res);
|
2019-10-01 04:22:25 +00:00
|
|
|
return (int)res;
|
2017-04-23 04:11:13 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_rewind -> %d", 0);
|
2017-04-23 04:11:13 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2017-04-24 02:40:03 +00:00
|
|
|
lfs_soff_t lfs_file_size(lfs_t *lfs, lfs_file_t *file) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_size(%p, %p)", (void*)lfs, (void*)file);
|
2019-07-21 09:34:53 +00:00
|
|
|
LFS_ASSERT(file->flags & LFS_F_OPENED);
|
2018-02-04 19:10:07 +00:00
|
|
|
(void)lfs;
|
2018-01-20 23:30:40 +00:00
|
|
|
if (file->flags & LFS_F_WRITING) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_size -> %"PRId32,
|
|
|
|
lfs_max(file->pos, file->ctz.size));
|
2018-07-13 01:43:55 +00:00
|
|
|
return lfs_max(file->pos, file->ctz.size);
|
2018-01-20 23:30:40 +00:00
|
|
|
} else {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_file_size -> %"PRId32, file->ctz.size);
|
2018-07-13 01:43:55 +00:00
|
|
|
return file->ctz.size;
|
2018-01-20 23:30:40 +00:00
|
|
|
}
|
2017-04-24 02:40:03 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
|
2018-01-30 19:07:37 +00:00
|
|
|
/// General fs operations ///
|
2017-04-24 02:40:03 +00:00
|
|
|
int lfs_stat(lfs_t *lfs, const char *path, struct lfs_info *info) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_stat(%p, \"%s\", %p)", (void*)lfs, path, (void*)info);
|
2018-05-29 06:11:26 +00:00
|
|
|
lfs_mdir_t cwd;
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
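A minimal sketch of the ordered-insertion idea described in the commit message above, assuming a simplified in-memory directory. The names sketch_dir and sketch_dir_insert are hypothetical and are not part of littlefs, and plain strcmp stands in for whatever name comparison the real driver uses. It scans the entries linearly, as a directory-block scan would, and inserts the new name at the first position that keeps the list in ascending order.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_ENTRIES 16

// Hypothetical in-memory stand-in for one directory; not a littlefs type.
struct sketch_dir {
    const char *names[SKETCH_MAX_ENTRIES];
    int count;
};

// Insert name so that names[] stays in ascending lexicographic order.
// Returns the id (index) the name was inserted at, or -1 if the
// directory is full or the name already exists.
static int sketch_dir_insert(struct sketch_dir *dir, const char *name) {
    assert(dir != NULL && name != NULL);
    if (dir->count >= SKETCH_MAX_ENTRIES) {
        return -1;
    }

    // linear scan: find the first entry that sorts after name
    int id = 0;
    while (id < dir->count) {
        int cmp = strcmp(dir->names[id], name);
        if (cmp == 0) {
            return -1; // already present
        }
        if (cmp > 0) {
            break;
        }
        id++;
    }

    // shift later entries up by one slot and splice the new name in
    memmove(&dir->names[id + 1], &dir->names[id],
            (size_t)(dir->count - id) * sizeof(dir->names[0]));
    dir->names[id] = name;
    dir->count++;
    return id;
}
```

Because the scan already visits every entry before the insertion point, keeping the order costs one comparison per visited entry plus the shift, which is in line with the "relatively little overhead" claim above.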
2018-10-04 19:49:34 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
|
2018-07-13 01:22:06 +00:00
|
|
|
if (tag < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_stat -> %"PRId32, tag);
|
2019-10-01 04:22:01 +00:00
|
|
|
return (int)tag;
|
2018-04-06 00:03:58 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
int err = lfs_dir_getinfo(lfs, &cwd, lfs_tag_id(tag), info);
|
|
|
|
LFS_TRACE("lfs_stat -> %d", err);
|
|
|
|
return err;
|
2018-04-06 00:03:58 +00:00
|
|
|
}
|
|
|
|
|
2018-05-27 15:15:28 +00:00
|
|
|
int lfs_remove(lfs_t *lfs, const char *path) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove(%p, \"%s\")", (void*)lfs, path);
|
2018-05-27 15:15:28 +00:00
|
|
|
// deorphan if we haven't yet, needed at most once after poweron
|
2018-08-01 10:52:48 +00:00
|
|
|
int err = lfs_fs_forceconsistency(lfs);
|
2018-07-31 13:07:36 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %d", err);
|
2018-07-31 13:07:36 +00:00
|
|
|
return err;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
2018-05-29 06:11:26 +00:00
|
|
|
lfs_mdir_t cwd;
|
2018-10-04 19:49:34 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
|
2019-02-12 06:01:28 +00:00
|
|
|
if (tag < 0 || lfs_tag_id(tag) == 0x3ff) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %"PRId32, (tag < 0) ? tag : LFS_ERR_INVAL);
|
2019-10-01 03:56:51 +00:00
|
|
|
return (tag < 0) ? (int)tag : LFS_ERR_INVAL;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
.--------.
->| parent |-.
| gstate | |
.-| a |-'
| '--------'
| X <- child is orphaned
| .--------.
'>| child |->
| gstate |
| a |
'--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
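A rough sketch of why the duplicated delta in item 4 above is a problem, assuming the gstate can be modelled as a single 32-bit word (the real gstate is larger) and that sketch_mdir and sketch_gstate are hypothetical names that do not exist in littlefs. The gstate is computed as the xor of every delta found while walking the threaded linked-list, so a delta that appears twice cancels itself out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified metadata pair: one delta word and a tail
// pointer to the next pair in the threaded linked-list.
struct sketch_mdir {
    uint32_t gdelta;
    struct sketch_mdir *tail;
};

// The global state is the xor of all deltas reachable through the
// threaded linked-list, starting at head.
static uint32_t sketch_gstate(const struct sketch_mdir *head) {
    uint32_t gstate = 0;
    for (; head != NULL; head = head->tail) {
        gstate ^= head->gdelta;
    }
    return gstate;
}
```

With a parent delta of 0x5 and a child delta of 0xa, the walk yields 0xf. If the child's delta is xored into the parent while the child is still threaded, the walk sees 0xa twice and yields 0x5. Only once the child is unthreaded does the result return to 0xf, which is why the delta should be handed over at the step that finalizes the removal.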
2020-01-22 04:18:19 +00:00
|
|
|
struct lfs_mlist dir;
|
|
|
|
dir.next = lfs->mlist;
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
is just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[---- 32 ----]
[1|-- 11 --|-- 10 --|-- 10 --]
^. ^ . ^ ^- entry length
|. | . \------------ file id chunk info
|. \-----.------------------ type info (type3)
\.-----------.------------------ valid bit
[-3-|-- 8 --]
^ ^- chunk info
\------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
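The field layout in the diagram above can be sketched as a handful of shift-and-mask helpers. This is an illustration under assumptions: the sketch_ names are hypothetical and are not the driver's own macros, and the field positions are read directly off the diagram (1 valid bit, 11-bit type, 10-bit id, 10-bit length, with the type splitting into a 3-bit type1 and an 8-bit chunk).

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical tag type for this sketch; a tag is one 32-bit word.
typedef uint32_t sketch_tag_t;

// Pack type (11 bits), id (10 bits) and length (10 bits) into a tag.
// The top bit (valid bit) is left clear here.
static sketch_tag_t sketch_mktag(uint32_t type, uint32_t id, uint32_t size) {
    assert(type <= 0x7ff && id <= 0x3ff && size <= 0x3ff);
    return (type << 20) | (id << 10) | size;
}

// Full 11-bit type field (type3 in the diagram).
static uint32_t sketch_tag_type3(sketch_tag_t tag) {
    return (tag >> 20) & 0x7ff;
}

// Upper 3 bits of the type field (type1 in the diagram), returned
// here as a plain 3-bit value.
static uint32_t sketch_tag_type1(sketch_tag_t tag) {
    return (tag >> 28) & 0x7;
}

// Lower 8 bits of the type field (chunk info in the diagram).
static uint32_t sketch_tag_chunk(sketch_tag_t tag) {
    return (tag >> 20) & 0xff;
}

// 10-bit file id.
static uint32_t sketch_tag_id(sketch_tag_t tag) {
    return (tag >> 10) & 0x3ff;
}

// 10-bit entry length.
static uint32_t sketch_tag_size(sketch_tag_t tag) {
    return tag & 0x3ff;
}
```

For example, sketch_mktag(0x201, 5, 8) packs to 0x20101408, from which the helpers recover type3 0x201, type1 0x2, chunk 0x01, id 5 and length 8.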
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type3(tag) == LFS_TYPE_DIR) {
|
2018-05-27 15:15:28 +00:00
|
|
|
// must be empty before removal
|
2018-07-13 01:43:55 +00:00
|
|
|
lfs_block_t pair[2];
|
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, &cwd, LFS_MKTAG(0x700, 0x3ff, 0),
|
2018-08-05 04:57:43 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, lfs_tag_id(tag), 8), pair);
|
2018-07-13 01:43:55 +00:00
|
|
|
if (res < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %"PRId32, res);
|
2019-10-01 03:56:51 +00:00
|
|
|
return (int)res;
|
2018-07-09 17:51:31 +00:00
|
|
|
}
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_pair_fromle32(pair);
|
2018-07-09 17:51:31 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_dir_fetch(lfs, &dir.m, pair);
|
2018-05-27 15:15:28 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %d", err);
|
2018-05-27 15:15:28 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
if (dir.m.count > 0 || dir.m.split) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %d", LFS_ERR_NOTEMPTY);
|
2018-05-27 15:15:28 +00:00
|
|
|
return LFS_ERR_NOTEMPTY;
|
|
|
|
}
|
2018-07-31 13:07:36 +00:00
|
|
|
|
|
|
|
// mark fs as orphaned
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, +1);
|
2020-01-22 04:18:19 +00:00
|
|
|
|
|
|
|
// I know it's crazy but yes, dir can be changed by our parent's
|
|
|
|
// commit (if predecessor is child)
|
|
|
|
dir.type = 0;
|
|
|
|
dir.id = 0;
|
|
|
|
lfs->mlist = &dir;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// delete the entry
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, &cwd, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_DELETE, lfs_tag_id(tag), 0)}));
|
2018-05-27 15:15:28 +00:00
|
|
|
if (err) {
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->mlist = dir.next;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %d", err);
|
2018-05-27 15:15:28 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is that the added testing worked well: it found quite a
number of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
      .--------.
    ->| parent |-.
      | gstate | |
    .-|    a   |-'
    | '--------'
    |     X <- child is orphaned
    | .--------.
    '>| child  |->
      | gstate |
      |    a   |
      '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
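
To make the double-counting described in item 4 concrete, here is a
minimal C sketch. It is not littlefs's actual code: it assumes a
simplified model where each metadata pair holds a single 32-bit delta,
and the names demo_mdir, demo_collect_gstate, and demo_drop are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified model: every metadata pair in the threaded
// linked-list carries one 32-bit gstate delta.
struct demo_mdir {
    uint32_t gdelta;              // this pair's contribution to the gstate
    const struct demo_mdir *tail; // next pair in the threaded linked-list
};

// The gstate is the xor of every delta reachable through the threaded
// linked-list, not through the directory tree.
static uint32_t demo_collect_gstate(const struct demo_mdir *head) {
    uint32_t gstate = 0;
    for (const struct demo_mdir *d = head; d != NULL; d = d->tail) {
        gstate ^= d->gdelta;
    }
    return gstate;
}

// Hand the child's delta to its predecessor in the same step that
// unthreads the child, so the delta is never visible twice.
static void demo_drop(struct demo_mdir *pred, const struct demo_mdir *child) {
    assert(pred->tail == child);
    pred->gdelta ^= child->gdelta;
    pred->tail = child->tail;
}
```

With a predecessor delta of 0x5 and a child delta of 0xa, the collected
gstate is 0xf both before and after demo_drop. If the delta were xored
into another pair while the child was still threaded, the walk would xor
0xa twice and report 0x5 instead.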
2020-01-22 04:18:19 +00:00
|
|
|
lfs->mlist = dir.next;
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
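
The field layout above can be sketched in C as shifts and masks. This is
only an illustration of the bit positions drawn in the diagram: the
demo_* helpers are hypothetical names, not the functions lfs.c uses, and
the meaning (polarity) of the valid bit is deliberately left
uninterpreted.

```c
#include <assert.h>
#include <stdint.h>

// Pack an 11-bit type, 10-bit id, and 10-bit length into a 32-bit tag.
// The top (valid) bit is left as 0 here.
static uint32_t demo_mktag(uint16_t type3, uint16_t id, uint16_t size) {
    assert(type3 <= 0x7ff && id <= 0x3ff && size <= 0x3ff);
    return ((uint32_t)type3 << 20) | ((uint32_t)id << 10) | (uint32_t)size;
}

// Raw valid bit (bit 31).
static uint8_t demo_tag_validbit(uint32_t tag) {
    return (uint8_t)(tag >> 31);
}

// Full 11-bit type field (type3).
static uint16_t demo_tag_type3(uint32_t tag) {
    return (uint16_t)((tag >> 20) & 0x7ff);
}

// Upper 3 bits of the type field (type1).
static uint8_t demo_tag_type1(uint32_t tag) {
    return (uint8_t)((tag >> 28) & 0x7);
}

// Lower 8 bits of the type field (chunk info).
static uint8_t demo_tag_chunk(uint32_t tag) {
    return (uint8_t)((tag >> 20) & 0xff);
}

// 10-bit file id.
static uint16_t demo_tag_id(uint32_t tag) {
    return (uint16_t)((tag >> 10) & 0x3ff);
}

// 10-bit entry length.
static uint16_t demo_tag_size(uint32_t tag) {
    return (uint16_t)(tag & 0x3ff);
}
```

For example, packing type 0x123, id 0x3ff, and length 10 gives a tag
whose type1 is 0x1 and whose chunk is 0x23, since the 11-bit type splits
into a 3-bit and an 8-bit part.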
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type3(tag) == LFS_TYPE_DIR) {
|
2018-08-13 14:03:13 +00:00
|
|
|
// fix orphan
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, -1);
|
2018-08-13 14:03:13 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_fs_pred(lfs, dir.m.pair, &cwd);
|
2018-07-13 01:43:55 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %d", err);
|
2018-07-13 01:43:55 +00:00
|
|
|
return err;
|
2018-07-02 03:29:42 +00:00
|
|
|
}
|
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_dir_drop(lfs, &cwd, &dir.m);
|
2018-07-13 00:07:56 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %d", err);
|
2018-07-13 00:07:56 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-07-02 03:29:42 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_remove -> %d", 0);
|
2018-05-27 15:15:28 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int lfs_rename(lfs_t *lfs, const char *oldpath, const char *newpath) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename(%p, \"%s\", \"%s\")", (void*)lfs, oldpath, newpath);
|
|
|
|
|
2018-05-27 15:15:28 +00:00
|
|
|
// deorphan if we haven't yet, needed at most once after poweron
|
2018-08-01 10:52:48 +00:00
|
|
|
int err = lfs_fs_forceconsistency(lfs);
|
2018-07-31 13:07:36 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", err);
|
2018-07-31 13:07:36 +00:00
|
|
|
return err;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// find old entry
|
2018-05-29 06:11:26 +00:00
|
|
|
lfs_mdir_t oldcwd;
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
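
As a rough illustration of "compare file names to find our position",
the following C sketch scans a sorted list of names and returns the slot
where a new name belongs. It is a simplified model, not the littlefs
implementation: demo_find_slot is a hypothetical helper, names live in a
plain array rather than in metadata tags, and ordering is assumed to be
plain strcmp byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Linear scan over names kept in ascending order. Returns the index of
// an exact match (with *found set to 1), or the index at which `name`
// would have to be inserted to keep the list ordered (*found set to 0).
static size_t demo_find_slot(const char *const *names, size_t count,
        const char *name, int *found) {
    assert(name != NULL && found != NULL);
    *found = 0;
    for (size_t i = 0; i < count; i++) {
        int cmp = strcmp(names[i], name);
        if (cmp == 0) {
            *found = 1;
            return i;
        }
        if (cmp > 0) {
            // first entry that sorts after us, so we belong right here
            return i;
        }
    }
    // sorts after every existing entry
    return count;
}
```

Since the directory is scanned entry by entry anyway, the only extra
work per entry in this model is the comparison itself.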
2018-10-04 19:49:34 +00:00
|
|
|
lfs_stag_t oldtag = lfs_dir_find(lfs, &oldcwd, &oldpath, NULL);
|
2019-02-12 06:01:28 +00:00
|
|
|
if (oldtag < 0 || lfs_tag_id(oldtag) == 0x3ff) {
|
2020-01-20 23:35:45 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %"PRId32,
|
|
|
|
(oldtag < 0) ? oldtag : LFS_ERR_INVAL);
|
2019-10-01 03:51:52 +00:00
|
|
|
return (oldtag < 0) ? (int)oldtag : LFS_ERR_INVAL;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// find new entry
|
2018-05-29 06:11:26 +00:00
|
|
|
lfs_mdir_t newcwd;
|
2018-10-04 19:49:34 +00:00
|
|
|
uint16_t newid;
|
|
|
|
lfs_stag_t prevtag = lfs_dir_find(lfs, &newcwd, &newpath, &newid);
|
2019-02-12 06:01:28 +00:00
|
|
|
if ((prevtag < 0 || lfs_tag_id(prevtag) == 0x3ff) &&
|
|
|
|
!(prevtag == LFS_ERR_NOENT && newid != 0x3ff)) {
|
2020-01-20 23:35:45 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %"PRId32,
|
|
|
|
(prevtag < 0) ? prevtag : LFS_ERR_INVAL);
|
2019-10-01 03:51:52 +00:00
|
|
|
return (prevtag < 0) ? (int)prevtag : LFS_ERR_INVAL;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
// if we're in the same pair there's a few special cases...
|
|
|
|
bool samepair = (lfs_pair_cmp(oldcwd.pair, newcwd.pair) == 0);
|
|
|
|
uint16_t newoldid = lfs_tag_id(oldtag);
|
|
|
|
|
|
|
|
struct lfs_mlist prevdir;
|
|
|
|
prevdir.next = lfs->mlist;
|
2018-08-13 14:03:13 +00:00
|
|
|
if (prevtag == LFS_ERR_NOENT) {
|
2018-05-27 15:15:28 +00:00
|
|
|
// check that name fits
|
|
|
|
lfs_size_t nlen = strlen(newpath);
|
2018-08-05 01:10:08 +00:00
|
|
|
if (nlen > lfs->name_max) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", LFS_ERR_NAMETOOLONG);
|
2018-05-27 15:15:28 +00:00
|
|
|
return LFS_ERR_NAMETOOLONG;
|
|
|
|
}
|
2020-01-22 04:18:19 +00:00
|
|
|
|
|
|
|
// there is a small chance we are being renamed in the same
|
|
|
|
// directory to an id less than our old id, the global update
|
|
|
|
// to handle this is a bit messy
|
|
|
|
if (samepair && newid <= newoldid) {
|
|
|
|
newoldid += 1;
|
|
|
|
}
|
2018-12-29 13:53:12 +00:00
|
|
|
} else if (lfs_tag_type3(prevtag) != lfs_tag_type3(oldtag)) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", LFS_ERR_ISDIR);
|
2018-08-13 14:03:13 +00:00
|
|
|
return LFS_ERR_ISDIR;
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is that the added testing worked well: it found quite a
number of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
       .--------.
     ->| parent |-.
       | gstate | |
     .-|   a    |-'
     | '--------'
     |     X <- child is orphaned
     | .--------.
     '>| child  |->
       | gstate |
       |   a    |
       '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
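The gstate problem described in item 4 can be illustrated with a small sketch. It assumes a three-word gstate and a singly linked threaded list; the `example_*` names and the struct layout are hypothetical and are not the library's actual types. It only shows why xoring the child's delta into the parent while the child is still on the threaded list makes the delta cancel itself out.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical gstate: a few words that are combined with xor.
typedef struct example_gstate {
    uint32_t word[3];
} example_gstate_t;

// Hypothetical directory: its own gstate delta plus the next
// directory in the threaded linked-list.
typedef struct example_dir {
    example_gstate_t gdelta;
    struct example_dir *tail;
} example_dir_t;

// a ^= b, word by word
static void example_gstate_xor(example_gstate_t *a,
        const example_gstate_t *b) {
    for (int i = 0; i < 3; i++) {
        a->word[i] ^= b->word[i];
    }
}

// The global state is the xor of every delta found by walking the
// threaded list, not the directory tree.
static example_gstate_t example_collect(const example_dir_t *head) {
    example_gstate_t g = {{0, 0, 0}};
    for (const example_dir_t *d = head; d != NULL; d = d->tail) {
        example_gstate_xor(&g, &d->gdelta);
    }
    return g;
}
```

If a parent absorbs the child's delta while the child is still reachable through `tail`, `example_collect` sees that delta twice and the two copies cancel. Handing the delta over only in the step that unlinks the child avoids this window.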
2020-01-22 04:18:19 +00:00
|
|
|
} else if (samepair && newid == newoldid) {
|
|
|
|
// we're renaming to ourselves??
|
|
|
|
LFS_TRACE("lfs_rename -> %d", 0);
|
|
|
|
return 0;
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
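The field widths in the diagram above can be turned into shifts and masks. The following is a minimal sketch derived only from that diagram; the `example_*` helpers are hypothetical names and are not claimed to match the library's own macros or return conventions.

```c
#include <stdint.h>

// Hypothetical helpers following the diagram:
// [1 valid | 11 type (3 type1 + 8 chunk) | 10 id | 10 length]
typedef uint32_t example_tag_t;

// Pack a tag from its fields; the top (valid) bit is left clear.
static inline example_tag_t example_mktag(uint32_t type3,
        uint32_t id, uint32_t len) {
    return ((type3 & 0x7ff) << 20) | ((id & 0x3ff) << 10) | (len & 0x3ff);
}

// Top bit of the tag
static inline uint32_t example_tag_valid(example_tag_t t) {
    return t >> 31;
}

// Full 11-bit type field
static inline uint32_t example_tag_type3(example_tag_t t) {
    return (t >> 20) & 0x7ff;
}

// Upper 3 bits of the type field
static inline uint32_t example_tag_type1(example_tag_t t) {
    return (t >> 28) & 0x7;
}

// Lower 8 bits of the type field
static inline uint32_t example_tag_chunk(example_tag_t t) {
    return (t >> 20) & 0xff;
}

// 10-bit file id
static inline uint32_t example_tag_id(example_tag_t t) {
    return (t >> 10) & 0x3ff;
}

// 10-bit entry length
static inline uint32_t example_tag_len(example_tag_t t) {
    return t & 0x3ff;
}
```

With 8 bits of chunk number and 10 bits of length per chunk, the addressable inline size is 256 chunks of 1KiB, which is where the 256KiB figure in the message comes from.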
2018-12-29 13:53:12 +00:00
|
|
|
} else if (lfs_tag_type3(prevtag) == LFS_TYPE_DIR) {
|
2018-08-13 14:03:13 +00:00
|
|
|
// must be empty before removal
|
|
|
|
lfs_block_t prevpair[2];
|
Cleaned up tag encoding, now with clear chunk field
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, &newcwd, LFS_MKTAG(0x700, 0x3ff, 0),
|
2018-08-13 14:03:13 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, newid, 8), prevpair);
|
|
|
|
if (res < 0) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %"PRId32, res);
|
2019-10-01 03:51:52 +00:00
|
|
|
return (int)res;
|
2018-08-13 14:03:13 +00:00
|
|
|
}
|
|
|
|
lfs_pair_fromle32(prevpair);
|
|
|
|
|
|
|
|
// must be empty before removal
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_dir_fetch(lfs, &prevdir.m, prevpair);
|
2018-08-13 14:03:13 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", err);
|
2018-08-13 14:03:13 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
2020-01-22 04:18:19 +00:00
|
|
|
if (prevdir.m.count > 0 || prevdir.m.split) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", LFS_ERR_NOTEMPTY);
|
2018-08-13 14:03:13 +00:00
|
|
|
return LFS_ERR_NOTEMPTY;
|
|
|
|
}
|
|
|
|
|
|
|
|
// mark fs as orphaned
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, +1);
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
2020-01-22 04:18:19 +00:00
|
|
|
|
|
|
|
// I know it's crazy but yes, dir can be changed by our parent's
|
|
|
|
// commit (if predecessor is child)
|
|
|
|
prevdir.type = 0;
|
|
|
|
prevdir.id = 0;
|
|
|
|
lfs->mlist = &prevdir;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
2020-01-22 04:18:19 +00:00
|
|
|
if (!samepair) {
|
|
|
|
lfs_fs_prepmove(lfs, newoldid, oldcwd.pair);
|
Switched to traversal-based compact logic
This simplifies some of the interactions between reading and writing
inside the commit logic. Unfortunately this change didn't decrease
code size as was initially hoped, but it does offer a nice runtime
improvement for the common case and should improve debuggability.
Before, the compact logic required three iterations:
1. iterate through all the ids in a directory
2. scan attrs bound to each id in the directory
3. lookup attrs in the in-progress commit
The code for this, while terse and complicated, did have some nice side
effects. The directory lookup logic could be reused for looking up in the
in-progress commit, and iterating through each id allows us to know
exactly how many ids we can fit during a compact. This gave us an O(n^3)
compact and O(n^3) split.
However, this was complicated by a few things.
First, this compact logic doesn't handle deleted attrs. To work around this,
I added a marker for the last commit (or first based on your perspective)
which would indicate if a delete should be copied over. This worked but was
a bit hacky and meant deletes weren't cleaned up on the first compact.
Second, we can't actually figure out our compacted size until we
compact. This worked ok except for the fact that splits will always have a
failed compact. This means we waste an erase which could be very expensive.
It is possible to work around this by keeping our work, but with only a
single prog cache this was very tricky and also somewhat hacky.
Third, the interactions between reading and writing to the same block
were tricky and error-prone. They should mostly be working now, but
seeing this requirement go away does not make me sad.
The new compact logic fixes these issues by moving the complexity into a
general-purpose lfs_dir_traverse function which has far fewer side
effects on the system. We can even use it for dry-runs to precompute our
estimated size.
How does it work?
1. iterate through all attr in the directory
2. for each attr, scan the rest of the directory to figure out the
attr's history, this will change the attr based on dir modifications
and may even exit early if the attr was deleted.
The end result is a traversal function that gives us the resulting state
of each attr in only O(n^2). To make this complete, we allow a bounded
recursion into mcu-side move attrs, although this ends up being O(n^3)
unlike moves in the original solution (however moves are less common).
This gives us a nice traversal function we can use for compacts and
moves, handles deletes, and is overall simpler to reason about.
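The two-step traversal described above can be sketched on a simplified in-memory log. This is only an illustration under stated assumptions: the `example_*` types are hypothetical, a later attr with the same id and type is assumed to supersede an earlier one, and the id shifting that real deletes cause is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { EXAMPLE_SET, EXAMPLE_DELETE };

// Hypothetical log entry: an operation on (id, type) with a payload.
typedef struct example_attr {
    uint8_t op;
    uint16_t id;
    uint16_t type;
    int value;
} example_attr_t;

// Visit every attr that survives its own history. For each attr we scan
// the rest of the log, giving O(n^2) overall. Passing cb == NULL turns
// this into a dry run that only counts surviving attrs.
static size_t example_traverse(const example_attr_t *log, size_t n,
        void (*cb)(void *ctx, const example_attr_t *attr), void *ctx) {
    size_t live = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].op == EXAMPLE_DELETE) {
            continue;
        }

        bool dead = false;
        for (size_t j = i + 1; j < n && !dead; j++) {
            if (log[j].id != log[i].id) {
                continue;
            }
            // A later delete of the id, or a later attr of the same
            // type, ends this attr's history early.
            if (log[j].op == EXAMPLE_DELETE
                    || log[j].type == log[i].type) {
                dead = true;
            }
        }

        if (!dead) {
            live += 1;
            if (cb) {
                cb(ctx, &log[i]);
            }
        }
    }
    return live;
}

// Example callback: accumulate surviving values into an int.
static void example_sum_cb(void *ctx, const example_attr_t *attr) {
    *(int *)ctx += attr->value;
}
```

Because the traversal has no side effects of its own, the same function can serve both the real compact (with a callback that writes) and a size estimate (with a callback that only counts).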
Two minor hiccups:
1. We need to handle create attrs specially, since this algorithm
doesn't care about id order, which can cause problems since attr
insertions are order sensitive. We can fix this by simply looking up
each create (since there is only one per file) in order at the
beginning of our traversal. This is oddly complementary to the move
logic, which also handles create attrs separately.
2. We no longer know exactly how many ids we can write to a dir during
splits. However, since we can do a dry-run traversal, we can use that
to simply binary search for the mid-point.
This gives us an O(n^2) compact and O(n^2 log n) split, which is a nice
minor improvement (remember n is bounded by block size).
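The binary search mentioned in the second hiccup could look roughly like the sketch below. It assumes a dry-run callback that reports how many bytes a range of ids would need and that this cost never shrinks as the range grows; `example_find_split`, `example_dryrun_t`, and the sample callback are hypothetical names, not the library's functions.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical dry-run: bytes needed to compact ids [begin, end).
typedef size_t (*example_dryrun_t)(void *ctx, uint16_t begin, uint16_t end);

// Return the largest end' in [begin, end] such that ids [begin, end')
// fit in capacity bytes. Each probe is one dry-run traversal, so an
// O(n^2) traversal gives an O(n^2 log n) split.
static uint16_t example_find_split(example_dryrun_t dryrun, void *ctx,
        uint16_t begin, uint16_t end, size_t capacity) {
    uint16_t lo = begin; // [begin, lo) is known to fit (empty at start)
    uint16_t hi = end;   // nothing beyond hi is a candidate
    while (lo < hi) {
        // round up so that lo always advances when the probe fits
        uint16_t mid = (uint16_t)(lo + (hi - lo + 1) / 2);
        if (dryrun(ctx, begin, mid) <= capacity) {
            lo = mid;
        } else {
            hi = (uint16_t)(mid - 1);
        }
    }
    return lo;
}

// Sample dry-run for illustration: sum fixed per-id sizes.
typedef struct example_sizes {
    const size_t *size;
} example_sizes_t;

static size_t example_sum_sizes(void *ctx, uint16_t begin, uint16_t end) {
    const example_sizes_t *s = ctx;
    size_t total = 0;
    for (uint16_t i = begin; i < end; i++) {
        total += s->size[i];
    }
    return total;
}
```

With per-id sizes of 10, 20, 30 and 40 bytes and a capacity of 60, the search settles on splitting before the fourth id, since the first three need exactly 60 bytes.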
2018-12-27 02:27:34 +00:00
|
|
|
}
|
2019-01-04 23:23:36 +00:00
|
|
|
|
2018-07-02 03:29:42 +00:00
|
|
|
// move over all attributes
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, &newcwd, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG_IF(prevtag != LFS_ERR_NOENT,
|
|
|
|
LFS_TYPE_DELETE, newid, 0)},
|
|
|
|
{LFS_MKTAG(LFS_TYPE_CREATE, newid, 0)},
|
|
|
|
{LFS_MKTAG(lfs_tag_type3(oldtag), newid, strlen(newpath)), newpath},
|
|
|
|
{LFS_MKTAG(LFS_FROM_MOVE, newid, lfs_tag_id(oldtag)), &oldcwd},
|
|
|
|
{LFS_MKTAG_IF(samepair,
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
2020-01-22 04:18:19 +00:00
|
|
|
LFS_TYPE_DELETE, newoldid, 0)}));
|
2018-05-27 15:15:28 +00:00
|
|
|
if (err) {
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
      .--------.
    ->| parent |-.
      | gstate | |
    .-|   a    |-'
    | '--------'
    |     X <- child is orphaned
    | .--------.
    '>| child  |->
      | gstate |
      |   a    |
      '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
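The duplicated-delta problem from item 4 can be illustrated with a minimal sketch. It assumes a single-word delta and a singly linked threaded list; `struct dir`, `gstate_collect`, and `gstate_demo` are hypothetical names for illustration and are not littlefs APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a directory in the threaded linked-list.
// Real littlefs deltas are wider than a single word.
struct dir {
    uint32_t gdelta;        // this directory's gstate delta
    const struct dir *next; // next directory in the threaded list
};

// gstate is the xor of every delta reachable through the threaded list
static uint32_t gstate_collect(const struct dir *head) {
    uint32_t g = 0;
    for (const struct dir *d = head; d; d = d->next) {
        g ^= d->gdelta;
    }
    return g;
}

// Shows why the child's delta must only be handed over at the moment
// the child leaves the threaded list.
static void gstate_demo(void) {
    struct dir child = {.gdelta = 0xa, .next = NULL};
    struct dir pred = {.gdelta = 0x5, .next = &child};
    uint32_t before = gstate_collect(&pred);

    // too early: the child is still threaded, so its delta is counted
    // twice and cancels out of the xor
    pred.gdelta ^= child.gdelta;
    assert(gstate_collect(&pred) != before);

    // unthreading the child finalizes the removal and restores gstate
    pred.next = NULL;
    assert(gstate_collect(&pred) == before);
}
```

Calling `gstate_demo()` walks through both states. The window between the two assertions corresponds to the point in the diagram where both the updated directory and the orphaned child are still reachable.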
2020-01-22 04:18:19 +00:00
|
|
|
lfs->mlist = prevdir.next;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", err);
|
2018-05-27 15:15:28 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-07-17 23:31:30 +00:00
|
|
|
// let commit clean up after move (if we're different! otherwise move
|
|
|
|
// logic already fixed it for us)
|
2020-01-20 23:35:45 +00:00
|
|
|
if (!samepair && lfs_gstate_hasmove(&lfs->gstate)) {
|
|
|
|
// prep gstate and delete move id
|
|
|
|
lfs_fs_prepmove(lfs, 0x3ff, NULL);
|
|
|
|
err = lfs_dir_commit(lfs, &oldcwd, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(LFS_TYPE_DELETE, lfs_tag_id(oldtag), 0)}));
|
2018-07-17 23:31:30 +00:00
|
|
|
if (err) {
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->mlist = prevdir.next;
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", err);
|
2018-07-17 23:31:30 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->mlist = prevdir.next;
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
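As a rough illustration of the layout above, here is a sketch that packs and unpacks the 1/11/10/10-bit fields. The names (`tag_t`, `tag_make`, `tag_type3`, and so on) are hypothetical and chosen for this example; the accessors return plain field values, which may differ from how the real littlefs helpers mask and shift.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical names. Layout follows the diagram:
// [1 valid | 11 type | 10 id | 10 length]
typedef uint32_t tag_t;

// Pack a tag; the valid bit (bit 31) is left clear.
static inline tag_t tag_make(uint16_t type, uint16_t id, uint16_t len) {
    assert(type <= 0x7ff && id <= 0x3ff && len <= 0x3ff);
    return ((tag_t)type << 20) | ((tag_t)id << 10) | (tag_t)len;
}

// full 11-bit type (type3)
static inline uint16_t tag_type3(tag_t t) { return (t >> 20) & 0x7ff; }
// top 3 bits of the type (type1)
static inline uint16_t tag_type1(tag_t t) { return (t >> 28) & 0x7; }
// low 8 bits of the type (chunk info)
static inline uint8_t tag_chunk(tag_t t) { return (t >> 20) & 0xff; }
// 10-bit file id
static inline uint16_t tag_id(tag_t t) { return (t >> 10) & 0x3ff; }
// 10-bit entry length
static inline uint16_t tag_len(tag_t t) { return t & 0x3ff; }
```

For example, `tag_make(0x301, 5, 12)` yields `0x3010140c`, which splits back into type1 `3`, chunk `1`, id `5`, and length `12`. With 10 bits each, the largest id and length values are `0x3ff`.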
2018-12-29 13:53:12 +00:00
|
|
|
if (prevtag != LFS_ERR_NOENT && lfs_tag_type3(prevtag) == LFS_TYPE_DIR) {
|
2018-08-13 14:03:13 +00:00
|
|
|
// fix orphan
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, -1);
|
2018-08-13 14:03:13 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_fs_pred(lfs, prevdir.m.pair, &newcwd);
|
2018-07-13 01:43:55 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", err);
|
2018-07-13 01:43:55 +00:00
|
|
|
return err;
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
2018-07-02 03:29:42 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_dir_drop(lfs, &newcwd, &prevdir.m);
|
2018-07-13 00:07:56 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", err);
|
2018-07-13 00:07:56 +00:00
|
|
|
return err;
|
|
|
|
}
|
2018-05-27 15:15:28 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_rename -> %d", 0);
|
2018-05-27 15:15:28 +00:00
|
|
|
return 0;
|
|
|
|
}
|
2018-04-06 00:03:58 +00:00
|
|
|
|
2018-07-29 20:03:23 +00:00
|
|
|
lfs_ssize_t lfs_getattr(lfs_t *lfs, const char *path,
|
|
|
|
uint8_t type, void *buffer, lfs_size_t size) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_getattr(%p, \"%s\", %"PRIu8", %p, %"PRIu32")",
|
|
|
|
(void*)lfs, path, type, buffer, size);
|
2018-07-29 20:03:23 +00:00
|
|
|
lfs_mdir_t cwd;
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow, is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
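A minimal sketch of the ordered-insertion idea, assuming names are NUL-terminated strings held in an in-memory array and compared with `strcmp`. The helper `dir_find_slot` is hypothetical; the actual on-disk scan in littlefs works on metadata tags rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: scan an ascending list of names and return the
// id where `name` lives, or where it would have to be inserted to keep
// the list ordered. *found is set to 1 only on an exact match.
static size_t dir_find_slot(const char *const *names, size_t count,
        const char *name, int *found) {
    assert(names != NULL || count == 0);
    *found = 0;
    for (size_t id = 0; id < count; id++) {
        int cmp = strcmp(names[id], name);
        if (cmp == 0) {
            *found = 1;
            return id;
        }
        if (cmp > 0) {
            // first entry that sorts after `name`: insert here
            return id;
        }
    }
    // `name` sorts after every existing entry
    return count;
}
```

The scan is linear on purpose: as the message notes, the directory block has to be scanned anyway, so finding the insertion position during that pass adds little extra work.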
2018-10-04 19:49:34 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
|
|
|
|
if (tag < 0) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_getattr -> %"PRId32, tag);
|
2018-10-04 19:49:34 +00:00
|
|
|
return tag;
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
|
2018-10-04 19:49:34 +00:00
|
|
|
uint16_t id = lfs_tag_id(tag);
|
2018-12-29 13:53:12 +00:00
|
|
|
if (id == 0x3ff) {
|
2018-08-12 04:05:52 +00:00
|
|
|
// special case for root
|
|
|
|
id = 0;
|
|
|
|
int err = lfs_dir_fetch(lfs, &cwd, lfs->root);
|
|
|
|
if (err) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_getattr -> %d", err);
|
2018-08-12 04:05:52 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
tag = lfs_dir_get(lfs, &cwd, LFS_MKTAG(0x7ff, 0x3ff, 0),
|
|
|
|
LFS_MKTAG(LFS_TYPE_USERATTR + type,
|
|
|
|
id, lfs_min(size, lfs->attr_max)),
|
2018-08-12 04:05:52 +00:00
|
|
|
buffer);
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
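The "compare file names to find our position" step can be sketched as a linear scan over names that are already in ascending order. This is a minimal sketch under assumptions: `find_slot` is a hypothetical helper, the names live in an in-memory array rather than in metadata tags, and plain `strcmp` stands in for whatever comparison the filesystem actually performs on disk.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not part of littlefs.
// Scans `names` (assumed sorted ascending by strcmp) and returns the index
// where `name` is, or where it would have to be inserted to keep the order.
// *found is set to 1 on an exact match and 0 otherwise.
static size_t find_slot(const char *const *names, size_t count,
        const char *name, int *found) {
    *found = 0;
    for (size_t i = 0; i < count; i++) {
        int cmp = strcmp(names[i], name);
        if (cmp == 0) {
            *found = 1;
            return i;
        }
        if (cmp > 0) {
            // first entry greater than name: insert before it
            return i;
        }
    }
    // name sorts after every existing entry
    return count;
}
```

Because the scan already visits every entry in the worst case, the extra comparison adds little over an unordered search, which is the point made in the message above.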
2018-10-04 19:49:34 +00:00
|
|
|
if (tag < 0) {
|
|
|
|
if (tag == LFS_ERR_NOENT) {
|
2019-11-22 04:29:57 +00:00
|
|
|
LFS_TRACE("lfs_getattr -> %d", LFS_ERR_NOATTR);
|
2018-09-09 23:48:18 +00:00
|
|
|
return LFS_ERR_NOATTR;
|
|
|
|
}
|
2019-05-31 09:40:19 +00:00
|
|
|
|
|
|
|
LFS_TRACE("lfs_getattr -> %"PRId32, tag);
|
2018-10-04 19:49:34 +00:00
|
|
|
return tag;
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
size = lfs_tag_size(tag);
|
|
|
|
LFS_TRACE("lfs_getattr -> %"PRId32, size);
|
|
|
|
return size;
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
|
2018-09-09 23:48:18 +00:00
|
|
|
static int lfs_commitattr(lfs_t *lfs, const char *path,
|
2018-07-29 20:03:23 +00:00
|
|
|
uint8_t type, const void *buffer, lfs_size_t size) {
|
|
|
|
lfs_mdir_t cwd;
|
2018-10-04 19:49:34 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
|
|
|
|
if (tag < 0) {
|
|
|
|
return tag;
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
|
2018-10-04 19:49:34 +00:00
|
|
|
uint16_t id = lfs_tag_id(tag);
|
2018-12-29 13:53:12 +00:00
|
|
|
if (id == 0x3ff) {
|
2018-08-12 04:05:52 +00:00
|
|
|
// special case for root
|
|
|
|
id = 0;
|
|
|
|
int err = lfs_dir_fetch(lfs, &cwd, lfs->root);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-01-08 14:52:03 +00:00
|
|
|
return lfs_dir_commit(lfs, &cwd, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(LFS_TYPE_USERATTR + type, id, size), buffer}));
|
2018-07-29 20:03:23 +00:00
|
|
|
}
|
|
|
|
|
2018-09-09 23:48:18 +00:00
|
|
|
int lfs_setattr(lfs_t *lfs, const char *path,
|
|
|
|
uint8_t type, const void *buffer, lfs_size_t size) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_setattr(%p, \"%s\", %"PRIu8", %p, %"PRIu32")",
|
|
|
|
(void*)lfs, path, type, buffer, size);
|
2018-09-09 23:48:18 +00:00
|
|
|
if (size > lfs->attr_max) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_setattr -> %d", LFS_ERR_NOSPC);
|
2018-09-09 23:48:18 +00:00
|
|
|
return LFS_ERR_NOSPC;
|
|
|
|
}
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
int err = lfs_commitattr(lfs, path, type, buffer, size);
|
|
|
|
LFS_TRACE("lfs_setattr -> %d", err);
|
|
|
|
return err;
|
2018-09-09 23:48:18 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
int lfs_removeattr(lfs_t *lfs, const char *path, uint8_t type) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_removeattr(%p, \"%s\", %"PRIu8")", (void*)lfs, path, type);
|
|
|
|
int err = lfs_commitattr(lfs, path, type, NULL, 0x3ff);
|
|
|
|
LFS_TRACE("lfs_removeattr -> %d", err);
|
|
|
|
return err;
|
2018-09-09 23:48:18 +00:00
|
|
|
}
|
|
|
|
|
2018-04-06 00:03:58 +00:00
|
|
|
|
2017-04-24 02:40:03 +00:00
|
|
|
/// Filesystem operations ///
|
2017-04-22 18:30:40 +00:00
|
|
|
static int lfs_init(lfs_t *lfs, const struct lfs_config *cfg) {
|
|
|
|
lfs->cfg = cfg;
|
2018-08-05 01:33:09 +00:00
|
|
|
int err = 0;
|
2017-04-22 18:30:40 +00:00
|
|
|
|
2019-08-31 13:57:56 +00:00
|
|
|
// validate that the lfs-cfg sizes were initialized properly before
|
|
|
|
// performing any arithmetic with them
|
|
|
|
LFS_ASSERT(lfs->cfg->read_size != 0);
|
|
|
|
LFS_ASSERT(lfs->cfg->prog_size != 0);
|
|
|
|
LFS_ASSERT(lfs->cfg->cache_size != 0);
|
|
|
|
|
2018-08-04 21:04:24 +00:00
|
|
|
// check that block size is a multiple of cache size is a multiple
|
|
|
|
// of prog and read sizes
|
|
|
|
LFS_ASSERT(lfs->cfg->cache_size % lfs->cfg->read_size == 0);
|
|
|
|
LFS_ASSERT(lfs->cfg->cache_size % lfs->cfg->prog_size == 0);
|
|
|
|
LFS_ASSERT(lfs->cfg->block_size % lfs->cfg->cache_size == 0);
|
|
|
|
|
|
|
|
// check that the block size is large enough to fit ctz pointers
|
2020-02-10 04:43:20 +00:00
|
|
|
LFS_ASSERT(4*lfs_npw2(0xffffffff / (lfs->cfg->block_size-2*4))
|
2018-08-04 21:04:24 +00:00
|
|
|
<= lfs->cfg->block_size);
|
|
|
|
|
2019-07-29 01:42:13 +00:00
|
|
|
// block_cycles = 0 is no longer supported.
|
|
|
|
//
|
|
|
|
// block_cycles is the number of erase cycles before littlefs evicts
|
|
|
|
// metadata logs as a part of wear leveling. Suggested values are in the
|
|
|
|
// range of 100-1000, or set block_cycles to -1 to disable block-level
|
|
|
|
// wear-leveling.
|
2019-07-17 22:05:20 +00:00
|
|
|
LFS_ASSERT(lfs->cfg->block_cycles != 0);
|
2019-02-12 06:01:28 +00:00
|
|
|
|
|
|
|
|
2017-04-22 19:56:12 +00:00
|
|
|
// setup read cache
|
2017-04-22 18:30:40 +00:00
|
|
|
if (lfs->cfg->read_buffer) {
|
|
|
|
lfs->rcache.buffer = lfs->cfg->read_buffer;
|
|
|
|
} else {
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs->rcache.buffer = lfs_malloc(lfs->cfg->cache_size);
|
2017-04-22 18:30:40 +00:00
|
|
|
if (!lfs->rcache.buffer) {
|
2018-08-05 01:33:09 +00:00
|
|
|
err = LFS_ERR_NOMEM;
|
2018-07-18 14:50:00 +00:00
|
|
|
goto cleanup;
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-04-22 19:56:12 +00:00
|
|
|
// setup program cache
|
2017-04-22 18:30:40 +00:00
|
|
|
if (lfs->cfg->prog_buffer) {
|
|
|
|
lfs->pcache.buffer = lfs->cfg->prog_buffer;
|
|
|
|
} else {
|
2018-08-04 19:48:27 +00:00
|
|
|
lfs->pcache.buffer = lfs_malloc(lfs->cfg->cache_size);
|
2017-04-22 18:30:40 +00:00
|
|
|
if (!lfs->pcache.buffer) {
|
2018-08-05 01:33:09 +00:00
|
|
|
err = LFS_ERR_NOMEM;
|
2018-07-18 14:50:00 +00:00
|
|
|
goto cleanup;
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-06 16:14:30 +00:00
|
|
|
// zero to avoid information leaks
|
|
|
|
lfs_cache_zero(lfs, &lfs->rcache);
|
|
|
|
lfs_cache_zero(lfs, &lfs->pcache);
|
|
|
|
|
2019-07-31 01:01:51 +00:00
|
|
|
// setup lookahead, must be multiple of 64-bits, 32-bit aligned
|
2018-10-03 01:18:30 +00:00
|
|
|
LFS_ASSERT(lfs->cfg->lookahead_size > 0);
|
2019-04-09 23:07:44 +00:00
|
|
|
LFS_ASSERT(lfs->cfg->lookahead_size % 8 == 0 &&
|
2019-07-31 01:01:51 +00:00
|
|
|
(uintptr_t)lfs->cfg->lookahead_buffer % 4 == 0);
|
2017-04-22 19:56:12 +00:00
|
|
|
if (lfs->cfg->lookahead_buffer) {
|
2017-09-19 02:20:33 +00:00
|
|
|
lfs->free.buffer = lfs->cfg->lookahead_buffer;
|
2017-04-22 19:56:12 +00:00
|
|
|
} else {
|
2018-10-03 01:18:30 +00:00
|
|
|
lfs->free.buffer = lfs_malloc(lfs->cfg->lookahead_size);
|
2017-09-19 02:20:33 +00:00
|
|
|
if (!lfs->free.buffer) {
|
2018-08-05 01:33:09 +00:00
|
|
|
err = LFS_ERR_NOMEM;
|
2018-07-18 14:50:00 +00:00
|
|
|
goto cleanup;
|
2017-04-22 19:56:12 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
Added disk-backed limits on the name/attrs/inline sizes
Being a portable, microcontroller-scale embedded filesystem, littlefs is
presented with a relatively unique challenge. The amount of RAM
available is on completely different scales from machine to machine, and
what is normally a reasonable RAM assumption may break completely on an
embedded system.
A great example of this is file names. On almost every PC these days, the limit
for a file name is 255 bytes. It's a very convenient limit for a number
of reasons. However, on microcontrollers, allocating 255 bytes of RAM to
do a file search can be unreasonable.
The simplest solution (and one that has existed in littlefs for a
while), is to let this limit be redefined to a smaller value on devices
that need to save RAM. However, this presents an interesting portability
issue. If these devices are plugged into a PC with relatively infinite
RAM, nothing stops the PC from writing files with full 255-byte file
names, which can't be read on the small device.
One solution here is to store this limit on the superblock during format
time. When mounting a disk, the filesystem implementation is responsible for
checking this limit in the superblock. If it's larger than what can be
read, raise an error. If it's smaller, respect the limit on the
superblock and raise an error if the user attempts to exceed it.
In this commit, this strategy is adopted for file names, inline files,
and the size of all attributes, since these could impact the memory
consumption of the filesystem. (Recording the attribute's limit is
iffy, but is the only other arbitrary limit and could be used for disabling
support of custom attributes).
Note! This change makes it very important to configure littlefs
correctly at format time. If littlefs is formatted on a PC without
changing the limits appropriately, it will be rejected by a smaller
device.
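The mount-time rule described above (reject a disk whose stored limit exceeds what this build can handle, otherwise adopt the smaller on-disk limit and enforce it) can be sketched as follows. All names and error values here are hypothetical and chosen for illustration; they are not the littlefs error codes or functions.

```c
#include <stdint.h>

// Hypothetical result codes for this sketch only.
enum {
    LIMIT_OK           = 0,
    LIMIT_ERR_TOO_BIG  = -1, // disk was formatted with a larger limit
    LIMIT_ERR_TOO_LONG = -2, // caller tried to exceed the active limit
};

// disk_limit:  value stored in the superblock at format time
// build_limit: largest value this build can handle
// On success, *active receives the limit to enforce from now on.
static int adopt_limit(uint32_t disk_limit, uint32_t build_limit,
        uint32_t *active) {
    if (disk_limit > build_limit) {
        // entries on disk may be larger than we can read
        return LIMIT_ERR_TOO_BIG;
    }
    // respect the (possibly smaller) on-disk limit
    *active = disk_limit;
    return LIMIT_OK;
}

// Check a requested length (e.g. a file name) against the active limit.
static int check_len(uint32_t active, uint32_t len) {
    return (len > active) ? LIMIT_ERR_TOO_LONG : LIMIT_OK;
}
```

With this shape, a disk formatted with a 255-byte name limit is refused by a build that only supports 32 bytes, while a disk formatted with a 16-byte limit is accepted and names longer than 16 bytes are refused.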
2018-04-01 20:36:29 +00:00
|
|
|
// check that the size limits are sane
|
2018-10-02 23:28:37 +00:00
|
|
|
LFS_ASSERT(lfs->cfg->name_max <= LFS_NAME_MAX);
|
|
|
|
lfs->name_max = lfs->cfg->name_max;
|
|
|
|
if (!lfs->name_max) {
|
|
|
|
lfs->name_max = LFS_NAME_MAX;
|
|
|
|
}
|
|
|
|
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open files need to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (i.e. ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
|
|
|
LFS_ASSERT(lfs->cfg->file_max <= LFS_FILE_MAX);
|
|
|
|
lfs->file_max = lfs->cfg->file_max;
|
|
|
|
if (!lfs->file_max) {
|
|
|
|
lfs->file_max = LFS_FILE_MAX;
|
2018-04-01 20:36:29 +00:00
|
|
|
}
|
|
|
|
|
2018-08-05 01:10:08 +00:00
|
|
|
LFS_ASSERT(lfs->cfg->attr_max <= LFS_ATTR_MAX);
|
|
|
|
lfs->attr_max = lfs->cfg->attr_max;
|
|
|
|
if (!lfs->attr_max) {
|
|
|
|
lfs->attr_max = LFS_ATTR_MAX;
|
2018-04-01 20:36:29 +00:00
|
|
|
}
|
|
|
|
|
2017-05-14 17:01:45 +00:00
|
|
|
// setup default state
|
2019-08-03 14:17:47 +00:00
|
|
|
lfs->root[0] = LFS_BLOCK_NULL;
|
|
|
|
lfs->root[1] = LFS_BLOCK_NULL;
|
2018-08-01 15:24:59 +00:00
|
|
|
lfs->mlist = NULL;
|
2018-08-09 14:06:17 +00:00
|
|
|
lfs->seed = 0;
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
      .--------.
    ->| parent |-.
      | gstate | |
    .-| a      |-'
    | '--------'
    |     X <- child is orphaned
    | .--------.
    '>| child  |->
      | gstate |
      | a      |
      '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
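Item 4 above relies on gstate being the xor of every delta reachable through the threaded linked-list. A small sketch shows why a delta that is visible twice is a problem: under xor, a duplicated delta cancels itself out of the total. The struct and function names below are hypothetical, and the three-word shape of `gstate_t` is an assumption made only for this example.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a global-state record: three 32-bit words.
typedef struct {
    uint32_t word[3];
} gstate_t;

// a ^= b, word by word.
static void gstate_xor(gstate_t *a, const gstate_t *b) {
    for (int i = 0; i < 3; i++) {
        a->word[i] ^= b->word[i];
    }
}

// Fold every delta found while walking the list into one global state.
static gstate_t gstate_collect(const gstate_t *deltas, size_t count) {
    gstate_t g = {{0, 0, 0}};
    for (size_t i = 0; i < count; i++) {
        gstate_xor(&g, &deltas[i]);
    }
    return g;
}

// Word-wise equality, returns 1 when equal.
static int gstate_eq(const gstate_t *a, const gstate_t *b) {
    return a->word[0] == b->word[0]
        && a->word[1] == b->word[1]
        && a->word[2] == b->word[2];
}
```

If the walk sees the parent's delta once and the child's delta twice, the collected state equals the parent's delta alone, so the child's contribution is silently lost. That is consistent with the message's choice to hand the child's gstate only to the predecessor.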
2020-01-22 04:18:19 +00:00
|
|
|
lfs->gdisk = (lfs_gstate_t){0};
|
|
|
|
lfs->gstate = (lfs_gstate_t){0};
|
|
|
|
lfs->gdelta = (lfs_gstate_t){0};
|
Added migration from littlefs v1
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Define LFS_MIGRATE
2. Call lfs_migrate (only available with the above macro)
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included migration
code in the standard build. We can't use the lfs_migrate function for
link time gc because of a dependency between the allocator and v1 data
structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during
the migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth
fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11% at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
2019-02-23 03:34:03 +00:00
|
|
|
#ifdef LFS_MIGRATE
|
|
|
|
lfs->lfs1 = NULL;
|
|
|
|
#endif
|
2017-04-29 15:22:01 +00:00
|
|
|
|
2017-04-22 18:30:40 +00:00
|
|
|
return 0;
|
|
|
|
|
2018-07-18 14:50:00 +00:00
|
|
|
cleanup:
|
|
|
|
lfs_deinit(lfs);
|
2018-08-05 01:33:09 +00:00
|
|
|
return err;
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static int lfs_deinit(lfs_t *lfs) {
|
2017-05-14 17:01:45 +00:00
|
|
|
// free allocated memory
|
2017-04-22 18:30:40 +00:00
|
|
|
if (!lfs->cfg->read_buffer) {
|
2018-01-29 21:20:12 +00:00
|
|
|
lfs_free(lfs->rcache.buffer);
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (!lfs->cfg->prog_buffer) {
|
2018-01-29 21:20:12 +00:00
|
|
|
lfs_free(lfs->pcache.buffer);
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
|
2017-04-29 17:50:23 +00:00
|
|
|
if (!lfs->cfg->lookahead_buffer) {
|
2018-01-29 21:20:12 +00:00
|
|
|
lfs_free(lfs->free.buffer);
|
2017-04-29 17:50:23 +00:00
|
|
|
}
|
|
|
|
|
2017-04-22 18:30:40 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
int lfs_format(lfs_t *lfs, const struct lfs_config *cfg) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_format(%p, %p {.context=%p, "
|
|
|
|
".read=%p, .prog=%p, .erase=%p, .sync=%p, "
|
|
|
|
".read_size=%"PRIu32", .prog_size=%"PRIu32", "
|
|
|
|
".block_size=%"PRIu32", .block_count=%"PRIu32", "
|
|
|
|
".block_cycles=%"PRIu32", .cache_size=%"PRIu32", "
|
|
|
|
".lookahead_size=%"PRIu32", .read_buffer=%p, "
|
|
|
|
".prog_buffer=%p, .lookahead_buffer=%p, "
|
|
|
|
".name_max=%"PRIu32", .file_max=%"PRIu32", "
|
|
|
|
".attr_max=%"PRIu32"})",
|
|
|
|
(void*)lfs, (void*)cfg, cfg->context,
|
|
|
|
(void*)(uintptr_t)cfg->read, (void*)(uintptr_t)cfg->prog,
|
|
|
|
(void*)(uintptr_t)cfg->erase, (void*)(uintptr_t)cfg->sync,
|
|
|
|
cfg->read_size, cfg->prog_size, cfg->block_size, cfg->block_count,
|
|
|
|
cfg->block_cycles, cfg->cache_size, cfg->lookahead_size,
|
|
|
|
cfg->read_buffer, cfg->prog_buffer, cfg->lookahead_buffer,
|
|
|
|
cfg->name_max, cfg->file_max, cfg->attr_max);
|
2018-09-26 15:11:40 +00:00
|
|
|
int err = 0;
|
2019-04-09 23:37:53 +00:00
|
|
|
{
|
2018-09-26 15:11:40 +00:00
|
|
|
err = lfs_init(lfs, cfg);
|
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_format -> %d", err);
|
2018-09-26 15:11:40 +00:00
|
|
|
return err;
|
|
|
|
}
|
2017-02-27 00:05:27 +00:00
|
|
|
|
2018-09-26 15:11:40 +00:00
|
|
|
// create free lookahead
|
2018-10-21 02:02:25 +00:00
|
|
|
memset(lfs->free.buffer, 0, lfs->cfg->lookahead_size);
|
2018-09-26 15:11:40 +00:00
|
|
|
lfs->free.off = 0;
|
2018-10-21 02:02:25 +00:00
|
|
|
lfs->free.size = lfs_min(8*lfs->cfg->lookahead_size,
|
|
|
|
lfs->cfg->block_count);
|
2018-09-26 15:11:40 +00:00
|
|
|
lfs->free.i = 0;
|
|
|
|
lfs_alloc_ack(lfs);
|
2017-03-20 03:00:56 +00:00
|
|
|
|
2018-10-21 02:02:25 +00:00
|
|
|
// create root dir
|
|
|
|
lfs_mdir_t root;
|
2018-09-26 15:11:40 +00:00
|
|
|
err = lfs_dir_alloc(lfs, &root);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
2017-04-18 03:27:06 +00:00
|
|
|
|
2018-10-21 02:02:25 +00:00
|
|
|
// write one superblock
|
2018-09-26 15:11:40 +00:00
|
|
|
lfs_superblock_t superblock = {
|
2018-10-21 02:02:25 +00:00
|
|
|
.version = LFS_DISK_VERSION,
|
|
|
|
.block_size = lfs->cfg->block_size,
|
|
|
|
.block_count = lfs->cfg->block_count,
|
|
|
|
.name_max = lfs->name_max,
|
|
|
|
.file_max = lfs->file_max,
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open files need to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
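The deferred eviction described above can be sketched roughly as follows. This is a simplified illustration, not the code in this commit: `toy_file`, `toy_evict_large_inlines`, and the `cache_size` threshold are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, simplified view of an open file.
struct toy_file {
    bool is_inline;   // data lives in the parent directory block
    uint32_t dir_id;  // which directory block holds the file
    size_t size;      // file size in bytes
    bool evicted;     // set once the file has been moved to a COW structure
};

// Before committing to directory `dir_id`, mark every open inline file in
// that directory that is too large to buffer in `cache_size` bytes of RAM.
// Returns the number of files marked for eviction.
static size_t toy_evict_large_inlines(struct toy_file *files, size_t count,
        uint32_t dir_id, size_t cache_size) {
    assert(files != NULL || count == 0);
    size_t evicted = 0;
    for (size_t i = 0; i < count; i++) {
        struct toy_file *f = &files[i];
        if (f->is_inline && f->dir_id == dir_id && f->size > cache_size) {
            f->is_inline = false;  // the file now needs its own COW structure
            f->evicted = true;
            evicted++;
        }
    }
    return evicted;
}
```

Small inline files and files in other directories are left alone, so the extra work is only paid at commit time and only for the directory being committed.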
2019-01-13 17:08:42 +00:00
|
|
|
.attr_max = lfs->attr_max,
|
2018-09-26 15:11:40 +00:00
|
|
|
};
|
2017-02-27 00:05:27 +00:00
|
|
|
|
2018-10-21 02:02:25 +00:00
|
|
|
lfs_superblock_tole32(&superblock);
|
2019-07-21 09:34:53 +00:00
|
|
|
err = lfs_dir_commit(lfs, &root, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_CREATE, 0, 0)},
|
2019-01-08 14:52:03 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_SUPERBLOCK, 0, 8), "littlefs"},
|
|
|
|
{LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, sizeof(superblock)),
|
|
|
|
&superblock}));
|
2018-10-21 02:02:25 +00:00
|
|
|
if (err) {
|
2018-09-26 15:11:40 +00:00
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
// sanity check that fetch works
|
2018-10-21 02:02:25 +00:00
|
|
|
err = lfs_dir_fetch(lfs, &root, (const lfs_block_t[2]){0, 1});
|
2018-09-26 15:11:40 +00:00
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
2019-05-28 23:16:51 +00:00
|
|
|
|
|
|
|
// force compaction to prevent accidentally mounting any
|
|
|
|
// older version of littlefs that may live on disk
|
|
|
|
root.erased = false;
|
|
|
|
err = lfs_dir_commit(lfs, &root, NULL, 0);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
2017-04-22 18:30:40 +00:00
|
|
|
}
|
|
|
|
|
2018-07-18 14:50:00 +00:00
|
|
|
cleanup:
|
|
|
|
lfs_deinit(lfs);
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_format -> %d", err);
|
2018-07-18 14:50:00 +00:00
|
|
|
return err;
|
2017-02-27 00:05:27 +00:00
|
|
|
}
|
|
|
|
|
2017-04-22 18:30:40 +00:00
|
|
|
int lfs_mount(lfs_t *lfs, const struct lfs_config *cfg) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mount(%p, %p {.context=%p, "
|
|
|
|
".read=%p, .prog=%p, .erase=%p, .sync=%p, "
|
|
|
|
".read_size=%"PRIu32", .prog_size=%"PRIu32", "
|
|
|
|
".block_size=%"PRIu32", .block_count=%"PRIu32", "
|
|
|
|
".block_cycles=%"PRIu32", .cache_size=%"PRIu32", "
|
|
|
|
".lookahead_size=%"PRIu32", .read_buffer=%p, "
|
|
|
|
".prog_buffer=%p, .lookahead_buffer=%p, "
|
|
|
|
".name_max=%"PRIu32", .file_max=%"PRIu32", "
|
|
|
|
".attr_max=%"PRIu32"})",
|
|
|
|
(void*)lfs, (void*)cfg, cfg->context,
|
|
|
|
(void*)(uintptr_t)cfg->read, (void*)(uintptr_t)cfg->prog,
|
|
|
|
(void*)(uintptr_t)cfg->erase, (void*)(uintptr_t)cfg->sync,
|
|
|
|
cfg->read_size, cfg->prog_size, cfg->block_size, cfg->block_count,
|
|
|
|
cfg->block_cycles, cfg->cache_size, cfg->lookahead_size,
|
|
|
|
cfg->read_buffer, cfg->prog_buffer, cfg->lookahead_buffer,
|
|
|
|
cfg->name_max, cfg->file_max, cfg->attr_max);
|
2017-04-22 18:30:40 +00:00
|
|
|
int err = lfs_init(lfs, cfg);
|
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mount -> %d", err);
|
2017-04-22 18:30:40 +00:00
|
|
|
return err;
|
|
|
|
}
|
2017-03-13 00:41:08 +00:00
|
|
|
|
2018-09-12 06:50:21 +00:00
|
|
|
// scan directory blocks for superblock and any global updates
|
|
|
|
lfs_mdir_t dir = {.tail = {0, 1}};
|
|
|
|
while (!lfs_pair_isnull(dir.tail)) {
|
|
|
|
// fetch next block in tail list
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement.)
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
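To make the bit layout concrete, here is a minimal sketch of packing and unpacking a tag with the field widths from the diagram (1 valid bit, 11 type bits, 10 id bits, 10 length bits). The `toy_*` helpers are hypothetical names for illustration and are not the macros or functions used in lfs.c.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helpers following the layout in the diagram:
// [1 valid | 11 type | 10 id | 10 length], valid bit left as 0 here.
static uint32_t toy_mktag(uint32_t type, uint32_t id, uint32_t length) {
    assert(type <= 0x7ff && id <= 0x3ff && length <= 0x3ff);
    return (type << 20) | (id << 10) | length;
}

// Full 11-bit type (type3).
static uint32_t toy_tag_type3(uint32_t tag) { return (tag >> 20) & 0x7ff; }

// Top 3 bits of the type field (type1).
static uint32_t toy_tag_type1(uint32_t tag) { return (tag >> 28) & 0x7; }

// Low 8 bits of the type field (chunk info).
static uint32_t toy_tag_chunk(uint32_t tag) { return (tag >> 20) & 0xff; }

// 10-bit file id.
static uint32_t toy_tag_id(uint32_t tag) { return (tag >> 10) & 0x3ff; }

// 10-bit entry length.
static uint32_t toy_tag_length(uint32_t tag) { return tag & 0x3ff; }
```

A mask such as "all type bits, all id bits, no length bits" is then just `toy_mktag(0x7ff, 0x3ff, 0)`, which is handy when matching tags while ignoring their length.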
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_fetchmatch(lfs, &dir, dir.tail,
|
|
|
|
LFS_MKTAG(0x7ff, 0x3ff, 0),
|
2018-09-12 06:50:21 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_SUPERBLOCK, 0, 8),
|
2018-12-29 13:53:12 +00:00
|
|
|
NULL,
|
2018-09-12 06:50:21 +00:00
|
|
|
lfs_dir_find_match, &(struct lfs_dir_find_match){
|
|
|
|
lfs, "littlefs", 8});
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow, is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
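As a rough illustration of what keeping a directory ordered involves, the sketch below scans a sorted list of names and reports where a new name would be inserted. It is an assumption-laden simplification: `toy_dir_find` is a hypothetical helper working on an in-RAM array of NUL-terminated names, whereas the real code compares names while scanning tags in a directory block.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: scan a directory's names, which are kept in
// ascending lexicographical order, and return the index where `name`
// belongs. Sets *found to 1 when an entry with the same name exists.
static size_t toy_dir_find(const char *const *names, size_t count,
        const char *name, int *found) {
    assert(names != NULL && name != NULL && found != NULL);
    *found = 0;
    for (size_t i = 0; i < count; i++) {
        int cmp = strcmp(names[i], name);
        if (cmp == 0) {
            *found = 1;
            return i;
        }
        if (cmp > 0) {
            return i;  // first entry that sorts after `name`
        }
    }
    return count;  // `name` sorts after every existing entry
}
```

The scan is still linear, which matches the point above that ordering adds little overhead when the whole directory block has to be scanned anyway.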
2018-10-04 19:49:34 +00:00
|
|
|
if (tag < 0) {
|
|
|
|
err = tag;
|
2018-09-12 06:50:21 +00:00
|
|
|
goto cleanup;
|
|
|
|
}
|
2017-10-07 21:56:00 +00:00
|
|
|
|
2018-09-12 06:50:21 +00:00
|
|
|
// has superblock?
|
2018-10-04 19:49:34 +00:00
|
|
|
if (tag && !lfs_tag_isdelete(tag)) {
|
2018-09-12 06:50:21 +00:00
|
|
|
// update root
|
|
|
|
lfs->root[0] = dir.pair[0];
|
|
|
|
lfs->root[1] = dir.pair[1];
|
|
|
|
|
|
|
|
// grab superblock
|
|
|
|
lfs_superblock_t superblock;
|
2018-12-29 13:53:12 +00:00
|
|
|
tag = lfs_dir_get(lfs, &dir, LFS_MKTAG(0x7ff, 0x3ff, 0),
|
2018-10-04 21:25:19 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, sizeof(superblock)),
|
2018-09-12 06:50:21 +00:00
|
|
|
&superblock);
|
2018-10-04 19:49:34 +00:00
|
|
|
if (tag < 0) {
|
|
|
|
err = tag;
|
2018-09-12 06:50:21 +00:00
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
lfs_superblock_fromle32(&superblock);
|
2018-03-23 21:11:36 +00:00
|
|
|
|
2018-09-12 06:50:21 +00:00
|
|
|
// check version
|
|
|
|
uint16_t major_version = (0xffff & (superblock.version >> 16));
|
|
|
|
uint16_t minor_version = (0xffff & (superblock.version >> 0));
|
|
|
|
if ((major_version != LFS_DISK_VERSION_MAJOR ||
|
|
|
|
minor_version > LFS_DISK_VERSION_MINOR)) {
|
2019-01-22 22:21:16 +00:00
|
|
|
LFS_ERROR("Invalid version %"PRIu16".%"PRIu16,
|
2018-09-12 06:50:21 +00:00
|
|
|
major_version, minor_version);
|
|
|
|
err = LFS_ERR_INVAL;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
Added root entry and expanding superblocks
Expanding superblocks has been on my wishlist for a while. The basic
idea is that instead of maintaining a fixed offset, blocks {0, 1}, to
the root directory (1 pointer), we maintain a dynamically sized
linked-list of superblocks that point to the actual root. If the number
of writes to the root exceeds some value, we increase the size of the
superblock linked-list.
This can leverage existing metadata-pair operations. The revision count for
metadata-pairs provides some knowledge on how much wear we've put on the
superblock, and the threaded linked-list can also be reused for this
purpose. This means superblock expansion is both optional and cheap to
implement.
Expanding superblocks helps both extremely small and extremely large filesystems
(extreme being relative of course). On the small end, we can actually
collapse the superblock into the root directory and drop the hard requirement
of 4-blocks for the superblock. On the large end, our superblock will
now last longer than the rest of the filesystem. Each time we expand,
the number of cycles until the superblock dies is increased by a power.
Before we were stuck with this layout:
level  cycles  limit    layout
1      E^2     390 MiB  s0 -> root
Now we expand every time a fixed offset is exceeded:
level  cycles  limit    layout
0      E       4 KiB    s0+root
1      E^2     390 MiB  s0 -> root
2      E^3     37 TiB   s0 -> s1 -> root
3      E^4     3.6 EiB  s0 -> s1 -> s2 -> root
...
Here the cycles are the number of erase cycles before the superblock
dies, and the limit is the worst-case filesystem size at which early
superblock death becomes a concern (assuming all writes go to the root,
using this formula: E^|s| = E*B, where E = erase cycles = 100000 and
B = block count, assuming 4096 byte blocks).
Note we can also store copies of the superblock entry on the expanded
superblocks. This may help filesystem recovery tools in the future.
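The limit column in the table can be reproduced from the formula E^|s| = E*B. Reading the table, the cycle count at a given level is E^(level+1), so B = E^level and the limit is B times the block size. The helper below is a small sketch of that arithmetic; `toy_superblock_limit` is a hypothetical name, not part of littlefs.

```c
#include <assert.h>
#include <stdint.h>

// Worst-case filesystem size in bytes at which early superblock death
// becomes a concern. From E^(level+1) = E*B we get B = E^level blocks.
// Note: with E = 100000 and 4096-byte blocks this overflows 64 bits
// beyond level 3.
static uint64_t toy_superblock_limit(unsigned level, uint64_t erase_cycles,
        uint64_t block_size) {
    assert(block_size > 0);
    uint64_t blocks = 1;
    for (unsigned i = 0; i < level; i++) {
        blocks *= erase_cycles;
    }
    return blocks * block_size;
}
```

With E = 100000 and 4096-byte blocks, level 1 gives 409600000 bytes (about 390 MiB) and level 2 gives 40960000000000 bytes (about 37 TiB), which agrees with the table.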
2018-08-06 18:30:51 +00:00
|
|
|
|
2018-09-12 06:50:21 +00:00
|
|
|
// check superblock configuration
|
|
|
|
if (superblock.name_max) {
|
|
|
|
if (superblock.name_max > lfs->name_max) {
|
|
|
|
LFS_ERROR("Unsupported name_max (%"PRIu32" > %"PRIu32")",
|
|
|
|
superblock.name_max, lfs->name_max);
|
|
|
|
err = LFS_ERR_INVAL;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
Added disk-backed limits on the name/attrs/inline sizes
Being a portable, microcontroller-scale embedded filesystem, littlefs is
presented with a relatively unique challenge. The amount of RAM
available is on completely different scales from machine to machine, and
what is normally a reasonable RAM assumption may break completely on an
embedded system.
A great example of this is file names. On almost every PC these days, the limit
for a file name is 255 bytes. It's a very convenient limit for a number
of reasons. However, on microcontrollers, allocating 255 bytes of RAM to
do a file search can be unreasonable.
The simplest solution (and one that has existed in littlefs for a
while), is to let this limit be redefined to a smaller value on devices
that need to save RAM. However, this presents an interesting portability
issue. If these devices are plugged into a PC with relatively infinite
RAM, nothing stops the PC from writing files with full 255-byte file
names, which can't be read on the small device.
One solution here is to store this limit on the superblock during format
time. When mounting a disk, the filesystem implementation is responsible for
checking this limit in the superblock. If it's larger than what can be
read, raise an error. If it's smaller, respect the limit on the
superblock and raise an error if the user attempts to exceed it.
In this commit, this strategy is adopted for file names, inline files,
and the size of all attributes, since these could impact the memory
consumption of the filesystem. (Recording the attribute's limit is
iffy, but is the only other arbitrary limit and could be used for disabling
support of custom attributes).
Note! This change makes it very important to configure littlefs
correctly at format time. If littlefs is formatted on a PC without
changing the limits appropriately, it will be rejected by a smaller
device.
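The mount-time rule described above (reject a disk whose limit is larger than we can handle, otherwise adopt the smaller on-disk limit) can be sketched as a single helper. `toy_apply_limit` and its return values are assumptions for illustration; a stored value of 0 is treated here as "no limit recorded".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper mirroring the rule above. `disk_limit` is the value
// stored in the superblock (0 meaning "not recorded"), and `*limit` is
// what this build can handle. Returns 0 on success, or -1 if the disk
// requires more than this device supports.
static int toy_apply_limit(uint32_t disk_limit, uint32_t *limit) {
    assert(limit != NULL);
    if (disk_limit == 0) {
        return 0;   // nothing recorded, keep our own limit
    }
    if (disk_limit > *limit) {
        return -1;  // disk may contain entries we cannot read
    }
    *limit = disk_limit;  // respect the smaller on-disk limit
    return 0;
}
```

The same check would be applied once per limit, for example to the name, file, and attribute sizes in turn.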
2018-04-01 20:36:29 +00:00
|
|
|
|
2018-09-12 06:50:21 +00:00
|
|
|
lfs->name_max = superblock.name_max;
|
|
|
|
}
|
2018-04-01 20:36:29 +00:00
|
|
|
|
2019-01-13 17:08:42 +00:00
|
|
|
if (superblock.file_max) {
|
|
|
|
if (superblock.file_max > lfs->file_max) {
|
|
|
|
LFS_ERROR("Unsupported file_max (%"PRIu32" > %"PRIu32")",
|
|
|
|
superblock.file_max, lfs->file_max);
|
2018-09-12 06:50:21 +00:00
|
|
|
err = LFS_ERR_INVAL;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
2018-04-01 20:36:29 +00:00
|
|
|
|
Added support for RAM-independent reading of inline files
One of the new features in LittleFS is "inline files", which is the
inlining of small files in the parent directory. Inline files have a big
limitation in that they no longer have a dedicated scratch area to write
out data before commit-time. This is fine as long as inline files are
small enough to fit in RAM.
However, this dependency on RAM creates an uncomfortable situation for
portability, with larger devices able to create larger files than
smaller devices. This problem is especially important on embedded
systems, where RAM is at a premium.
Recently, I realized this RAM requirement is necessary for _writing_
inline files, but not for _reading_ inline files. By allowing fetches of
specific slices of inline files it's possible to read inline files
without the RAM to back it.
However, this creates a conflict with COW semantics. Normally,
when a file is open twice, it is referenced by a COW data structure that
can be updated independently. Inline files that fit in RAM also allow
independent updates, but the moment an inline file can't fit in
RAM, any updates to that directory block could corrupt open files
referencing the inline file. The fact that this behaviour is only
inconsistent for inline files created on a different device with more
RAM creates a potential nightmare for user experience.
Fortunately, there is a workaround for this. When we are committing to a
directory, any open file needs to live in a COW structure or in RAM.
While we could move large inline files to COW structures at open time,
this would break the separation of read/write operations and could lead
to write errors at read time (ie ENOSPC). But since this is only an
issue for commits, we can defer the move to a COW structure to any
commits to that directory. This means when committing to a directory we
need to find any _open_ large inline files and evict them from the
directory, leaving the file with a new COW structure even if it was
opened read only.
While complicated, the end result is inline files that can use the
MAX RAM that is available, but can be read with MIN RAM, even with
multiple write operations happening to the underlying directory block.
This prevents users from needing to learn the idiosyncrasies of inline
files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
|
|
|
lfs->file_max = superblock.file_max;
|
2018-09-12 06:50:21 +00:00
|
|
|
}
|
2018-10-02 23:28:37 +00:00
|
|
|
|
|
|
|
if (superblock.attr_max) {
|
|
|
|
if (superblock.attr_max > lfs->attr_max) {
|
|
|
|
LFS_ERROR("Unsupported attr_max (%"PRIu32" > %"PRIu32")",
|
|
|
|
superblock.attr_max, lfs->attr_max);
|
|
|
|
err = LFS_ERR_INVAL;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
lfs->attr_max = superblock.attr_max;
|
|
|
|
}
|
2018-08-04 21:04:24 +00:00
|
|
|
}
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
// has gstate?
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
    .--------.
  ->| parent |-.
    | gstate | |
  .-|   a    |-'
  | '--------'
  |     X <- child is orphaned
  | .--------.
  '>| child  |->
    | gstate |
    |   a    |
    '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
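To make the duplicated-delta problem from item 4 concrete, here is a small sketch of a global state built as the xor of per-directory deltas. The struct and function names are hypothetical; only the three-word layout (tag plus a block pair) follows the fields printed by the mount code in this file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical three-word global state for this sketch.
typedef struct {
    uint32_t tag;
    uint32_t pair[2];
} sketch_gstate_t;

// Fold one delta into an accumulated state.
static void sketch_gstate_xor(sketch_gstate_t *a, const sketch_gstate_t *b) {
    a->tag     ^= b->tag;
    a->pair[0] ^= b->pair[0];
    a->pair[1] ^= b->pair[1];
}

// Walk every directory on the (here: array-backed) threaded list and
// xor its delta into the result.
static sketch_gstate_t sketch_gstate_collect(const sketch_gstate_t *deltas,
        size_t count) {
    assert(deltas != NULL || count == 0);

    sketch_gstate_t g = {0, {0, 0}};
    for (size_t i = 0; i < count; i++) {
        sketch_gstate_xor(&g, &deltas[i]);
    }
    return g;
}
```

Because xor is its own inverse, a delta that appears twice on the list cancels out of the collected state. This is why a moment where both the updated parent and the orphaned child carry the child's delta yields the wrong gstate.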
2020-01-22 04:18:19 +00:00
|
|
|
err = lfs_dir_getgstate(lfs, &dir, &lfs->gstate);
|
2018-09-15 03:02:39 +00:00
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
goto cleanup;
|
2018-09-12 06:34:03 +00:00
|
|
|
}
|
2018-08-04 00:01:27 +00:00
|
|
|
}
|
|
|
|
|
2018-10-04 21:25:19 +00:00
|
|
|
// found superblock?
|
|
|
|
if (lfs_pair_isnull(lfs->root)) {
|
|
|
|
err = LFS_ERR_INVAL;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
// update littlefs with gstate
|
2020-01-22 04:18:19 +00:00
|
|
|
if (!lfs_gstate_iszero(&lfs->gstate)) {
|
|
|
|
LFS_DEBUG("Found pending gstate %08"PRIx32" %08"PRIx32" %08"PRIx32,
|
|
|
|
lfs->gstate.tag,
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs->gstate.pair[0],
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->gstate.pair[1]);
|
2018-07-02 03:29:42 +00:00
|
|
|
}
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->gstate.tag += !lfs_tag_isvalid(lfs->gstate.tag);
|
|
|
|
lfs->gdisk = lfs->gstate;
|
2018-07-02 03:29:42 +00:00
|
|
|
|
2018-08-09 14:06:17 +00:00
|
|
|
// setup free lookahead
|
|
|
|
lfs->free.off = lfs->seed % lfs->cfg->block_count;
|
|
|
|
lfs->free.size = 0;
|
|
|
|
lfs->free.i = 0;
|
|
|
|
lfs_alloc_ack(lfs);
|
|
|
|
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mount -> %d", 0);
|
2017-10-07 21:56:00 +00:00
|
|
|
return 0;
|
2018-07-18 14:50:00 +00:00
|
|
|
|
|
|
|
cleanup:
|
2018-08-05 01:33:09 +00:00
|
|
|
lfs_unmount(lfs);
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_mount -> %d", err);
|
2018-07-18 14:50:00 +00:00
|
|
|
return err;
|
2017-04-01 15:44:17 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
int lfs_unmount(lfs_t *lfs) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_unmount(%p)", (void*)lfs);
|
|
|
|
int err = lfs_deinit(lfs);
|
|
|
|
LFS_TRACE("lfs_unmount -> %d", err);
|
|
|
|
return err;
|
2017-04-01 15:44:17 +00:00
|
|
|
}
|
|
|
|
|
2017-04-24 02:40:03 +00:00
|
|
|
|
2018-08-01 10:52:48 +00:00
|
|
|
/// Filesystem filesystem operations ///
|
2020-02-09 15:05:37 +00:00
|
|
|
int lfs_fs_traverseraw(lfs_t *lfs,
|
|
|
|
int (*cb)(void *data, lfs_block_t block), void *data,
|
|
|
|
bool includeorphans) {
|
2018-05-21 05:56:20 +00:00
|
|
|
// iterate over metadata pairs
|
2018-05-29 06:11:26 +00:00
|
|
|
lfs_mdir_t dir = {.tail = {0, 1}};
|
Added migration from littlefs v1
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Defining LFS_MIGRATE
2. Calling lfs_migrate (only available with the above macro)
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included migration
code in the standard build. We can't use the lfs_migrate function for
link time gc because of a dependency between the allocator and v1 data
structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during
the migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth
fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11% at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
2019-02-23 03:34:03 +00:00
|
|
|
|
|
|
|
#ifdef LFS_MIGRATE
|
|
|
|
// also consider v1 blocks during migration
|
|
|
|
if (lfs->lfs1) {
|
|
|
|
int err = lfs1_traverse(lfs, cb, data);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
dir.tail[0] = lfs->root[0];
|
|
|
|
dir.tail[1] = lfs->root[1];
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
while (!lfs_pair_isnull(dir.tail)) {
|
2018-05-21 05:56:20 +00:00
|
|
|
for (int i = 0; i < 2; i++) {
|
2018-07-30 19:40:27 +00:00
|
|
|
int err = cb(data, dir.tail[i]);
|
2018-05-21 05:56:20 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// iterate through ids in directory
|
2018-05-26 18:50:06 +00:00
|
|
|
int err = lfs_dir_fetch(lfs, &dir, dir.tail);
|
2018-05-21 05:56:20 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
for (uint16_t id = 0; id < dir.count; id++) {
|
2018-07-13 01:43:55 +00:00
|
|
|
struct lfs_ctz ctz;
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now, chunky inline files
is just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
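The 1/11/10/10 bit layout above can be sketched with a few shift-and-mask helpers. The names below are hypothetical and chosen for illustration; they are not the macros and functions used in this file. The sketch leaves the top (valid) bit clear, since the message does not say which polarity means "valid".

```c
#include <assert.h>
#include <stdint.h>

// Pack an 11-bit type, 10-bit id and 10-bit length into a 32-bit tag.
// Bit 31 (the valid bit) is left clear in this sketch.
static uint32_t sketch_mktag(uint16_t type, uint16_t id, uint16_t size) {
    assert(type <= 0x7ff && id <= 0x3ff && size <= 0x3ff);
    return ((uint32_t)type << 20) | ((uint32_t)id << 10) | (uint32_t)size;
}

// Full 11-bit type field (type3).
static uint16_t sketch_tag_type3(uint32_t tag) {
    return (uint16_t)((tag >> 20) & 0x7ff);
}

// Upper 3 bits of the type field (type1), returned as a value 0..7.
static uint8_t sketch_tag_type1(uint32_t tag) {
    return (uint8_t)((tag >> 28) & 0x7);
}

// Lower 8 bits of the type field (chunk info).
static uint8_t sketch_tag_chunk(uint32_t tag) {
    return (uint8_t)((tag >> 20) & 0xff);
}

// 10-bit file id.
static uint16_t sketch_tag_id(uint32_t tag) {
    return (uint16_t)((tag >> 10) & 0x3ff);
}

// 10-bit entry length.
static uint16_t sketch_tag_size(uint32_t tag) {
    return (uint16_t)(tag & 0x3ff);
}
```

Splitting the type field into a 3-bit class and an 8-bit chunk number is what allows a lookup to match on the class alone and treat the low 8 bits as data.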
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t tag = lfs_dir_get(lfs, &dir, LFS_MKTAG(0x700, 0x3ff, 0),
|
2018-07-13 01:43:55 +00:00
|
|
|
LFS_MKTAG(LFS_TYPE_STRUCT, id, sizeof(ctz)), &ctz);
|
|
|
|
if (tag < 0) {
|
|
|
|
if (tag == LFS_ERR_NOENT) {
|
2018-05-21 05:56:20 +00:00
|
|
|
continue;
|
|
|
|
}
|
2018-07-13 01:43:55 +00:00
|
|
|
return tag;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_ctz_fromle32(&ctz);
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2018-12-29 13:53:12 +00:00
|
|
|
if (lfs_tag_type3(tag) == LFS_TYPE_CTZSTRUCT) {
|
2018-08-05 04:57:43 +00:00
|
|
|
err = lfs_ctz_traverse(lfs, NULL, &lfs->rcache,
|
2018-07-13 01:43:55 +00:00
|
|
|
ctz.head, ctz.size, cb, data);
|
2018-05-26 18:50:06 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2020-02-09 15:05:37 +00:00
|
|
|
} else if (includeorphans &&
|
Fixed more bugs, mostly related to ENOSPC on different geometries
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power causing a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there are still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do
about this yet.
2020-01-29 07:45:19 +00:00
|
|
|
lfs_tag_type3(tag) == LFS_TYPE_DIRSTRUCT) {
|
|
|
|
for (int i = 0; i < 2; i++) {
|
|
|
|
err = cb(data, (&ctz.head)[i]);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// iterate over any open files
|
2018-08-01 15:24:59 +00:00
|
|
|
for (lfs_file_t *f = (lfs_file_t*)lfs->mlist; f; f = f->next) {
|
|
|
|
if (f->type != LFS_TYPE_REG) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2018-05-21 05:56:20 +00:00
|
|
|
if ((f->flags & LFS_F_DIRTY) && !(f->flags & LFS_F_INLINE)) {
|
2018-08-05 04:57:43 +00:00
|
|
|
int err = lfs_ctz_traverse(lfs, &f->cache, &lfs->rcache,
|
2018-07-13 01:43:55 +00:00
|
|
|
f->ctz.head, f->ctz.size, cb, data);
|
2018-05-26 18:50:06 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if ((f->flags & LFS_F_WRITING) && !(f->flags & LFS_F_INLINE)) {
|
2018-08-05 04:57:43 +00:00
|
|
|
int err = lfs_ctz_traverse(lfs, &f->cache, &lfs->rcache,
|
2018-05-26 18:50:06 +00:00
|
|
|
f->block, f->pos, cb, data);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
2017-04-01 15:44:17 +00:00
|
|
|
|
2020-02-09 15:05:37 +00:00
|
|
|
int lfs_fs_traverse(lfs_t *lfs,
|
|
|
|
int (*cb)(void *data, lfs_block_t block), void *data) {
|
|
|
|
LFS_TRACE("lfs_fs_traverse(%p, %p, %p)",
|
|
|
|
(void*)lfs, (void*)(uintptr_t)cb, data);
|
|
|
|
int err = lfs_fs_traverseraw(lfs, cb, data, true);
|
|
|
|
LFS_TRACE("lfs_fs_traverse -> %d", err);
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2018-08-01 10:52:48 +00:00
|
|
|
static int lfs_fs_pred(lfs_t *lfs,
|
|
|
|
const lfs_block_t pair[2], lfs_mdir_t *pdir) {
|
2018-05-26 18:50:06 +00:00
|
|
|
// iterate over all directory entries
|
2018-05-21 05:56:20 +00:00
|
|
|
pdir->tail[0] = 0;
|
|
|
|
pdir->tail[1] = 1;
|
2018-08-05 04:57:43 +00:00
|
|
|
while (!lfs_pair_isnull(pdir->tail)) {
|
|
|
|
if (lfs_pair_cmp(pdir->tail, pair) == 0) {
|
2018-07-13 01:43:55 +00:00
|
|
|
return 0;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
|
|
|
|
2018-05-26 18:50:06 +00:00
|
|
|
int err = lfs_dir_fetch(lfs, pdir, pdir->tail);
|
2018-05-21 05:56:20 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
return LFS_ERR_NOENT;
|
2018-05-21 05:56:20 +00:00
|
|
|
}
|
Introduced xored-globals logic to fix fundamental problem with moves
This was a big roadblock for a while: with the new feature of inlined
files, the existing move logic was fundamentally flawed.
To pull off atomic moves between two different metadata-pairs, littlefs
uses a simple, if a bit clumsy, trick:
1. Marks entry as "moving"
2. Copies entry to new metadata-pair
3. Deletes old entry
If power is lost before the move operation is completed, we will find the
"moving" tag. This means there may or may not be an incomplete move on
the filesystem. In this case, we simply search for the moved entry. If
we find it, we remove the old entry; otherwise we just remove the
"moving" tag.
This worked perfectly, until we introduced inlined files. See, unlike
the existing directory and ctz entries, inlined files have no guarantee
they are unique. There is nothing we can search for that will allow us
to find a moved file unless we assign entries globally-unique ids. (note
that moves are fundamentally rename operations, so searching for names
does not make sense).
---
Solving this problem required completely restructuring how littlefs
handles moves, and pulling out a really old idea that had been left on the
cutting room floor back when littlefs was going through many
designs: xored-globals.
The problem xored-globals solves is the need to maintain some global state
via commits to these distributed, independent metadata-pairs. The idea
is that we can use some sort of symmetric operation, such as xor, to
introduce deltas of the global state that can be committed atomically
along with any other info to these metadata-pairs.
This means that to figure out our global state, we xor together the global
delta stored in every metadata-pair.
Which means any commit can update the global state atomically, opening
up a whole new set of atomic possibilities.
There are a couple of downsides. These globals may end up with deltas on
every single metadata-pair, effectively duplicating the data for each
block. Additionally, these globals need to have multiple copies in RAM.
This means globals need to be a bounded size and very small, since even
small globals will have a large footprint.
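As a minimal sketch of the idea described above (not littlefs's actual implementation), the global state can be recovered by xoring together the delta stored in each metadata-pair, and any single commit can change it by xoring the right delta into its own pair. The names `gdelta_t`, `gstate_collect`, and `gdelta_for` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical: one small delta word stored alongside each metadata-pair.
typedef uint32_t gdelta_t;

// Recover the global state by xoring every metadata-pair's delta.
static gdelta_t gstate_collect(const gdelta_t *deltas, size_t count) {
    gdelta_t state = 0;
    for (size_t i = 0; i < count; i++) {
        state ^= deltas[i];
    }
    return state;
}

// Delta that one commit must xor into its own metadata-pair so that the
// collected global state changes from `current` to `wanted`.
static gdelta_t gdelta_for(gdelta_t current, gdelta_t wanted) {
    return current ^ wanted;
}
```

Because xor is its own inverse, the commit only touches one metadata-pair, yet the state collected across all pairs changes in a single atomic step.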
---
On top of xored-globals, it's trivial to fix our move logic. Here we've
added an indirect delete tag which allows us to atomically specify a
delete of any entry on the filesystem.
Our move operation is now:
1. Copy entry to new metadata-pair and atomically xor globals to
indirectly delete our original entry.
2. Delete the original entry and xor globals to remove the indirect
delete.
Extra exciting is that this now takes our relatively clumsy move
operation into a sexy guaranteed O(1) move operation with no searching
necessary (though we do need to xor globals during mount).
Also reintroduced entry struct, now with a specific purpose to describe
the metadata-pair + id combo needed by indirect deletes to locate an
entry.
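The two-commit move above can be modeled roughly as follows. This is a simplified in-memory sketch under stated assumptions, not the littlefs code: `struct mpair`, `move_encode`, `move_step1`, and `move_step2` are hypothetical, and the indirect delete is encoded as a single word rather than a real tag plus pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model: two metadata-pairs, each holding up to eight entries
// (0 = empty slot) plus an xored delta of the global state.
struct mpair {
    uint32_t entries[8];
    uint32_t gdelta;
};

// Hypothetical encoding of the indirect delete: 0 means "no pending move",
// otherwise (pair index + 1) in the high half and the entry id in the low half.
static uint32_t move_encode(unsigned pair, unsigned id) {
    return ((uint32_t)(pair + 1) << 16) | id;
}

// The global state is the xor of every pair's delta.
static uint32_t gstate(const struct mpair pairs[2]) {
    return pairs[0].gdelta ^ pairs[1].gdelta;
}

// An entry is visible unless the global state marks it as indirectly deleted.
static bool entry_visible(const struct mpair pairs[2],
        unsigned pair, unsigned id) {
    return pairs[pair].entries[id] != 0
            && gstate(pairs) != move_encode(pair, id);
}

// Commit 1: copy the entry and, in the same commit, xor in the indirect delete.
static void move_step1(struct mpair pairs[2], unsigned src, unsigned srcid,
        unsigned dst, unsigned dstid) {
    pairs[dst].entries[dstid] = pairs[src].entries[srcid];
    pairs[dst].gdelta ^= gstate(pairs) ^ move_encode(src, srcid);
}

// Commit 2: really delete the original and xor the indirect delete away.
static void move_step2(struct mpair pairs[2], unsigned src, unsigned srcid) {
    pairs[src].entries[srcid] = 0;
    pairs[src].gdelta ^= gstate(pairs);
}
```

In this model, losing power between the two commits still leaves exactly one visible copy of the entry, and no search is needed to find the other half of the move.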
2018-05-29 17:35:23 +00:00
|
|
|
|
2018-09-11 03:07:59 +00:00
|
|
|
struct lfs_fs_parent_match {
|
|
|
|
lfs_t *lfs;
|
|
|
|
const lfs_block_t pair[2];
|
|
|
|
};
|
|
|
|
|
|
|
|
static int lfs_fs_parent_match(void *data,
|
|
|
|
lfs_tag_t tag, const void *buffer) {
|
|
|
|
struct lfs_fs_parent_match *find = data;
|
|
|
|
lfs_t *lfs = find->lfs;
|
|
|
|
const struct lfs_diskoff *disk = buffer;
|
|
|
|
(void)tag;
|
|
|
|
|
|
|
|
lfs_block_t child[2];
|
|
|
|
int err = lfs_bd_read(lfs,
|
|
|
|
&lfs->pcache, &lfs->rcache, lfs->cfg->block_size,
|
|
|
|
disk->block, disk->off, &child, sizeof(child));
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
lfs_pair_fromle32(child);
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9-bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement).
Here is the new 32-bit tag format, note that there are multiple levels
of types which break down into more info:
[----            32             ----]
[1|--  11   --|--  10  --|--  10  --]
 ^.     ^     .     ^          ^- entry length
 |.     |     .     \------------ file id chunk info
 |.     \-----.------------------ type info (type3)
 \.-----------.------------------ valid bit
  [-3-|-- 8 --]
    ^     ^- chunk info
    \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
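The field layout above can be sketched with a few shift-and-mask helpers. This is an illustrative sketch only: the helper names (`tag_make`, `tag_type3`, and so on) are hypothetical, and the 3-bit type1 value is returned here as a plain small number, which may differ from how the real macros in lfs.c expose it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t tag_t;

// Pack an 11-bit type, 10-bit id, and 10-bit length below the valid bit
// (bit 31 is left clear here).
static tag_t tag_make(uint16_t type, uint16_t id, uint16_t length) {
    return ((tag_t)(type & 0x7ff) << 20)
            | ((tag_t)(id & 0x3ff) << 10)
            | (tag_t)(length & 0x3ff);
}

// Full 11-bit type field (type3).
static uint16_t tag_type3(tag_t tag) { return (tag >> 20) & 0x7ff; }
// Top 3 bits of the type field (type1).
static uint16_t tag_type1(tag_t tag) { return (tag >> 28) & 0x7; }
// Low 8 bits of the type field (chunk info).
static uint8_t tag_chunk(tag_t tag) { return (tag >> 20) & 0xff; }
// 10-bit file id.
static uint16_t tag_id(tag_t tag) { return (tag >> 10) & 0x3ff; }
// 10-bit entry length.
static uint16_t tag_length(tag_t tag) { return tag & 0x3ff; }
```

With 8 bits of chunk number and roughly 1KiB addressable per chunk by the 10-bit length, the 256KiB theoretical inline maximum mentioned above follows from 256 chunks of 1KiB each.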
2018-12-29 13:53:12 +00:00
|
|
|
return (lfs_pair_cmp(child, find->pair) == 0) ? LFS_CMP_EQ : LFS_CMP_LT;
|
2018-09-11 03:07:59 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static lfs_stag_t lfs_fs_parent(lfs_t *lfs, const lfs_block_t pair[2],
|
2018-07-13 01:22:06 +00:00
|
|
|
lfs_mdir_t *parent) {
|
Switched to strongly ordered directories
Instead of storing files in an arbitrary order, we now store files in
ascending lexicographical order by filename.
Although a big change, this actually has little impact on how littlefs
works internally. We need to support file insertion, and compare file
names to find our position. But since we already need to scan the entire
directory block, this adds relatively little overhead.
What this does allow is the potential to add B-tree support in the
future in a backwards compatible manner.
How could you add B-trees to littlefs?
1. Add an optional "child" tag with a pointer that allows you to skip to
a position in the metadata-pair list that composes the directory
2. When splitting a metadata-pair (sound familiar?), we either insert a
second child tag in our parent, or we create a new root containing
the child tags.
3. Each layer needs a bit stored in the tail-pointer to indicate if
we're going to the next layer. This can be created trivially when we
create a new root.
4. During lookup we keep two pointers containing the bounds of our
search. We may need to iterate through multiple metadata-pairs in our
linked-list, but this gives us an O(log n) lookup cost in a balanced
tree.
5. During deletion we also delete any children pointers. Note that
children pointers must come before the actual file entry.
This gives us a B-tree implementation that is compatible with the
current directory layout (assuming the files are ordered). This means
that B-trees could be supported by a host PC and ignored on a small
device. And during power-loss, we never end up with a broken filesystem,
just a less-than-optimal tree.
Note that we don't handle removes, so it's possible for a tree to become
unbalanced. But worst case that's the same as the current linked-list
implementation.
All we need to do now is keep directories ordered. If we decide to drop
B-tree support in the future or the B-tree implementation turns out
inherently flawed, we can just drop the ordered requirement without
breaking compatibility and recover the code cost.
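Keeping a directory ordered comes down to comparing names during the scan that already happens. A minimal sketch of that position search, assuming NUL-terminated names held in an array (the function `dir_find_position` is hypothetical and not part of littlefs):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Return the index where `name` belongs in an ascending list of names.
// The scan is linear, mirroring the fact that the whole directory block is
// scanned anyway. `*found` is set to 1 when the name already exists.
static size_t dir_find_position(const char *const *names, size_t count,
        const char *name, int *found) {
    *found = 0;
    for (size_t i = 0; i < count; i++) {
        int cmp = strcmp(names[i], name);
        if (cmp == 0) {
            *found = 1;
            return i;
        }
        if (cmp > 0) {
            return i;  // first entry that sorts after `name`
        }
    }
    return count;  // `name` sorts after every existing entry
}
```

A lookup stops at a match, while a file creation inserts at the returned index so the list stays in ascending order.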
2018-10-04 19:49:34 +00:00
|
|
|
// use fetchmatch with callback to find pairs
|
|
|
|
parent->tail[0] = 0;
|
|
|
|
parent->tail[1] = 1;
|
|
|
|
while (!lfs_pair_isnull(parent->tail)) {
|
|
|
|
lfs_stag_t tag = lfs_dir_fetchmatch(lfs, parent, parent->tail,
|
2018-12-29 13:53:12 +00:00
|
|
|
LFS_MKTAG(0x7ff, 0, 0x3ff),
|
|
|
|
LFS_MKTAG(LFS_TYPE_DIRSTRUCT, 0, 8),
|
|
|
|
NULL,
|
2018-10-04 19:49:34 +00:00
|
|
|
lfs_fs_parent_match, &(struct lfs_fs_parent_match){
|
|
|
|
lfs, {pair[0], pair[1]}});
|
2018-12-29 13:53:12 +00:00
|
|
|
if (tag && tag != LFS_ERR_NOENT) {
|
Added root entry and expanding superblocks
Expanding superblocks has been on my wishlist for a while. The basic
idea is that instead of maintaining a fixed offset blocks {0, 1} to the
root directory (1 pointer), we maintain a dynamically sized
linked-list of superblocks that point to the actual root. If the number
of writes to the root exceeds some value, we increase the size of the
superblock linked-list.
This can leverage existing metadata-pair operations. The revision count for
metadata-pairs provides some knowledge on how much wear we've put on the
superblock, and the threaded linked-list can also be reused for this
purpose. This means superblock expansion is both optional and cheap to
implement.
Expanding superblocks helps both extremely small and extremely large filesystems
(extreme being relative of course). On the small end, we can actually
collapse the superblock into the root directory and drop the hard requirement
of 4-blocks for the superblock. On the large end, our superblock will
now last longer than the rest of the filesystem. Each time we expand,
the number of cycles until the superblock dies is increased by a power.
Before we were stuck with this layout:
level  cycles  limit    layout
1      E^2     390 MiB  s0 -> root
Now we expand every time a fixed offset is exceeded:
level  cycles  limit    layout
0      E       4 KiB    s0+root
1      E^2     390 MiB  s0 -> root
2      E^3     37 TiB   s0 -> s1 -> root
3      E^4     3.6 EiB  s0 -> s1 -> s2 -> root
...
Where the cycles are the number of cycles before death, and the limit is
the worst-case size of a filesystem where early superblock death becomes a
concern (all writes to root), using this formula: E^|s| = E*B, E = erase
cycles = 100000, B = block count, assuming 4096 byte blocks.
Note we can also store copies of the superblock entry on the expanded
superblocks. This may help filesystem recovery tools in the future.
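The limits in the table can be reproduced from the formula E^|s| = E*B. As a small arithmetic sketch (the function names are hypothetical, and `level` is taken to match the table's level column, so cycles = E^(level+1) and B = E^level):

```c
#include <assert.h>
#include <stdint.h>

// Worst-case block count B at a given expansion level, from
// E^(level+1) = E*B, i.e. B = E^level.
static uint64_t superblock_limit_blocks(uint64_t erase_cycles,
        unsigned level) {
    uint64_t blocks = 1;
    for (unsigned i = 0; i < level; i++) {
        blocks *= erase_cycles;
    }
    return blocks;
}

// The same limit expressed in bytes for a given block size.
static uint64_t superblock_limit_bytes(uint64_t erase_cycles,
        unsigned level, uint64_t block_size) {
    return superblock_limit_blocks(erase_cycles, level) * block_size;
}
```

With E = 100000 and 4096 byte blocks this gives 4096 bytes at level 0, 409600000 bytes (about 390 MiB) at level 1, roughly 37 TiB at level 2, and roughly 3.6 EiB at level 3, which still fits in a 64-bit integer.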
2018-08-06 18:30:51 +00:00
|
|
|
return tag;
|
2017-04-01 17:23:15 +00:00
|
|
|
}
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
2017-04-01 17:23:15 +00:00
|
|
|
|
2018-07-11 12:18:30 +00:00
|
|
|
return LFS_ERR_NOENT;
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
2018-05-29 05:50:47 +00:00
|
|
|
|
2018-08-01 10:52:48 +00:00
|
|
|
static int lfs_fs_relocate(lfs_t *lfs,
|
2018-08-01 23:10:24 +00:00
|
|
|
const lfs_block_t oldpair[2], lfs_block_t newpair[2]) {
|
2018-08-01 15:24:59 +00:00
|
|
|
// update internal root
|
2018-08-05 04:57:43 +00:00
|
|
|
if (lfs_pair_cmp(oldpair, lfs->root) == 0) {
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Relocating root %"PRIx32" %"PRIx32,
|
2018-08-05 01:33:09 +00:00
|
|
|
newpair[0], newpair[1]);
|
2018-08-01 15:24:59 +00:00
|
|
|
lfs->root[0] = newpair[0];
|
|
|
|
lfs->root[1] = newpair[1];
|
|
|
|
}
|
|
|
|
|
|
|
|
// update internally tracked dirs
|
2018-09-11 03:07:59 +00:00
|
|
|
for (struct lfs_mlist *d = lfs->mlist; d; d = d->next) {
|
2018-08-05 04:57:43 +00:00
|
|
|
if (lfs_pair_cmp(oldpair, d->m.pair) == 0) {
|
2018-08-01 15:24:59 +00:00
|
|
|
d->m.pair[0] = newpair[0];
|
|
|
|
d->m.pair[1] = newpair[1];
|
|
|
|
}
|
2019-11-14 20:25:42 +00:00
|
|
|
|
|
|
|
if (d->type == LFS_TYPE_DIR &&
|
|
|
|
lfs_pair_cmp(oldpair, ((lfs_dir_t*)d)->head) == 0) {
|
|
|
|
((lfs_dir_t*)d)->head[0] = newpair[0];
|
|
|
|
((lfs_dir_t*)d)->head[1] = newpair[1];
|
|
|
|
}
|
2018-08-01 15:24:59 +00:00
|
|
|
}
|
|
|
|
|
2017-05-14 17:01:45 +00:00
|
|
|
// find parent
|
2018-05-29 06:11:26 +00:00
|
|
|
lfs_mdir_t parent;
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs_stag_t tag = lfs_fs_parent(lfs, oldpair, &parent);
|
2018-07-13 01:43:55 +00:00
|
|
|
if (tag < 0 && tag != LFS_ERR_NOENT) {
|
|
|
|
return tag;
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
|
|
|
|
2018-07-13 01:43:55 +00:00
|
|
|
if (tag != LFS_ERR_NOENT) {
|
2017-05-14 17:01:45 +00:00
|
|
|
// update disk, this creates a desync
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, +1);
|
2018-08-13 14:03:13 +00:00
|
|
|
|
2020-01-20 23:35:45 +00:00
|
|
|
// fix pending move in this pair? this looks like an optimization but
|
|
|
|
// is in fact _required_ since relocating may outdate the move.
|
|
|
|
uint16_t moveid = 0x3ff;
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
  .--------.
->| parent |-.
  | gstate | |
.-|    a   |-'
| '--------'
|     X <- child is orphaned
| .--------.
'>| child  |->
  | gstate |
  |    a   |
  '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
|
|
|
if (lfs_gstate_hasmovehere(&lfs->gstate, parent.pair)) {
|
|
|
|
moveid = lfs_tag_id(lfs->gstate.tag);
|
2020-01-20 23:35:45 +00:00
|
|
|
LFS_DEBUG("Fixing move while relocating "
|
|
|
|
"%"PRIx32" %"PRIx32" %"PRIx16"\n",
|
|
|
|
parent.pair[0], parent.pair[1], moveid);
|
|
|
|
lfs_fs_prepmove(lfs, 0x3ff, NULL);
|
|
|
|
if (moveid < lfs_tag_id(tag)) {
|
|
|
|
tag -= LFS_MKTAG(0, 1, 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_pair_tole32(newpair);
|
2020-01-20 23:35:45 +00:00
|
|
|
int err = lfs_dir_commit(lfs, &parent, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG_IF(moveid != 0x3ff,
|
|
|
|
LFS_TYPE_DELETE, moveid, 0)},
|
|
|
|
{tag, newpair}));
|
2018-08-05 04:57:43 +00:00
|
|
|
lfs_pair_fromle32(newpair);
|
2017-05-14 17:01:45 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
2017-04-29 17:41:53 +00:00
|
|
|
}
|
2017-05-14 17:01:45 +00:00
|
|
|
|
2018-08-13 14:03:13 +00:00
|
|
|
// next step, clean up orphans
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_fs_preporphans(lfs, -1);
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// find pred
|
2018-08-01 10:52:48 +00:00
|
|
|
int err = lfs_fs_pred(lfs, oldpair, &parent);
|
2018-07-13 01:43:55 +00:00
|
|
|
if (err && err != LFS_ERR_NOENT) {
|
|
|
|
return err;
|
2017-05-14 17:01:45 +00:00
|
|
|
}
|
|
|
|
|
2018-07-13 20:04:31 +00:00
|
|
|
// if we can't find dir, it must be new
|
2018-07-13 01:43:55 +00:00
|
|
|
if (err != LFS_ERR_NOENT) {
|
2020-01-20 23:35:45 +00:00
|
|
|
// fix pending move in this pair? this looks like an optimization but
|
|
|
|
// is in fact _required_ since relocating may outdate the move.
|
|
|
|
uint16_t moveid = 0x3ff;
|
2020-01-22 04:18:19 +00:00
|
|
|
if (lfs_gstate_hasmovehere(&lfs->gstate, parent.pair)) {
|
|
|
|
moveid = lfs_tag_id(lfs->gstate.tag);
|
2020-01-20 23:35:45 +00:00
|
|
|
LFS_DEBUG("Fixing move while relocating "
|
|
|
|
"%"PRIx32" %"PRIx32" %"PRIx16"\n",
|
|
|
|
parent.pair[0], parent.pair[1], moveid);
|
|
|
|
lfs_fs_prepmove(lfs, 0x3ff, NULL);
|
|
|
|
}
|
|
|
|
|
2018-08-13 14:03:13 +00:00
|
|
|
// replace bad pair, either we clean up desync, or no desync occurred
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_tole32(newpair);
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, &parent, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG_IF(moveid != 0x3ff,
|
|
|
|
LFS_TYPE_DELETE, moveid, 0)},
|
2019-01-08 14:52:03 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_TAIL + parent.split, 0x3ff, 8), newpair}));
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_fromle32(newpair);
|
2018-05-29 05:50:47 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-05-14 17:01:45 +00:00
|
|
|
return 0;
|
2017-04-14 22:33:36 +00:00
|
|
|
}
|
2017-04-01 17:23:15 +00:00
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
static void lfs_fs_preporphans(lfs_t *lfs, int8_t orphans) {
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
      .--------.
    ->| parent |-.
      | gstate | |
    .-|    a   |-'
    | '--------'
    |     X <- child is orphaned
    | .--------.
    '>| child  |->
      | gstate |
      |    a   |
      '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
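As a rough illustration of point 4, here is a minimal sketch of a gstate built as an xor of deltas collected along the threaded linked-list. The gstate_t layout and the helper names gstate_xor and gstate_collect are hypothetical and only stand in for the real structures; this is not littlefs's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for the global state: a tag plus a block pair.
typedef struct {
    uint32_t tag;
    uint32_t pair[2];
} gstate_t;

// Fold one delta into an accumulator with xor. Xor is its own inverse,
// so folding in the same delta twice cancels it out.
static void gstate_xor(gstate_t *a, const gstate_t *b) {
    a->tag ^= b->tag;
    a->pair[0] ^= b->pair[0];
    a->pair[1] ^= b->pair[1];
}

// Rebuild the global state by visiting every metadata pair in the
// threaded linked-list once and xoring in the delta stored there.
// The list is modeled here as a plain array of deltas (an assumption).
static gstate_t gstate_collect(const gstate_t *deltas, size_t count) {
    assert(deltas != NULL || count == 0);
    gstate_t g = {0, {0, 0}};
    for (size_t i = 0; i < count; i++) {
        gstate_xor(&g, &deltas[i]);
    }
    return g;
}
```

Under these assumptions, a delta that is reachable twice during the walk, as when both the updated parent and the orphaned child are still threaded, cancels itself and drops out of the collected state.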
2020-01-22 04:18:19 +00:00
|
|
|
LFS_ASSERT(lfs_tag_size(lfs->gstate.tag) > 0 || orphans >= 0);
|
|
|
|
lfs->gstate.tag += orphans;
|
|
|
|
lfs->gstate.tag = ((lfs->gstate.tag & ~LFS_MKTAG(0x800, 0, 0)) |
|
|
|
|
((uint32_t)lfs_gstate_hasorphans(&lfs->gstate) << 31));
|
2019-01-04 23:23:36 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
static void lfs_fs_prepmove(lfs_t *lfs,
|
|
|
|
uint16_t id, const lfs_block_t pair[2]) {
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->gstate.tag = ((lfs->gstate.tag & ~LFS_MKTAG(0x7ff, 0x3ff, 0)) |
|
|
|
|
((id != 0x3ff) ? LFS_MKTAG(LFS_TYPE_DELETE, id, 0) : 0));
|
|
|
|
lfs->gstate.pair[0] = (id != 0x3ff) ? pair[0] : 0;
|
|
|
|
lfs->gstate.pair[1] = (id != 0x3ff) ? pair[1] : 0;
|
2019-01-04 23:23:36 +00:00
|
|
|
}
|
|
|
|
|
2018-09-15 03:02:39 +00:00
|
|
|
static int lfs_fs_demove(lfs_t *lfs) {
|
2020-01-22 04:18:19 +00:00
|
|
|
if (!lfs_gstate_hasmove(&lfs->gdisk)) {
|
2018-09-15 03:02:39 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Fix bad moves
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Fixing move %"PRIx32" %"PRIx32" %"PRIx16,
|
2020-01-22 04:18:19 +00:00
|
|
|
lfs->gdisk.pair[0],
|
|
|
|
lfs->gdisk.pair[1],
|
|
|
|
lfs_tag_id(lfs->gdisk.tag));
|
2018-09-15 03:02:39 +00:00
|
|
|
|
|
|
|
// fetch and delete the moved entry
|
|
|
|
lfs_mdir_t movedir;
|
2020-01-22 04:18:19 +00:00
|
|
|
int err = lfs_dir_fetch(lfs, &movedir, lfs->gdisk.pair);
|
2018-09-15 03:02:39 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2020-01-20 23:35:45 +00:00
|
|
|
// prep gstate and delete move id
|
2020-01-22 04:18:19 +00:00
|
|
|
uint16_t moveid = lfs_tag_id(lfs->gdisk.tag);
|
2020-01-20 23:35:45 +00:00
|
|
|
lfs_fs_prepmove(lfs, 0x3ff, NULL);
|
|
|
|
err = lfs_dir_commit(lfs, &movedir, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(LFS_TYPE_DELETE, moveid, 0)}));
|
2018-09-15 03:02:39 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
Added building blocks for dynamic wear-leveling
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a device's lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices: rather than causing
corruption during writes, wear reduces a device's "sticking power",
causing bits to flip over time. This means for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
What we end up with is a uniform distribution of wear on storage. The
wear is not perfect: if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying, however we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
|
|
|
static int lfs_fs_deorphan(lfs_t *lfs) {
|
2019-01-04 23:23:36 +00:00
|
|
|
if (!lfs_gstate_hasorphans(&lfs->gstate)) {
|
2018-09-15 03:02:39 +00:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2018-08-08 21:34:56 +00:00
|
|
|
// Fix any orphans
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well, it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
       .--------.
     ->| parent |-.
       | gstate | |
     .-|    a   |-'
     | '--------'
     |     X <- child is orphaned
     | .--------.
     '>| child  |->
       | gstate |
       |    a   |
       '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
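
To make the duplication described in item 4 concrete, here is a minimal
sketch of a global state computed as the xor of per-directory deltas along
the threaded linked-list. The names toy_dir and toy_gstate are hypothetical,
and a single 32-bit word stands in for the real gstate; this is an
illustration under those assumptions, not the littlefs implementation.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a metadata pair on the threaded linked-list.
// A single word is assumed for the delta to keep the sketch short.
struct toy_dir {
    uint32_t gdelta;            // this directory's gstate delta
    const struct toy_dir *tail; // next directory in the threaded list
};

// The global state is the xor of every delta reachable through the list,
// not of the directory tree.
static uint32_t toy_gstate(const struct toy_dir *head) {
    uint32_t g = 0;
    for (const struct toy_dir *d = head; d != NULL; d = d->tail) {
        g ^= d->gdelta;
    }
    return g;
}
```

If the parent absorbs the child's delta while the orphaned child is still
threaded into the list, the delta is xored in twice and cancels itself out.
Only once the predecessor drops the child from the list does the collected
state match the original again, which is why the delta should be handed over
at that point.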
2020-01-22 04:18:19 +00:00
|
|
|
lfs_mdir_t pdir = {.split = true, .tail = {0, 1}};
|
|
|
|
lfs_mdir_t dir;
|
2018-08-01 10:52:48 +00:00
|
|
|
|
Added building blocks for dynamic wear-leveling
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a device's lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices: rather than causing
corruption during writes, wear reduces a device's "sticking power",
causing bits to flip over time. This means that for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
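
The seeding idea in item 2 can be sketched as follows. This assumes xor as
the "simple hash" and a plain modulo to pick the starting block; toy_seed and
toy_alloc_start are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Fold the CRCs seen while scanning metadata-pairs into one seed.
// xor is an assumption of this sketch; any cheap mixing step would do.
static uint32_t toy_seed(const uint32_t *crcs, size_t count) {
    uint32_t seed = 0;
    for (size_t i = 0; i < count; i++) {
        seed ^= crcs[i];
    }
    return seed;
}

// Pick the block the allocator starts scanning from.
static uint32_t toy_alloc_start(uint32_t seed, uint32_t block_count) {
    assert(block_count > 0);
    return seed % block_count;
}
```

Because any write to a metadata-pair changes its CRC, the seed tends to
differ from one boot to the next without storing any allocator state.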
What we end up with is a uniform distribution of wear on storage. The
wear is not perfect: if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying, however we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
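
The eviction rule itself is small. A minimal sketch, assuming the revision
count and block_cycles are available as plain integers; toy_should_evict is a
hypothetical helper, and treating block_cycles <= 0 as "disabled" is an
assumption of this sketch rather than something stated above.

```c
#include <stdbool.h>
#include <stdint.h>

// Decide whether a metadata-pair should be evicted on this write.
// Evict whenever the revision count is a multiple of block_cycles.
static bool toy_should_evict(uint32_t rev, int32_t block_cycles) {
    if (block_cycles <= 0) {
        // assumed convention: non-positive disables eviction
        return false;
    }
    return rev % (uint32_t)block_cycles == 0;
}
```

With block_cycles set to 100, a pair at revision 200 would be evicted and one
at revision 201 would not, so each pair moves roughly once every 100 writes.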
2018-08-08 21:34:56 +00:00
|
|
|
// iterate over all directory entries
|
2020-01-22 04:18:19 +00:00
|
|
|
while (!lfs_pair_isnull(pdir.tail)) {
|
|
|
|
int err = lfs_dir_fetch(lfs, &dir, pdir.tail);
|
2018-08-08 21:34:56 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2018-08-08 21:34:56 +00:00
|
|
|
// check head blocks for orphans
|
|
|
|
if (!pdir.split) {
|
|
|
|
// check if we have a parent
|
|
|
|
lfs_mdir_t parent;
|
2018-09-11 03:07:59 +00:00
|
|
|
lfs_stag_t tag = lfs_fs_parent(lfs, pdir.tail, &parent);
|
2018-08-08 21:34:56 +00:00
|
|
|
if (tag < 0 && tag != LFS_ERR_NOENT) {
|
|
|
|
return tag;
|
2018-07-31 13:07:36 +00:00
|
|
|
}
|
2018-07-17 23:31:30 +00:00
|
|
|
|
2018-08-08 21:34:56 +00:00
|
|
|
if (tag == LFS_ERR_NOENT) {
|
|
|
|
// we are an orphan
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Fixing orphan %"PRIx32" %"PRIx32,
|
2018-08-08 21:34:56 +00:00
|
|
|
pdir.tail[0], pdir.tail[1]);
|
|
|
|
|
2018-09-12 06:34:03 +00:00
|
|
|
err = lfs_dir_drop(lfs, &pdir, &dir);
|
2018-08-08 21:34:56 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
2018-07-31 13:07:36 +00:00
|
|
|
}
|
2018-07-17 23:31:30 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
// refetch tail
|
|
|
|
continue;
|
Added building blocks for dynamic wear-leveling
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a devices lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices, rather than causing
corruption during writes, wear reduces a devices "sticking power",
causing bits to flip over time. This means for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
What we end up with is a roughly uniform distribution of wear on storage.
The wear-leveling is not perfect: if a block is used for metadata it gets
more wear, and the randomization may not be exact. But we can never
actually get perfect wear-leveling, since we're already resigned to
dynamic wear-leveling at the file level.
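The allocator seeding from step 2 might look roughly like the sketch below.
The function names are hypothetical, and using XOR as the "simple hash" and
a modulo to pick the starting block are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical: fold the CRCs seen while scanning metadata-pairs during
// mount into one seed. XOR stands in for the "simple hash" (assumption).
static uint32_t seed_from_crcs(const uint32_t *crcs, size_t count) {
    uint32_t seed = 0;
    for (size_t i = 0; i < count; i++) {
        seed ^= crcs[i];
    }
    return seed;
}

// Hypothetical: pick the block the allocator starts scanning from.
// Returns 0 for an empty device to avoid dividing by zero.
static uint32_t alloc_start(uint32_t seed, uint32_t block_count) {
    return block_count ? seed % block_count : 0;
}
```

Since every commit changes a CRC somewhere on disk, the seed (and so the
starting block) should differ between boots whenever the filesystem has been
written to in between.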
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying; however, we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
|
|
|
}
|
2018-07-02 03:29:42 +00:00
|
|
|
|
2018-08-08 21:34:56 +00:00
|
|
|
lfs_block_t pair[2];
|
Cleaned up tag encoding, now with clear chunk field
Before, the tag format's type field was limited to 9 bits. This sounds
like a lot, but this field needed to encode up to 256 user-specified
types. This limited the flexibility of the encoded types. As time went
on, more bits in the type field were repurposed for various things,
leaving a rather fragile type field.
Here we make the jump to full 11-bit type fields. This comes at the cost
of a smaller length field, however the use of the length field was
always going to come with a RAM limitation. Rather than putting pressure
on RAM for inline files, the new type field lets us encode a chunk
number, splitting up inline files into multiple updatable units. This
actually pushes the theoretical inline max from 8KiB to 256KiB! (Note
that we only allow a single 1KiB chunk for now; chunky inline files
are just a theoretical future improvement.)
Here is the new 32-bit tag format. Note that there are multiple levels
of types which break down into more info:
    [----            32             ----]
    [1|--  11   --|--  10  --|--  10  --]
     ^.     ^     .     ^          ^- entry length
     |.     |     .     \------------ file id chunk info
     |.     \-----.------------------ type info (type3)
     \.-----------.------------------ valid bit
      [-3-|-- 8 --]
        ^     ^- chunk info
        \------- type info (type1)
Additionally, I've split the CREATE tag into separate SPLICE and NAME
tags. This simplified the new compact logic a bit. For now, littlefs
still follows the rule that a NAME tag precedes any other tags related
to a file, but this can change in the future.
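As a rough illustration of the layout above, the fields can be packed and
unpacked with shifts and masks. All names here (tag_t, mktag, tag_type3,
...) are hypothetical, and the sketch assumes the leftmost field in the
diagram is the most significant.

```c
#include <stdint.h>

typedef uint32_t tag_t;  // hypothetical name for the 32-bit tag

// Pack an 11-bit type, 10-bit id and 10-bit length into one tag.
// The valid bit (bit 31) is left as 0 in this sketch.
static tag_t mktag(uint32_t type, uint32_t id, uint32_t len) {
    return ((type & 0x7ff) << 20) | ((id & 0x3ff) << 10) | (len & 0x3ff);
}

// Full 11-bit type (type3).
static uint32_t tag_type3(tag_t t) { return (t >> 20) & 0x7ff; }
// Upper 3 bits of the type field (type1).
static uint32_t tag_type1(tag_t t) { return (t >> 28) & 0x7; }
// Lower 8 bits of the type field (chunk info).
static uint32_t tag_chunk(tag_t t) { return (t >> 20) & 0xff; }
// 10-bit file id.
static uint32_t tag_id(tag_t t) { return (t >> 10) & 0x3ff; }
// 10-bit entry length.
static uint32_t tag_len(tag_t t) { return t & 0x3ff; }
```

For example, mktag(0x201, 5, 8) splits into type1 = 2 and chunk = 1, with
id = 5 and length = 8.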
2018-12-29 13:53:12 +00:00
|
|
|
lfs_stag_t res = lfs_dir_get(lfs, &parent,
|
|
|
|
LFS_MKTAG(0x7ff, 0x3ff, 0), tag, pair);
|
2018-08-08 21:34:56 +00:00
|
|
|
if (res < 0) {
|
|
|
|
return res;
|
|
|
|
}
|
|
|
|
lfs_pair_fromle32(pair);
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2018-08-08 21:34:56 +00:00
|
|
|
if (!lfs_pair_sync(pair, pdir.tail)) {
|
|
|
|
// we have desynced
|
Added tests for power-cycled-relocations and fixed the bugs that fell out
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well: it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means now all fetched dirs are hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate, however this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
      .--------.
    ->| parent |-.
      | gstate | |
    .-|   a    |-'
    | '--------'
    |     X <- child is orphaned
    | .--------.
    '>| child  |->
      | gstate |
      |   a    |
      '--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left from
v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
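The gstate problem in item 4 can be illustrated with a toy model. The struct
and function below are hypothetical, not littlefs types. The only property
taken from the text above is that gstate is the xor of all deltas found
along the threaded linked-list.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical stand-in for a metadata-pair in the threaded linked-list,
// reduced to just its gstate delta.
struct mdir_sketch {
    uint32_t gdelta;
};

// gstate is the xor of every delta found while walking the list.
static uint32_t collect_gstate(const struct mdir_sketch *list, size_t n) {
    uint32_t gstate = 0;
    for (size_t i = 0; i < n; i++) {
        gstate ^= list[i].gdelta;
    }
    return gstate;
}
```

With a parent delta of 0 and a child delta of a, the walk yields a. If a is
copied into the parent while the orphaned child is still reachable, the walk
sees a twice and the two copies cancel (a ^ a == 0), which is why the delta
has to be handed over only when the child is finally unlinked.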
2020-01-22 04:18:19 +00:00
|
|
|
LFS_DEBUG("Fixing half-orphan "
|
|
|
|
"%"PRIx32" %"PRIx32" -> %"PRIx32" %"PRIx32,
|
|
|
|
pdir.tail[0], pdir.tail[1], pair[0], pair[1]);
|
2018-08-08 21:34:56 +00:00
|
|
|
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_tole32(pair);
|
2019-01-08 14:52:03 +00:00
|
|
|
err = lfs_dir_commit(lfs, &pdir, LFS_MKATTRS(
|
|
|
|
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), pair}));
|
2019-01-04 23:23:36 +00:00
|
|
|
lfs_pair_fromle32(pair);
|
2018-08-08 21:34:56 +00:00
|
|
|
if (err) {
|
|
|
|
return err;
|
2018-07-31 13:07:36 +00:00
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
// refetch tail
|
|
|
|
continue;
|
2018-08-08 21:34:56 +00:00
|
|
|
}
|
|
|
|
}
|
2018-05-21 05:56:20 +00:00
|
|
|
|
2020-01-22 04:18:19 +00:00
|
|
|
pdir = dir;
|
Added building blocks for dynamic wear-leveling
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a device's lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices: rather than causing
corruption during writes, wear reduces a device's "sticking power",
causing bits to flip over time. This means that for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
What we end up with is a roughly uniform distribution of wear on storage.
The wear is not perfect: if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying, but we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
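The two building blocks above can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: xor stands in for the "simple hash" of the CRCs, a block_cycles of 0 is treated as "never evict", and `should_evict`, `seed_from_crcs`, and `alloc_start` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Evict when the revision count is a multiple of block_cycles.
// Treating block_cycles == 0 as "never evict" is an assumption.
static bool should_evict(uint32_t rev, uint32_t block_cycles) {
    return block_cycles > 0 && rev % block_cycles == 0;
}

// Fold the metadata-pair CRCs seen during mount into a seed.
// xor is only a stand-in for the "simple hash" in the text.
static uint32_t seed_from_crcs(const uint32_t *crcs, size_t count) {
    uint32_t seed = 0;
    for (size_t i = 0; i < count; i++) {
        seed ^= crcs[i];
    }
    return seed;
}

// Pick the block the allocator starts scanning from.
static uint32_t alloc_start(uint32_t seed, uint32_t block_count) {
    assert(block_count > 0);
    return seed % block_count;
}
```

Because the seed depends on the CRCs already on disk, any metadata write between boots changes where allocation starts, without storing allocator state.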
2018-08-08 21:34:56 +00:00
    }
2018-05-21 05:56:20 +00:00

2018-08-08 21:34:56 +00:00
    // mark orphans as fixed
2019-01-04 23:23:36 +00:00
    lfs_fs_preporphans(lfs, -lfs_gstate_getorphans(&lfs->gstate));
2018-08-08 21:34:56 +00:00
    return 0;
}
2018-05-21 05:56:20 +00:00

2018-09-15 03:02:39 +00:00
static int lfs_fs_forceconsistency(lfs_t *lfs) {
    int err = lfs_fs_demove(lfs);
2018-08-08 21:34:56 +00:00
    if (err) {
        return err;
2018-07-31 13:07:36 +00:00
    }
2018-05-21 05:56:20 +00:00

2018-09-15 03:02:39 +00:00
    err = lfs_fs_deorphan(lfs);
2018-08-08 21:34:56 +00:00
    if (err) {
        return err;
    }
2018-05-21 05:56:20 +00:00

2018-08-08 21:34:56 +00:00
    return 0;
}

2018-07-30 19:40:27 +00:00
static int lfs_fs_size_count(void *p, lfs_block_t block) {
2018-08-05 01:33:09 +00:00
    (void)block;
2018-07-30 14:10:04 +00:00
    lfs_size_t *size = p;
    *size += 1;
    return 0;
}

lfs_ssize_t lfs_fs_size(lfs_t *lfs) {
2019-05-31 09:40:19 +00:00
    LFS_TRACE("lfs_fs_size(%p)", (void*)lfs);
2018-07-30 14:10:04 +00:00
    lfs_size_t size = 0;
2020-02-09 15:05:37 +00:00
    int err = lfs_fs_traverseraw(lfs, lfs_fs_size_count, &size, false);
2018-07-30 14:10:04 +00:00
    if (err) {
2019-11-22 04:29:57 +00:00
        LFS_TRACE("lfs_fs_size -> %d", err);
2018-07-30 14:10:04 +00:00
        return err;
    }

2019-11-22 04:29:57 +00:00
    // trace the value actually returned; err is always 0 on this path
    LFS_TRACE("lfs_fs_size -> %" PRId32, (lfs_ssize_t)size);
2019-05-31 09:40:19 +00:00
    return size;
2018-07-30 14:10:04 +00:00
}
Added migration from littlefs v1
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Define LFS_MIGRATE
2. Call lfs_migrate (only available with the above macro)
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included migration
code in the standard build. We can't use the lfs_migrate function for
link time gc because of a dependency between the allocator and v1 data
structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during
the migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth
fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11% at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
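For background on the "normal metadata updates" mentioned above: a reader of a metadata pair has to decide which of the two blocks is current. Below is a minimal sketch of that choice, assuming each block has already been checked for validity (for example by CRC) and carries a 32-bit revision count that may wrap around. `struct block_info`, `rev_cmp`, and `pick_block` are hypothetical names for illustration, not littlefs functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical summary of one block of a metadata pair.
struct block_info {
    bool valid;    // block passed its integrity check
    uint32_t rev;  // revision count, allowed to wrap around
};

// Sequence comparison: negative if a is older than b, even across wraparound.
static int32_t rev_cmp(uint32_t a, uint32_t b) {
    return (int32_t)(a - b);
}

// Return the index (0 or 1) of the block to use, or -1 if neither is valid.
static int pick_block(const struct block_info pair[2]) {
    int best = -1;
    for (int i = 0; i < 2; i++) {
        if (!pair[i].valid) {
            continue;
        }
        if (best >= 0 && rev_cmp(pair[i].rev, pair[best].rev) < 0) {
            continue;
        }
        best = i;
    }
    return best;
}
```

Under this rule an interrupted write leaves the second block invalid, so the reader falls back to the first block. That is the property the fallback guarantee in the message relies on.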
2019-02-23 03:34:03 +00:00

#ifdef LFS_MIGRATE
////// Migration from littlefs v1 below this //////

/// Version info ///

// Software library version
// Major (upper 16 bits), incremented on backwards incompatible changes
// Minor (lower 16 bits), incremented on feature additions
#define LFS1_VERSION 0x00010007
#define LFS1_VERSION_MAJOR (0xffff & (LFS1_VERSION >> 16))
#define LFS1_VERSION_MINOR (0xffff & (LFS1_VERSION >> 0))

// Version of On-disk data structures
// Major (upper 16 bits), incremented on backwards incompatible changes
// Minor (lower 16 bits), incremented on feature additions
#define LFS1_DISK_VERSION 0x00010001
#define LFS1_DISK_VERSION_MAJOR (0xffff & (LFS1_DISK_VERSION >> 16))
#define LFS1_DISK_VERSION_MINOR (0xffff & (LFS1_DISK_VERSION >> 0))


/// v1 Definitions ///

// File types
enum lfs1_type {
    LFS1_TYPE_REG = 0x11,
    LFS1_TYPE_DIR = 0x22,
    LFS1_TYPE_SUPERBLOCK = 0x2e,
};

typedef struct lfs1 {
    lfs_block_t root[2];
} lfs1_t;

typedef struct lfs1_entry {
    lfs_off_t off;

    struct lfs1_disk_entry {
        uint8_t type;
        uint8_t elen;
        uint8_t alen;
        uint8_t nlen;
        union {
            struct {
                lfs_block_t head;
                lfs_size_t size;
            } file;
            lfs_block_t dir[2];
        } u;
    } d;
} lfs1_entry_t;

typedef struct lfs1_dir {
    struct lfs1_dir *next;
    lfs_block_t pair[2];
    lfs_off_t off;

    lfs_block_t head[2];
    lfs_off_t pos;

    struct lfs1_disk_dir {
        uint32_t rev;
        lfs_size_t size;
        lfs_block_t tail[2];
    } d;
} lfs1_dir_t;

typedef struct lfs1_superblock {
    lfs_off_t off;

    struct lfs1_disk_superblock {
        uint8_t type;
        uint8_t elen;
        uint8_t alen;
        uint8_t nlen;
        lfs_block_t root[2];
        uint32_t block_size;
        uint32_t block_count;
        uint32_t version;
        char magic[8];
    } d;
} lfs1_superblock_t;


/// Low-level wrappers v1->v2 ///
2019-05-16 16:51:22 +00:00
static void lfs1_crc(uint32_t *crc, const void *buffer, size_t size) {
2019-02-23 03:34:03 +00:00
    *crc = lfs_crc(*crc, buffer, size);
}

static int lfs1_bd_read(lfs_t *lfs, lfs_block_t block,
        lfs_off_t off, void *buffer, lfs_size_t size) {
    // if we ever do more than writes to alternating pairs,
    // this may need to consider pcache
    return lfs_bd_read(lfs, &lfs->pcache, &lfs->rcache, size,
            block, off, buffer, size);
}

static int lfs1_bd_crc(lfs_t *lfs, lfs_block_t block,
        lfs_off_t off, lfs_size_t size, uint32_t *crc) {
    for (lfs_off_t i = 0; i < size; i++) {
        uint8_t c;
        int err = lfs1_bd_read(lfs, block, off+i, &c, 1);
        if (err) {
            return err;
        }

        lfs1_crc(crc, &c, 1);
    }

    return 0;
}


/// Endian swapping functions ///
static void lfs1_dir_fromle32(struct lfs1_disk_dir *d) {
    d->rev = lfs_fromle32(d->rev);
    d->size = lfs_fromle32(d->size);
    d->tail[0] = lfs_fromle32(d->tail[0]);
    d->tail[1] = lfs_fromle32(d->tail[1]);
}

static void lfs1_dir_tole32(struct lfs1_disk_dir *d) {
    d->rev = lfs_tole32(d->rev);
    d->size = lfs_tole32(d->size);
    d->tail[0] = lfs_tole32(d->tail[0]);
    d->tail[1] = lfs_tole32(d->tail[1]);
}

static void lfs1_entry_fromle32(struct lfs1_disk_entry *d) {
    d->u.dir[0] = lfs_fromle32(d->u.dir[0]);
    d->u.dir[1] = lfs_fromle32(d->u.dir[1]);
}

static void lfs1_entry_tole32(struct lfs1_disk_entry *d) {
    d->u.dir[0] = lfs_tole32(d->u.dir[0]);
    d->u.dir[1] = lfs_tole32(d->u.dir[1]);
}

static void lfs1_superblock_fromle32(struct lfs1_disk_superblock *d) {
    d->root[0] = lfs_fromle32(d->root[0]);
    d->root[1] = lfs_fromle32(d->root[1]);
    d->block_size = lfs_fromle32(d->block_size);
    d->block_count = lfs_fromle32(d->block_count);
    d->version = lfs_fromle32(d->version);
}


/// Metadata pair and directory operations ///
static inline lfs_size_t lfs1_entry_size(const lfs1_entry_t *entry) {
    return 4 + entry->d.elen + entry->d.alen + entry->d.nlen;
}

static int lfs1_dir_fetch(lfs_t *lfs,
        lfs1_dir_t *dir, const lfs_block_t pair[2]) {
    // copy out pair, otherwise may be aliasing dir
    const lfs_block_t tpair[2] = {pair[0], pair[1]};
    bool valid = false;

    // check both blocks for the most recent revision
    for (int i = 0; i < 2; i++) {
        struct lfs1_disk_dir test;
        int err = lfs1_bd_read(lfs, tpair[i], 0, &test, sizeof(test));
        lfs1_dir_fromle32(&test);
        if (err) {
            if (err == LFS_ERR_CORRUPT) {
                continue;
            }
            return err;
        }

        if (valid && lfs_scmp(test.rev, dir->d.rev) < 0) {
            continue;
        }

        if ((0x7fffffff & test.size) < sizeof(test)+4 ||
            (0x7fffffff & test.size) > lfs->cfg->block_size) {
            continue;
        }

2020-02-10 04:43:20 +00:00
        uint32_t crc = 0xffffffff;
2019-02-23 03:34:03 +00:00
        lfs1_dir_tole32(&test);
        lfs1_crc(&crc, &test, sizeof(test));
        lfs1_dir_fromle32(&test);
        err = lfs1_bd_crc(lfs, tpair[i], sizeof(test),
                (0x7fffffff & test.size) - sizeof(test), &crc);
        if (err) {
            if (err == LFS_ERR_CORRUPT) {
                continue;
            }
            return err;
        }

        if (crc != 0) {
            continue;
        }

        valid = true;

        // setup dir in case it's valid
        dir->pair[0] = tpair[(i+0) % 2];
        dir->pair[1] = tpair[(i+1) % 2];
        dir->off = sizeof(dir->d);
        dir->d = test;
    }

    if (!valid) {
2019-07-27 01:09:24 +00:00
        LFS_ERROR("Corrupted dir pair at %" PRIx32 " %" PRIx32 ,
Added migration from littlefs v1
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Define LFS_MIGRATE
2. Call lfs_migrate (only available with the above macro)
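A minimal sketch of how an application might use these two steps: try a normal mount, and fall back to a one-time migration if that fails. This is not littlefs code. The names `mount_or_migrate`, `fs_entry_fn`, `struct fake_disk`, `fake_mount` and `fake_migrate` are hypothetical, and the error values are placeholders. In a build with LFS_MIGRATE defined, thin wrappers around lfs_mount and lfs_migrate would take the place of the fakes. The sketch also assumes that a failed mount means the disk still holds a v1 image, which a real application may want to check more carefully.

```c
#include <stddef.h>

/* Hypothetical common shape for the two entry points used below. */
typedef int (*fs_entry_fn)(void *fs, const void *cfg);

/* Try a normal mount first; on failure attempt a migration, then mount
 * again. Returns 0 on success, otherwise the error from the migration
 * or from the second mount. */
static int mount_or_migrate(void *fs, const void *cfg,
        fs_entry_fn mount, fs_entry_fn migrate) {
    int err = mount(fs, cfg);
    if (!err) {
        return 0;
    }

    err = migrate(fs, cfg);
    if (err) {
        return err;
    }

    return mount(fs, cfg);
}

/* Test doubles standing in for a real block device (assumption). */
struct fake_disk {
    int format; /* 1 = v1 image, 2 = v2 image, anything else = garbage */
};

static int fake_mount(void *fs, const void *cfg) {
    (void)cfg;
    struct fake_disk *disk = fs;
    return (disk->format == 2) ? 0 : -1; /* placeholder error code */
}

static int fake_migrate(void *fs, const void *cfg) {
    (void)cfg;
    struct fake_disk *disk = fs;
    if (disk->format != 1) {
        return -2; /* placeholder error code */
    }
    disk->format = 2;
    return 0;
}
```

The second mount is there because this sketch assumes migration only converts the disk and does not leave the filesystem mounted.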
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro-controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included migration
code in the standard build. We can't rely on the lfs_migrate function for
link-time gc because of a dependency between the allocator and v1 data
structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
   directory. This results in both a v1 and v2 directory sharing the
   same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
   v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
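The idea that a single completed block write decides the outcome can be pictured with a toy model of a metadata pair. This is only an illustration under simplifying assumptions, not littlefs's on-disk format: `struct toy_block` and `toy_pair_pick` are hypothetical, and a boolean stands in for a real checksum. The reader trusts whichever block passes its checksum and carries the newer revision, so a half-written block is simply ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for one block of a metadata pair (assumption). */
struct toy_block {
    uint32_t rev; /* revision count, bumped on every rewrite */
    bool crc_ok;  /* true if the block's checksum verified */
};

/* Return the index (0 or 1) of the block a reader should trust,
 * or -1 if neither block is valid. Revisions are compared as a
 * signed difference so that wraparound still orders correctly. */
static int toy_pair_pick(const struct toy_block pair[2]) {
    int best = -1;
    for (int i = 0; i < 2; i++) {
        if (!pair[i].crc_ok) {
            continue; /* interrupted or corrupted write */
        }

        if (best < 0 || (int32_t)(pair[i].rev - pair[best].rev) > 0) {
            best = i;
        }
    }

    return best;
}
```

In this model, an update that loses power mid-write leaves the new block with a bad checksum, so the old block stays in charge; only a fully written block takes over.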
Note there are several limitations with this solution:
1. While migration doesn't duplicate file data, it does temporarily
   duplicate all metadata. This can cause a device to run out of space if
   storage is tight and the filesystem has many files. If the device was
   created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
   pairs develop bad blocks. It may be possible to work around this, but
   it creates the problem that directories may change location during
   the migration. The other solutions I've looked at are complicated and
   require superlinear runtime. Currently I don't think it's worth
   fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
   looks like it's roughly 11%, at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
2019-02-23 03:34:03 +00:00
|
|
|
tpair[0], tpair[1]);
|
|
|
|
return LFS_ERR_CORRUPT;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int lfs1_dir_next(lfs_t *lfs, lfs1_dir_t *dir, lfs1_entry_t *entry) {
|
|
|
|
while (dir->off + sizeof(entry->d) > (0x7fffffff & dir->d.size)-4) {
|
|
|
|
if (!(0x80000000 & dir->d.size)) {
|
|
|
|
entry->off = dir->off;
|
|
|
|
return LFS_ERR_NOENT;
|
|
|
|
}
|
|
|
|
|
|
|
|
int err = lfs1_dir_fetch(lfs, dir, dir->d.tail);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
dir->off = sizeof(dir->d);
|
|
|
|
dir->pos += sizeof(dir->d) + 4;
|
|
|
|
}
|
|
|
|
|
|
|
|
int err = lfs1_bd_read(lfs, dir->pair[0], dir->off,
|
|
|
|
&entry->d, sizeof(entry->d));
|
|
|
|
lfs1_entry_fromle32(&entry->d);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
entry->off = dir->off;
|
|
|
|
dir->off += lfs1_entry_size(entry);
|
|
|
|
dir->pos += lfs1_entry_size(entry);
|
|
|
|
return 0;
|
|
|
|
}
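The loop condition in lfs1_dir_next packs two things into the directory's size word: the low 31 bits hold the size and the top bit says whether the directory continues in its tail pair. A small sketch of that decoding follows. The helper names are hypothetical, and the reading of the trailing 4 bytes as a checksum is an assumption rather than something this listing states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of the directory block contents, without the flag bit. */
static uint32_t toy_dir_size(uint32_t raw) {
    return 0x7fffffff & raw;
}

/* True if the directory continues in the pair named by its tail. */
static bool toy_dir_continues(uint32_t raw) {
    return (0x80000000 & raw) != 0;
}

/* True if an entry header of hdr bytes still fits at offset off,
 * leaving the last 4 bytes of the block alone (assumed checksum). */
static bool toy_dir_has_room(uint32_t off, uint32_t hdr, uint32_t raw) {
    return off + hdr <= toy_dir_size(raw) - 4;
}
```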
|
|
|
|
|
|
|
|
/// littlefs v1 specific operations ///
|
|
|
|
int lfs1_traverse(lfs_t *lfs, int (*cb)(void*, lfs_block_t), void *data) {
|
|
|
|
if (lfs_pair_isnull(lfs->lfs1->root)) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
// iterate over metadata pairs
|
|
|
|
lfs1_dir_t dir;
|
|
|
|
lfs1_entry_t entry;
|
|
|
|
lfs_block_t cwd[2] = {0, 1};
|
|
|
|
|
|
|
|
while (true) {
|
|
|
|
for (int i = 0; i < 2; i++) {
|
|
|
|
int err = cb(data, cwd[i]);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int err = lfs1_dir_fetch(lfs, &dir, cwd);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
// iterate over contents
|
|
|
|
while (dir.off + sizeof(entry.d) <= (0x7fffffff & dir.d.size)-4) {
|
|
|
|
err = lfs1_bd_read(lfs, dir.pair[0], dir.off,
|
|
|
|
&entry.d, sizeof(entry.d));
|
|
|
|
lfs1_entry_fromle32(&entry.d);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
dir.off += lfs1_entry_size(&entry);
|
|
|
|
if ((0x70 & entry.d.type) == (0x70 & LFS1_TYPE_REG)) {
|
|
|
|
err = lfs_ctz_traverse(lfs, NULL, &lfs->rcache,
|
|
|
|
entry.d.u.file.head, entry.d.u.file.size, cb, data);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// we also need to check if we contain a threaded v2 directory
|
|
|
|
lfs_mdir_t dir2 = {.split=true, .tail={cwd[0], cwd[1]}};
|
|
|
|
while (dir2.split) {
|
|
|
|
err = lfs_dir_fetch(lfs, &dir2, dir2.tail);
|
|
|
|
if (err) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (int i = 0; i < 2; i++) {
|
|
|
|
err = cb(data, dir2.pair[i]);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
cwd[0] = dir.d.tail[0];
|
|
|
|
cwd[1] = dir.d.tail[1];
|
|
|
|
|
|
|
|
if (lfs_pair_isnull(cwd)) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
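lfs1_traverse reports every block it finds through a callback of the form int (*cb)(void*, lfs_block_t) and stops as soon as the callback returns nonzero. The sketch below shows that contract with a counting callback. Everything in it is hypothetical: `toy_block_t` stands in for lfs_block_t, and `toy_traverse` is a stand-in driver that walks a plain array rather than a filesystem.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_block_t; /* stand-in for lfs_block_t (assumption) */

struct block_counter {
    size_t seen;  /* blocks reported so far */
    size_t limit; /* abort once this many have been counted */
};

/* Callback: count the block, or return nonzero to abort traversal. */
static int count_block(void *data, toy_block_t block) {
    (void)block;
    struct block_counter *counter = data;
    if (counter->seen >= counter->limit) {
        return -1; /* placeholder error code */
    }

    counter->seen += 1;
    return 0;
}

/* Toy driver with the same error handling as the traversal loop:
 * the first nonzero callback result is returned immediately. */
static int toy_traverse(const toy_block_t *blocks, size_t count,
        int (*cb)(void*, toy_block_t), void *data) {
    for (size_t i = 0; i < count; i++) {
        int err = cb(data, blocks[i]);
        if (err) {
            return err;
        }
    }

    return 0;
}
```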
|
|
|
|
|
|
|
|
static int lfs1_moved(lfs_t *lfs, const void *e) {
|
|
|
|
if (lfs_pair_isnull(lfs->lfs1->root)) {
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
// skip superblock
|
|
|
|
lfs1_dir_t cwd;
|
|
|
|
int err = lfs1_dir_fetch(lfs, &cwd, (const lfs_block_t[2]){0, 1});
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
// iterate over all directory entries
|
|
|
|
lfs1_entry_t entry;
|
|
|
|
while (!lfs_pair_isnull(cwd.d.tail)) {
|
|
|
|
err = lfs1_dir_fetch(lfs, &cwd, cwd.d.tail);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (true) {
|
|
|
|
err = lfs1_dir_next(lfs, &cwd, &entry);
|
|
|
|
if (err && err != LFS_ERR_NOENT) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (err == LFS_ERR_NOENT) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!(0x80 & entry.d.type) &&
|
|
|
|
memcmp(&entry.d.u, e, sizeof(entry.d.u)) == 0) {
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Filesystem operations ///
|
|
|
|
static int lfs1_mount(lfs_t *lfs, struct lfs1 *lfs1,
|
|
|
|
const struct lfs_config *cfg) {
|
|
|
|
int err = 0;
|
2019-04-09 23:37:53 +00:00
|
|
|
{
|
Added migration from littlefs v1
2019-02-23 03:34:03 +00:00
|
|
|
err = lfs_init(lfs, cfg);
|
|
|
|
if (err) {
|
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
lfs->lfs1 = lfs1;
|
2019-08-03 14:17:47 +00:00
|
|
|
lfs->lfs1->root[0] = LFS_BLOCK_NULL;
|
|
|
|
lfs->lfs1->root[1] = LFS_BLOCK_NULL;
|
Added migration from littlefs v1
2019-02-23 03:34:03 +00:00
|
|
|
|
|
|
|
// setup free lookahead
|
|
|
|
lfs->free.off = 0;
|
|
|
|
lfs->free.size = 0;
|
|
|
|
lfs->free.i = 0;
|
|
|
|
lfs_alloc_ack(lfs);
|
|
|
|
|
|
|
|
// load superblock
|
|
|
|
lfs1_dir_t dir;
|
|
|
|
lfs1_superblock_t superblock;
|
|
|
|
err = lfs1_dir_fetch(lfs, &dir, (const lfs_block_t[2]){0, 1});
|
|
|
|
if (err && err != LFS_ERR_CORRUPT) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!err) {
|
|
|
|
err = lfs1_bd_read(lfs, dir.pair[0], sizeof(dir.d),
|
|
|
|
&superblock.d, sizeof(superblock.d));
|
|
|
|
lfs1_superblock_fromle32(&superblock.d);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
lfs->lfs1->root[0] = superblock.d.root[0];
|
|
|
|
lfs->lfs1->root[1] = superblock.d.root[1];
|
|
|
|
}
|
|
|
|
|
|
|
|
if (err || memcmp(superblock.d.magic, "littlefs", 8) != 0) {
|
|
|
|
LFS_ERROR("Invalid superblock at %d %d", 0, 1);
|
|
|
|
err = LFS_ERR_CORRUPT;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
uint16_t major_version = (0xffff & (superblock.d.version >> 16));
|
|
|
|
uint16_t minor_version = (0xffff & (superblock.d.version >> 0));
|
|
|
|
if ((major_version != LFS1_DISK_VERSION_MAJOR ||
|
|
|
|
minor_version > LFS1_DISK_VERSION_MINOR)) {
|
|
|
|
LFS_ERROR("Invalid version %d.%d", major_version, minor_version);
|
|
|
|
err = LFS_ERR_INVAL;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
lfs_deinit(lfs);
|
|
|
|
return err;
|
|
|
|
}
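The version check in lfs1_mount splits the 32-bit version word into a major half (upper 16 bits) and a minor half (lower 16 bits), then requires an exact major match and a minor no newer than the driver's. A standalone sketch of that rule is below. The helper names are hypothetical, and the expected versions are passed in as parameters so the sketch makes no claim about the values of LFS1_DISK_VERSION_MAJOR or LFS1_DISK_VERSION_MINOR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper 16 bits of the packed version word. */
static uint16_t toy_version_major(uint32_t version) {
    return (uint16_t)(0xffff & (version >> 16));
}

/* Lower 16 bits of the packed version word. */
static uint16_t toy_version_minor(uint32_t version) {
    return (uint16_t)(0xffff & (version >> 0));
}

/* A disk is accepted when the major version matches exactly and the
 * minor version is not newer than what the driver understands. */
static bool toy_version_compatible(uint32_t on_disk,
        uint16_t major, uint16_t max_minor) {
    return toy_version_major(on_disk) == major &&
            toy_version_minor(on_disk) <= max_minor;
}
```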
|
|
|
|
|
|
|
|
static int lfs1_unmount(lfs_t *lfs) {
|
|
|
|
return lfs_deinit(lfs);
|
|
|
|
}
|
|
|
|
|
|
|
|
/// v1 migration ///
|
|
|
|
int lfs_migrate(lfs_t *lfs, const struct lfs_config *cfg) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_migrate(%p, %p {.context=%p, "
|
|
|
|
".read=%p, .prog=%p, .erase=%p, .sync=%p, "
|
|
|
|
".read_size=%"PRIu32", .prog_size=%"PRIu32", "
|
|
|
|
".block_size=%"PRIu32", .block_count=%"PRIu32", "
|
|
|
|
".block_cycles=%"PRIu32", .cache_size=%"PRIu32", "
|
|
|
|
".lookahead_size=%"PRIu32", .read_buffer=%p, "
|
|
|
|
".prog_buffer=%p, .lookahead_buffer=%p, "
|
|
|
|
".name_max=%"PRIu32", .file_max=%"PRIu32", "
|
|
|
|
".attr_max=%"PRIu32"})",
|
|
|
|
(void*)lfs, (void*)cfg, cfg->context,
|
|
|
|
(void*)(uintptr_t)cfg->read, (void*)(uintptr_t)cfg->prog,
|
|
|
|
(void*)(uintptr_t)cfg->erase, (void*)(uintptr_t)cfg->sync,
|
|
|
|
cfg->read_size, cfg->prog_size, cfg->block_size, cfg->block_count,
|
|
|
|
cfg->block_cycles, cfg->cache_size, cfg->lookahead_size,
|
|
|
|
cfg->read_buffer, cfg->prog_buffer, cfg->lookahead_buffer,
|
|
|
|
cfg->name_max, cfg->file_max, cfg->attr_max);
|
Added migration from littlefs v1
2019-02-23 03:34:03 +00:00
|
|
|
struct lfs1 lfs1;
|
|
|
|
int err = lfs1_mount(lfs, &lfs1, cfg);
|
|
|
|
if (err) {
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_migrate -> %d", err);
|
Added migration from littlefs v1
2019-02-23 03:34:03 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
2019-04-09 23:37:53 +00:00
|
|
|
{
|
Added migration from littlefs v1
2019-02-23 03:34:03 +00:00
|
|
|
// iterate through each directory, copying over entries
|
|
|
|
// into new directory
|
|
|
|
lfs1_dir_t dir1;
|
|
|
|
lfs_mdir_t dir2;
|
|
|
|
dir1.d.tail[0] = lfs->lfs1->root[0];
|
|
|
|
dir1.d.tail[1] = lfs->lfs1->root[1];
|
|
|
|
while (!lfs_pair_isnull(dir1.d.tail)) {
|
|
|
|
// iterate old dir
|
|
|
|
err = lfs1_dir_fetch(lfs, &dir1, dir1.d.tail);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
// create new dir and bind as temporary pretend root
|
|
|
|
err = lfs_dir_alloc(lfs, &dir2);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
dir2.rev = dir1.d.rev;
|
2019-04-02 03:12:08 +00:00
|
|
|
dir1.head[0] = dir1.pair[0];
|
|
|
|
dir1.head[1] = dir1.pair[1];
|
Added migration from littlefs v1
2019-02-23 03:34:03 +00:00
|
|
|
lfs->root[0] = dir2.pair[0];
|
|
|
|
lfs->root[1] = dir2.pair[1];
|
|
|
|
|
|
|
|
err = lfs_dir_commit(lfs, &dir2, NULL, 0);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (true) {
|
|
|
|
lfs1_entry_t entry1;
|
|
|
|
err = lfs1_dir_next(lfs, &dir1, &entry1);
|
|
|
|
if (err && err != LFS_ERR_NOENT) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (err == LFS_ERR_NOENT) {
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
// check that entry has not been moved
|
|
|
|
if (entry1.d.type & 0x80) {
|
|
|
|
int moved = lfs1_moved(lfs, &entry1.d.u);
|
|
|
|
if (moved < 0) {
|
|
|
|
err = moved;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (moved) {
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
|
|
|
entry1.d.type &= ~0x80;
|
|
|
|
}
|
2019-07-21 09:34:53 +00:00
|
|
|
|
Added migration from littlefs v1
2019-02-23 03:34:03 +00:00
|
|
|
// also fetch name
|
|
|
|
char name[LFS_NAME_MAX+1];
|
|
|
|
memset(name, 0, sizeof(name));
|
|
|
|
err = lfs1_bd_read(lfs, dir1.pair[0],
|
|
|
|
entry1.off + 4+entry1.d.elen+entry1.d.alen,
|
|
|
|
name, entry1.d.nlen);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
bool isdir = (entry1.d.type == LFS1_TYPE_DIR);
|
|
|
|
|
|
|
|
// create entry in new dir
|
|
|
|
err = lfs_dir_fetch(lfs, &dir2, lfs->root);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
uint16_t id;
|
|
|
|
err = lfs_dir_find(lfs, &dir2, &(const char*){name}, &id);
|
|
|
|
if (!(err == LFS_ERR_NOENT && id != 0x3ff)) {
|
|
|
|
err = (err < 0) ? err : LFS_ERR_EXIST;
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
lfs1_entry_tole32(&entry1.d);
|
|
|
|
err = lfs_dir_commit(lfs, &dir2, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_CREATE, id, 0)},
|
|
|
|
{LFS_MKTAG_IF_ELSE(isdir,
|
|
|
|
LFS_TYPE_DIR, id, entry1.d.nlen,
|
|
|
|
LFS_TYPE_REG, id, entry1.d.nlen),
|
|
|
|
name},
|
|
|
|
{LFS_MKTAG_IF_ELSE(isdir,
|
|
|
|
LFS_TYPE_DIRSTRUCT, id, sizeof(entry1.d.u),
|
|
|
|
LFS_TYPE_CTZSTRUCT, id, sizeof(entry1.d.u)),
|
|
|
|
&entry1.d.u}));
|
Added migration from littlefs v1
This is to help the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Defining LFS_MIGRATE
2. Calling lfs_migrate (only available with the above macro)
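The two steps above can be sketched roughly as follows. This is only a
toy model of the compile-time gating: demo_fs, demo_migrate and demo_mount
are hypothetical stand-ins, not littlefs's lfs_t, lfs_migrate or lfs_mount,
and LFS_MIGRATE would normally come from the build system rather than from
a #define in the source.

```c
#include <assert.h>
#include <stddef.h>

/* Normally supplied by the build system, e.g. cc -DLFS_MIGRATE ... */
#define LFS_MIGRATE

/* Hypothetical stand-in for a filesystem handle; not littlefs's lfs_t. */
struct demo_fs {
    int disk_version; /* 1 or 2 */
};

#ifdef LFS_MIGRATE
/* Stand-in for the optional migration entry point. It is only compiled
 * when LFS_MIGRATE is defined, so default builds pay no code cost. */
static int demo_migrate(struct demo_fs *fs) {
    assert(fs != NULL);
    fs->disk_version = 2;
    return 0;
}
#endif

/* Mount helper: succeeds on v2 disks, upgrades v1 disks if the
 * migration code was compiled in, and fails otherwise. */
int demo_mount(struct demo_fs *fs) {
    assert(fs != NULL);
    if (fs->disk_version == 2) {
        return 0;
    }
#ifdef LFS_MIGRATE
    return demo_migrate(fs);
#else
    return -1; /* v1 disk, but migration support was not built in */
#endif
}
```

Without the macro, the migration function does not exist at all, which is
why the call in step 2 only compiles after step 1.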
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included migration
code in the standard build. We can't rely on link-time gc to drop the
lfs_migrate function because of a dependency between the allocator and v1
data structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
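The idea that the write to the second block is the commit point can be
illustrated with a toy model of a metadata pair. This is a sketch under
assumptions, not the on-disk format or the actual fetch logic: toy_block,
toy_scmp and toy_pair_pick are hypothetical names, and "valid" stands in
for "this block was written completely and checks out for the driver
doing the mount".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one block in a metadata pair (not the on-disk format). */
struct toy_block {
    uint32_t rev;   /* revision count of this block */
    bool     valid; /* true once the block is completely written */
};

/* Sequence comparison that tolerates revision-counter wraparound. */
static int32_t toy_scmp(uint32_t a, uint32_t b) {
    return (int32_t)(a - b);
}

/* Return the index (0 or 1) of the block a mount would use,
 * or -1 if neither block is usable. */
int toy_pair_pick(const struct toy_block pair[2]) {
    assert(pair != NULL);
    if (pair[0].valid && pair[1].valid) {
        /* both usable: the newer revision wins */
        return toy_scmp(pair[1].rev, pair[0].rev) > 0 ? 1 : 0;
    }
    if (pair[0].valid) {
        return 0;
    }
    if (pair[1].valid) {
        return 1;
    }
    return -1;
}
```

In this model, an interrupted write leaves the new block invalid, so the
old block is still picked and nothing is lost. Only once the new block is
complete does it take over.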
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during
the migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth
fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11% at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
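For the first limitation, a rough pre-flight check might look like the
sketch below. It assumes the v2 metadata needs about as many blocks as the
v1 metadata it temporarily duplicates, which is only an approximation, and
toy_migration_fits is a hypothetical helper, not part of littlefs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rough estimate of whether a migration has room to run.
 * Assumption: v2 metadata takes about as many blocks as v1 metadata,
 * so the free space must be able to hold a second copy of it. */
bool toy_migration_fits(uint32_t block_count,
        uint32_t data_blocks, uint32_t metadata_blocks) {
    assert(data_blocks <= block_count);
    assert(metadata_blocks <= block_count - data_blocks);
    uint32_t free_blocks = block_count - data_blocks - metadata_blocks;
    return metadata_blocks <= free_blocks;
}
```

File data is not counted twice here because the migration does not copy
it.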
2019-02-23 03:34:03 +00:00
|
|
|
lfs1_entry_fromle32(&entry1.d);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!lfs_pair_isnull(dir1.d.tail)) {
|
|
|
|
// find last block and update tail to thread into fs
|
|
|
|
err = lfs_dir_fetch(lfs, &dir2, lfs->root);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
while (dir2.split) {
|
|
|
|
err = lfs_dir_fetch(lfs, &dir2, dir2.tail);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
lfs_pair_tole32(dir2.pair);
|
|
|
|
err = lfs_dir_commit(lfs, &dir2, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), dir1.d.tail}));
|
2019-02-23 03:34:03 +00:00
|
|
|
lfs_pair_fromle32(dir2.pair);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Copy over first block to thread into fs. Unfortunately
|
|
|
|
// if this fails there is not much we can do.
|
2019-07-27 01:09:24 +00:00
|
|
|
LFS_DEBUG("Migrating %"PRIx32" %"PRIx32" -> %"PRIx32" %"PRIx32,
|
2019-04-02 03:12:08 +00:00
|
|
|
lfs->root[0], lfs->root[1], dir1.head[0], dir1.head[1]);
|
|
|
|
|
|
|
|
err = lfs_bd_erase(lfs, dir1.head[1]);
|
2019-02-23 03:34:03 +00:00
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = lfs_dir_fetch(lfs, &dir2, lfs->root);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
for (lfs_off_t i = 0; i < dir2.off; i++) {
|
|
|
|
uint8_t dat;
|
|
|
|
err = lfs_bd_read(lfs,
|
|
|
|
NULL, &lfs->rcache, dir2.off,
|
|
|
|
dir2.pair[0], i, &dat, 1);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
err = lfs_bd_prog(lfs,
|
|
|
|
&lfs->pcache, &lfs->rcache, true,
|
2019-04-02 03:12:08 +00:00
|
|
|
dir1.head[1], i, &dat, 1);
|
2019-02-23 03:34:03 +00:00
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
}
|
2019-05-28 16:35:22 +00:00
|
|
|
|
|
|
|
err = lfs_bd_flush(lfs, &lfs->pcache, &lfs->rcache, true);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
2019-02-23 03:34:03 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
// Create new superblock. This marks a successful migration!
|
|
|
|
err = lfs1_dir_fetch(lfs, &dir1, (const lfs_block_t[2]){0, 1});
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
dir2.pair[0] = dir1.pair[0];
|
|
|
|
dir2.pair[1] = dir1.pair[1];
|
|
|
|
dir2.rev = dir1.d.rev;
|
|
|
|
dir2.off = sizeof(dir2.rev);
|
2020-02-10 04:43:20 +00:00
|
|
|
dir2.etag = 0xffffffff;
|
2019-02-23 03:34:03 +00:00
|
|
|
dir2.count = 0;
|
|
|
|
dir2.tail[0] = lfs->lfs1->root[0];
|
|
|
|
dir2.tail[1] = lfs->lfs1->root[1];
|
|
|
|
dir2.erased = false;
|
|
|
|
dir2.split = true;
|
|
|
|
|
|
|
|
lfs_superblock_t superblock = {
|
|
|
|
.version = LFS_DISK_VERSION,
|
|
|
|
.block_size = lfs->cfg->block_size,
|
|
|
|
.block_count = lfs->cfg->block_count,
|
|
|
|
.name_max = lfs->name_max,
|
|
|
|
.file_max = lfs->file_max,
|
|
|
|
.attr_max = lfs->attr_max,
|
|
|
|
};
|
|
|
|
|
|
|
|
lfs_superblock_tole32(&superblock);
|
|
|
|
err = lfs_dir_commit(lfs, &dir2, LFS_MKATTRS(
|
2020-01-20 23:35:45 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_CREATE, 0, 0)},
|
2019-02-23 03:34:03 +00:00
|
|
|
{LFS_MKTAG(LFS_TYPE_SUPERBLOCK, 0, 8), "littlefs"},
|
|
|
|
{LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, sizeof(superblock)),
|
|
|
|
&superblock}));
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
|
|
|
|
|
|
|
// sanity check that fetch works
|
|
|
|
err = lfs_dir_fetch(lfs, &dir2, (const lfs_block_t[2]){0, 1});
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
2019-05-28 23:16:51 +00:00
|
|
|
|
|
|
|
// force compaction to prevent accidentally mounting v1
|
|
|
|
dir2.erased = false;
|
|
|
|
err = lfs_dir_commit(lfs, &dir2, NULL, 0);
|
|
|
|
if (err) {
|
|
|
|
goto cleanup;
|
|
|
|
}
|
2019-02-23 03:34:03 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
cleanup:
|
|
|
|
lfs1_unmount(lfs);
|
2019-05-31 09:40:19 +00:00
|
|
|
LFS_TRACE("lfs_migrate -> %d", err);
|
2019-02-23 03:34:03 +00:00
|
|
|
return err;
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif
|