third_party_littlefs/lfs.c

4927 lines
149 KiB
C
Raw Normal View History

/*
* The little filesystem
*
* Copyright (c) 2017, Arm Limited. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*/
#include "lfs.h"
#include "lfs_util.h"
#define LFS_BLOCK_NULL ((lfs_block_t)-1)
#define LFS_BLOCK_INLINE ((lfs_block_t)-2)
/// Caching block device operations ///
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
static inline void lfs_cache_drop(lfs_t *lfs, lfs_cache_t *rcache) {
// do not zero, cheaper if cache is readonly or only going to be
// written with identical data (during relocates)
(void)lfs;
rcache->block = LFS_BLOCK_NULL;
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
}
static inline void lfs_cache_zero(lfs_t *lfs, lfs_cache_t *pcache) {
// zero to avoid information leak
memset(pcache->buffer, 0xff, lfs->cfg->cache_size);
pcache->block = LFS_BLOCK_NULL;
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
}
static int lfs_bd_read(lfs_t *lfs,
const lfs_cache_t *pcache, lfs_cache_t *rcache, lfs_size_t hint,
lfs_block_t block, lfs_off_t off,
void *buffer, lfs_size_t size) {
uint8_t *data = buffer;
if (block >= lfs->cfg->block_count ||
off+size > lfs->cfg->block_size) {
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
return LFS_ERR_CORRUPT;
}
while (size > 0) {
lfs_size_t diff = size;
if (pcache && block == pcache->block &&
off < pcache->off + pcache->size) {
if (off >= pcache->off) {
// is already in pcache?
diff = lfs_min(diff, pcache->size - (off-pcache->off));
memcpy(data, &pcache->buffer[off-pcache->off], diff);
data += diff;
off += diff;
size -= diff;
continue;
}
// pcache takes priority
diff = lfs_min(diff, pcache->off-off);
}
if (block == rcache->block &&
off < rcache->off + rcache->size) {
if (off >= rcache->off) {
// is already in rcache?
diff = lfs_min(diff, rcache->size - (off-rcache->off));
memcpy(data, &rcache->buffer[off-rcache->off], diff);
data += diff;
off += diff;
size -= diff;
continue;
}
// rcache takes priority
diff = lfs_min(diff, rcache->off-off);
}
if (size >= hint && off % lfs->cfg->read_size == 0 &&
size >= lfs->cfg->read_size) {
// bypass cache?
diff = lfs_aligndown(diff, lfs->cfg->read_size);
int err = lfs->cfg->read(lfs->cfg, block, off, data, diff);
if (err) {
return err;
}
data += diff;
off += diff;
size -= diff;
continue;
}
// load to cache, first condition can no longer fail
LFS_ASSERT(block < lfs->cfg->block_count);
rcache->block = block;
rcache->off = lfs_aligndown(off, lfs->cfg->read_size);
rcache->size = lfs_min(
lfs_min(
lfs_alignup(off+hint, lfs->cfg->read_size),
lfs->cfg->block_size)
- rcache->off,
lfs->cfg->cache_size);
int err = lfs->cfg->read(lfs->cfg, rcache->block,
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
rcache->off, rcache->buffer, rcache->size);
LFS_ASSERT(err <= 0);
if (err) {
return err;
}
}
return 0;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
enum {
LFS_CMP_EQ = 0,
LFS_CMP_LT = 1,
LFS_CMP_GT = 2,
};
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
static int lfs_bd_cmp(lfs_t *lfs,
const lfs_cache_t *pcache, lfs_cache_t *rcache, lfs_size_t hint,
lfs_block_t block, lfs_off_t off,
const void *buffer, lfs_size_t size) {
const uint8_t *data = buffer;
for (lfs_off_t i = 0; i < size; i++) {
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
uint8_t dat;
int err = lfs_bd_read(lfs,
pcache, rcache, hint-i,
block, off+i, &dat, 1);
if (err) {
return err;
}
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
if (dat != data[i]) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return (dat < data[i]) ? LFS_CMP_LT : LFS_CMP_GT;
}
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return LFS_CMP_EQ;
}
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
static int lfs_bd_flush(lfs_t *lfs,
lfs_cache_t *pcache, lfs_cache_t *rcache, bool validate) {
if (pcache->block != LFS_BLOCK_NULL && pcache->block != LFS_BLOCK_INLINE) {
LFS_ASSERT(pcache->block < lfs->cfg->block_count);
lfs_size_t diff = lfs_alignup(pcache->size, lfs->cfg->prog_size);
int err = lfs->cfg->prog(lfs->cfg, pcache->block,
pcache->off, pcache->buffer, diff);
LFS_ASSERT(err <= 0);
if (err) {
return err;
}
if (validate) {
// check data on disk
lfs_cache_drop(lfs, rcache);
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
int res = lfs_bd_cmp(lfs,
NULL, rcache, diff,
pcache->block, pcache->off, pcache->buffer, diff);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (res < 0) {
return res;
}
if (res != LFS_CMP_EQ) {
return LFS_ERR_CORRUPT;
}
}
lfs_cache_zero(lfs, pcache);
}
return 0;
}
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
static int lfs_bd_sync(lfs_t *lfs,
lfs_cache_t *pcache, lfs_cache_t *rcache, bool validate) {
lfs_cache_drop(lfs, rcache);
int err = lfs_bd_flush(lfs, pcache, rcache, validate);
if (err) {
return err;
}
err = lfs->cfg->sync(lfs->cfg);
LFS_ASSERT(err <= 0);
return err;
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
}
static int lfs_bd_prog(lfs_t *lfs,
lfs_cache_t *pcache, lfs_cache_t *rcache, bool validate,
lfs_block_t block, lfs_off_t off,
const void *buffer, lfs_size_t size) {
const uint8_t *data = buffer;
LFS_ASSERT(block == LFS_BLOCK_INLINE || block < lfs->cfg->block_count);
LFS_ASSERT(off + size <= lfs->cfg->block_size);
while (size > 0) {
if (block == pcache->block &&
off >= pcache->off &&
off < pcache->off + lfs->cfg->cache_size) {
// already fits in pcache?
lfs_size_t diff = lfs_min(size,
lfs->cfg->cache_size - (off-pcache->off));
memcpy(&pcache->buffer[off-pcache->off], data, diff);
data += diff;
off += diff;
size -= diff;
pcache->size = lfs_max(pcache->size, off - pcache->off);
if (pcache->size == lfs->cfg->cache_size) {
// eagerly flush out pcache if we fill up
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
int err = lfs_bd_flush(lfs, pcache, rcache, validate);
if (err) {
return err;
}
}
continue;
}
// pcache must have been flushed, either by programming and
// entire block or manually flushing the pcache
LFS_ASSERT(pcache->block == LFS_BLOCK_NULL);
// prepare pcache, first condition can no longer fail
pcache->block = block;
pcache->off = lfs_aligndown(off, lfs->cfg->prog_size);
pcache->size = 0;
}
return 0;
}
static int lfs_bd_erase(lfs_t *lfs, lfs_block_t block) {
LFS_ASSERT(block < lfs->cfg->block_count);
int err = lfs->cfg->erase(lfs->cfg, block);
LFS_ASSERT(err <= 0);
return err;
}
/// Small type-level utilities ///
// operations on block pairs
static inline void lfs_pair_swap(lfs_block_t pair[2]) {
lfs_block_t t = pair[0];
pair[0] = pair[1];
pair[1] = t;
}
static inline bool lfs_pair_isnull(const lfs_block_t pair[2]) {
return pair[0] == LFS_BLOCK_NULL || pair[1] == LFS_BLOCK_NULL;
}
static inline int lfs_pair_cmp(
const lfs_block_t paira[2],
const lfs_block_t pairb[2]) {
return !(paira[0] == pairb[0] || paira[1] == pairb[1] ||
paira[0] == pairb[1] || paira[1] == pairb[0]);
}
static inline bool lfs_pair_sync(
const lfs_block_t paira[2],
const lfs_block_t pairb[2]) {
return (paira[0] == pairb[0] && paira[1] == pairb[1]) ||
(paira[0] == pairb[1] && paira[1] == pairb[0]);
}
static inline void lfs_pair_fromle32(lfs_block_t pair[2]) {
pair[0] = lfs_fromle32(pair[0]);
pair[1] = lfs_fromle32(pair[1]);
}
static inline void lfs_pair_tole32(lfs_block_t pair[2]) {
pair[0] = lfs_tole32(pair[0]);
pair[1] = lfs_tole32(pair[1]);
}
// operations on 32-bit entry tags
typedef uint32_t lfs_tag_t;
typedef int32_t lfs_stag_t;
Added root entry and expanding superblocks Expanding superblocks has been on my wishlist for a while. The basic idea is that instead of maintaining a fixed offset blocks {0, 1} to the the root directory (1 pointer), we maintain a dynamically sized linked-list of superblocks that point to the actual root. If the number of writes to the root exceeds some value, we increase the size of the superblock linked-list. This can leverage existing metadata-pair operations. The revision count for metadata-pairs provides some knowledge on how much wear we've put on the superblock, and the threaded linked-list can also be reused for this purpose. This means superblock expansion is both optional and cheap to implement. Expanding superblocks helps both extremely small and extremely large filesystem (extreme being relative of course). On the small end, we can actually collapse the superblock into the root directory and drop the hard requirement of 4-blocks for the superblock. On the large end, our superblock will now last longer than the rest of the filesystem. Each time we expand, the number of cycles until the superblock dies is increased by a power. Before we were stuck with this layout: level cycles limit layout 1 E^2 390 MiB s0 -> root Now we expand every time a fixed offset is exceeded: level cycles limit layout 0 E 4 KiB s0+root 1 E^2 390 MiB s0 -> root 2 E^3 37 TiB s0 -> s1 -> root 3 E^4 3.6 EiB s0 -> s1 -> s2 -> root ... Where the cycles are the number of cycles before death, and the limit is the worst-case size a filesystem where early superblock death becomes a concern (all writes to root using this formula: E^|s| = E*B, E = erase cycles = 100000, B = block count, assuming 4096 byte blocks). Note we can also store copies of the superblock entry on the expanded superblocks. This may help filesystem recover tools in the future.
2018-08-06 18:30:51 +00:00
#define LFS_MKTAG(type, id, size) \
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
(((lfs_tag_t)(type) << 20) | ((lfs_tag_t)(id) << 10) | (lfs_tag_t)(size))
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
#define LFS_MKTAG_IF(cond, type, id, size) \
((cond) ? LFS_MKTAG(type, id, size) : LFS_MKTAG(LFS_FROM_NOOP, 0, 0))
#define LFS_MKTAG_IF_ELSE(cond, type1, id1, size1, type2, id2, size2) \
2020-02-12 17:31:34 +00:00
((cond) ? LFS_MKTAG(type1, id1, size1) : LFS_MKTAG(type2, id2, size2))
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
static inline bool lfs_tag_isvalid(lfs_tag_t tag) {
return !(tag & 0x80000000);
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
static inline bool lfs_tag_isdelete(lfs_tag_t tag) {
return ((int32_t)(tag << 22) >> 22) == -1;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
static inline uint16_t lfs_tag_type1(lfs_tag_t tag) {
return (tag & 0x70000000) >> 20;
}
static inline uint16_t lfs_tag_type3(lfs_tag_t tag) {
return (tag & 0x7ff00000) >> 20;
Added support for deleting attributes littlefs has a mechanism for deleting file entries, but it doesn't have a mechanism for deleting individual tags. This _is_ sufficient for a filesystem, but limits our flexibility. Deleting attributes would be useful in the custom attribute API and for future improvements (hint the child pointers in B-trees). However, deleteing attributes is tricky. We can't just omit the attribute, since we can only add new tags. Additionally, we need a way to track what attributes have been deleted during compaction, which currently relies on writing out attributes to disk. The solution here is pretty nifty. First we have to come up with a way to represent a "deleted" attribute. Rather than adding an additional bit to the already squished tag structure, we use a -1 length field, specifically 0xfff. Now we can commit a delete attribute, and this deleted tag acts as a place holder during compacts. However our delete tag will never leave our metadata log. We need some way to discard our delete tag if we know it's the only representation of that tag on the metadata log. Ah! We know it's the only tag if it's in the first commit on the metadata log. So we add an additional bit to the CRC entry to indicate if we're on the first commit, and use that to decide if we need to keep delete tags around. Now we have working tag deletion. Interestingly enough, tag deletion is actually indirectly more efficient than entry deletion, since compacting entries requires multiple passes, whereas tag deletion gets cleaned up lazily. However we can't adopt the same strategy in entry deletion because of the compact ordering of entries. Tag deletion works because tag types are unique and static. Managing entry deletion in this manner would require static id allocation, which would cause problems when creating files, running out of space, and disallow arbitrary insertions of files.
2018-09-09 22:48:11 +00:00
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
static inline uint8_t lfs_tag_chunk(lfs_tag_t tag) {
return (tag & 0x0ff00000) >> 20;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
static inline int8_t lfs_tag_splice(lfs_tag_t tag) {
return (int8_t)lfs_tag_chunk(tag);
}
static inline uint16_t lfs_tag_id(lfs_tag_t tag) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return (tag & 0x000ffc00) >> 10;
}
static inline lfs_size_t lfs_tag_size(lfs_tag_t tag) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return tag & 0x000003ff;
}
static inline lfs_size_t lfs_tag_dsize(lfs_tag_t tag) {
return sizeof(tag) + lfs_tag_size(tag + lfs_tag_isdelete(tag));
Added support for deleting attributes littlefs has a mechanism for deleting file entries, but it doesn't have a mechanism for deleting individual tags. This _is_ sufficient for a filesystem, but limits our flexibility. Deleting attributes would be useful in the custom attribute API and for future improvements (hint the child pointers in B-trees). However, deleteing attributes is tricky. We can't just omit the attribute, since we can only add new tags. Additionally, we need a way to track what attributes have been deleted during compaction, which currently relies on writing out attributes to disk. The solution here is pretty nifty. First we have to come up with a way to represent a "deleted" attribute. Rather than adding an additional bit to the already squished tag structure, we use a -1 length field, specifically 0xfff. Now we can commit a delete attribute, and this deleted tag acts as a place holder during compacts. However our delete tag will never leave our metadata log. We need some way to discard our delete tag if we know it's the only representation of that tag on the metadata log. Ah! We know it's the only tag if it's in the first commit on the metadata log. So we add an additional bit to the CRC entry to indicate if we're on the first commit, and use that to decide if we need to keep delete tags around. Now we have working tag deletion. Interestingly enough, tag deletion is actually indirectly more efficient than entry deletion, since compacting entries requires multiple passes, whereas tag deletion gets cleaned up lazily. However we can't adopt the same strategy in entry deletion because of the compact ordering of entries. Tag deletion works because tag types are unique and static. Managing entry deletion in this manner would require static id allocation, which would cause problems when creating files, running out of space, and disallow arbitrary insertions of files.
2018-09-09 22:48:11 +00:00
}
// operations on attributes in attribute lists
struct lfs_mattr {
lfs_tag_t tag;
const void *buffer;
};
struct lfs_diskoff {
lfs_block_t block;
lfs_off_t off;
};
#define LFS_MKATTRS(...) \
(struct lfs_mattr[]){__VA_ARGS__}, \
sizeof((struct lfs_mattr[]){__VA_ARGS__}) / sizeof(struct lfs_mattr)
// operations on global state
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline void lfs_gstate_xor(lfs_gstate_t *a, const lfs_gstate_t *b) {
for (int i = 0; i < 3; i++) {
((uint32_t*)a)[i] ^= ((const uint32_t*)b)[i];
}
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline bool lfs_gstate_iszero(const lfs_gstate_t *a) {
for (int i = 0; i < 3; i++) {
if (((uint32_t*)a)[i] != 0) {
return false;
}
}
return true;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline bool lfs_gstate_hasorphans(const lfs_gstate_t *a) {
return lfs_tag_size(a->tag);
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline uint8_t lfs_gstate_getorphans(const lfs_gstate_t *a) {
return lfs_tag_size(a->tag);
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline bool lfs_gstate_hasmove(const lfs_gstate_t *a) {
return lfs_tag_type1(a->tag);
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline bool lfs_gstate_hasmovehere(const lfs_gstate_t *a,
const lfs_block_t *pair) {
return lfs_tag_type1(a->tag) && lfs_pair_cmp(a->pair, pair) == 0;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline void lfs_gstate_fromle32(lfs_gstate_t *a) {
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
a->tag = lfs_fromle32(a->tag);
a->pair[0] = lfs_fromle32(a->pair[0]);
a->pair[1] = lfs_fromle32(a->pair[1]);
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
static inline void lfs_gstate_tole32(lfs_gstate_t *a) {
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
a->tag = lfs_tole32(a->tag);
a->pair[0] = lfs_tole32(a->pair[0]);
a->pair[1] = lfs_tole32(a->pair[1]);
}
// other endianness operations
static void lfs_ctz_fromle32(struct lfs_ctz *ctz) {
ctz->head = lfs_fromle32(ctz->head);
ctz->size = lfs_fromle32(ctz->size);
}
static void lfs_ctz_tole32(struct lfs_ctz *ctz) {
ctz->head = lfs_tole32(ctz->head);
ctz->size = lfs_tole32(ctz->size);
}
static inline void lfs_superblock_fromle32(lfs_superblock_t *superblock) {
superblock->version = lfs_fromle32(superblock->version);
superblock->block_size = lfs_fromle32(superblock->block_size);
superblock->block_count = lfs_fromle32(superblock->block_count);
superblock->name_max = lfs_fromle32(superblock->name_max);
superblock->file_max = lfs_fromle32(superblock->file_max);
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
superblock->attr_max = lfs_fromle32(superblock->attr_max);
}
static inline void lfs_superblock_tole32(lfs_superblock_t *superblock) {
superblock->version = lfs_tole32(superblock->version);
superblock->block_size = lfs_tole32(superblock->block_size);
superblock->block_count = lfs_tole32(superblock->block_count);
superblock->name_max = lfs_tole32(superblock->name_max);
superblock->file_max = lfs_tole32(superblock->file_max);
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
superblock->attr_max = lfs_tole32(superblock->attr_max);
}
/// Internal operations predeclared here ///
static int lfs_dir_commit(lfs_t *lfs, lfs_mdir_t *dir,
const struct lfs_mattr *attrs, int attrcount);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
static int lfs_dir_compact(lfs_t *lfs,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_mdir_t *dir, const struct lfs_mattr *attrs, int attrcount,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_mdir_t *source, uint16_t begin, uint16_t end);
static int lfs_file_outline(lfs_t *lfs, lfs_file_t *file);
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
static int lfs_file_flush(lfs_t *lfs, lfs_file_t *file);
static void lfs_fs_preporphans(lfs_t *lfs, int8_t orphans);
static void lfs_fs_prepmove(lfs_t *lfs,
uint16_t id, const lfs_block_t pair[2]);
static int lfs_fs_pred(lfs_t *lfs, const lfs_block_t dir[2],
lfs_mdir_t *pdir);
static lfs_stag_t lfs_fs_parent(lfs_t *lfs, const lfs_block_t dir[2],
lfs_mdir_t *parent);
static int lfs_fs_relocate(lfs_t *lfs,
const lfs_block_t oldpair[2], lfs_block_t newpair[2]);
int lfs_fs_traverseraw(lfs_t *lfs,
int (*cb)(void *data, lfs_block_t block), void *data,
bool includeorphans);
static int lfs_fs_forceconsistency(lfs_t *lfs);
static int lfs_deinit(lfs_t *lfs);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
#ifdef LFS_MIGRATE
static int lfs1_traverse(lfs_t *lfs,
int (*cb)(void*, lfs_block_t), void *data);
#endif
/// Block allocator ///
static int lfs_alloc_lookahead(void *p, lfs_block_t block) {
lfs_t *lfs = (lfs_t*)p;
lfs_block_t off = ((block - lfs->free.off)
+ lfs->cfg->block_count) % lfs->cfg->block_count;
if (off < lfs->free.size) {
lfs->free.buffer[off / 32] |= 1U << (off % 32);
}
return 0;
}
// indicate allocated blocks have been committed into the filesystem, this
// is to prevent blocks from being garbage collected in the middle of a
// commit operation
static void lfs_alloc_ack(lfs_t *lfs) {
lfs->free.ack = lfs->cfg->block_count;
}
// drop the lookahead buffer, this is done during mounting and failed
// traversals in order to avoid invalid lookahead state
static void lfs_alloc_drop(lfs_t *lfs) {
lfs->free.size = 0;
lfs->free.i = 0;
lfs_alloc_ack(lfs);
}
static int lfs_alloc(lfs_t *lfs, lfs_block_t *block) {
while (true) {
while (lfs->free.i != lfs->free.size) {
lfs_block_t off = lfs->free.i;
lfs->free.i += 1;
lfs->free.ack -= 1;
if (!(lfs->free.buffer[off / 32] & (1U << (off % 32)))) {
// found a free block
*block = (lfs->free.off + off) % lfs->cfg->block_count;
// eagerly find next off so an alloc ack can
// discredit old lookahead blocks
while (lfs->free.i != lfs->free.size &&
(lfs->free.buffer[lfs->free.i / 32]
& (1U << (lfs->free.i % 32)))) {
lfs->free.i += 1;
lfs->free.ack -= 1;
}
return 0;
}
}
// check if we have looked at all blocks since last ack
if (lfs->free.ack == 0) {
LFS_ERROR("No more free space %"PRIu32,
lfs->free.i + lfs->free.off);
return LFS_ERR_NOSPC;
}
lfs->free.off = (lfs->free.off + lfs->free.size)
% lfs->cfg->block_count;
lfs->free.size = lfs_min(8*lfs->cfg->lookahead_size, lfs->free.ack);
lfs->free.i = 0;
// find mask of free blocks from tree
memset(lfs->free.buffer, 0, lfs->cfg->lookahead_size);
int err = lfs_fs_traverseraw(lfs, lfs_alloc_lookahead, lfs, true);
if (err) {
lfs_alloc_drop(lfs);
return err;
}
}
}
/// Metadata pair and directory operations ///
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
static lfs_stag_t lfs_dir_getslice(lfs_t *lfs, const lfs_mdir_t *dir,
lfs_tag_t gmask, lfs_tag_t gtag,
lfs_off_t goff, void *gbuffer, lfs_size_t gsize) {
lfs_off_t off = dir->off;
lfs_tag_t ntag = dir->etag;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t gdiff = 0;
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (lfs_gstate_hasmovehere(&lfs->gdisk, dir->pair) &&
lfs_tag_id(gmask) != 0 &&
lfs_tag_id(lfs->gdisk.tag) <= lfs_tag_id(gtag)) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// synthetic moves
gdiff -= LFS_MKTAG(0, 1, 0);
}
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
// iterate over dir block backwards (for faster lookups)
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
while (off >= sizeof(lfs_tag_t) + lfs_tag_dsize(ntag)) {
off -= lfs_tag_dsize(ntag);
lfs_tag_t tag = ntag;
int err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, sizeof(ntag),
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->pair[0], off, &ntag, sizeof(ntag));
if (err) {
return err;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
ntag = (lfs_frombe32(ntag) ^ tag) & 0x7fffffff;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_id(gmask) != 0 &&
lfs_tag_type1(tag) == LFS_TYPE_SPLICE &&
lfs_tag_id(tag) <= lfs_tag_id(gtag - gdiff)) {
if (tag == (LFS_MKTAG(LFS_TYPE_CREATE, 0, 0) |
(LFS_MKTAG(0, 0x3ff, 0) & (gtag - gdiff)))) {
// found where we were created
return LFS_ERR_NOENT;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// move around splices
gdiff += LFS_MKTAG(0, lfs_tag_splice(tag), 0);
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if ((gmask & tag) == (gmask & (gtag - gdiff))) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
if (lfs_tag_isdelete(tag)) {
return LFS_ERR_NOENT;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
lfs_size_t diff = lfs_min(lfs_tag_size(tag), gsize);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, diff,
dir->pair[0], off+sizeof(tag)+goff, gbuffer, diff);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (err) {
return err;
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
}
memset((uint8_t*)gbuffer + diff, 0, gsize - diff);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return tag + gdiff;
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
}
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return LFS_ERR_NOENT;
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
}
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
static lfs_stag_t lfs_dir_get(lfs_t *lfs, const lfs_mdir_t *dir,
lfs_tag_t gmask, lfs_tag_t gtag, void *buffer) {
return lfs_dir_getslice(lfs, dir,
gmask, gtag,
0, buffer, lfs_tag_size(gtag));
}
static int lfs_dir_getread(lfs_t *lfs, const lfs_mdir_t *dir,
const lfs_cache_t *pcache, lfs_cache_t *rcache, lfs_size_t hint,
lfs_tag_t gmask, lfs_tag_t gtag,
lfs_off_t off, void *buffer, lfs_size_t size) {
uint8_t *data = buffer;
if (off+size > lfs->cfg->block_size) {
return LFS_ERR_CORRUPT;
}
while (size > 0) {
lfs_size_t diff = size;
if (pcache && pcache->block == LFS_BLOCK_INLINE &&
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
off < pcache->off + pcache->size) {
if (off >= pcache->off) {
// is already in pcache?
diff = lfs_min(diff, pcache->size - (off-pcache->off));
memcpy(data, &pcache->buffer[off-pcache->off], diff);
data += diff;
off += diff;
size -= diff;
continue;
}
// pcache takes priority
diff = lfs_min(diff, pcache->off-off);
}
if (rcache->block == LFS_BLOCK_INLINE &&
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
off < rcache->off + rcache->size) {
if (off >= rcache->off) {
// is already in rcache?
diff = lfs_min(diff, rcache->size - (off-rcache->off));
memcpy(data, &rcache->buffer[off-rcache->off], diff);
data += diff;
off += diff;
size -= diff;
continue;
}
// rcache takes priority
diff = lfs_min(diff, rcache->off-off);
}
// load to cache, first condition can no longer fail
rcache->block = LFS_BLOCK_INLINE;
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
rcache->off = lfs_aligndown(off, lfs->cfg->read_size);
rcache->size = lfs_min(lfs_alignup(off+hint, lfs->cfg->read_size),
lfs->cfg->cache_size);
int err = lfs_dir_getslice(lfs, dir, gmask, gtag,
rcache->off, rcache->buffer, rcache->size);
if (err < 0) {
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
return err;
}
}
return 0;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
static int lfs_dir_traverse_filter(void *p,
lfs_tag_t tag, const void *buffer) {
lfs_tag_t *filtertag = p;
(void)buffer;
// which mask depends on unique bit in tag structure
uint32_t mask = (tag & LFS_MKTAG(0x100, 0, 0))
? LFS_MKTAG(0x7ff, 0x3ff, 0)
: LFS_MKTAG(0x700, 0x3ff, 0);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
// check for redundancy
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if ((mask & tag) == (mask & *filtertag) ||
lfs_tag_isdelete(*filtertag) ||
(LFS_MKTAG(0x7ff, 0x3ff, 0) & tag) == (
LFS_MKTAG(LFS_TYPE_DELETE, 0, 0) |
(LFS_MKTAG(0, 0x3ff, 0) & *filtertag))) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
return true;
}
// check if we need to adjust for created/deleted tags
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type1(tag) == LFS_TYPE_SPLICE &&
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_tag_id(tag) <= lfs_tag_id(*filtertag)) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
*filtertag += LFS_MKTAG(0, lfs_tag_splice(tag), 0);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
}
return false;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
static int lfs_dir_traverse(lfs_t *lfs,
const lfs_mdir_t *dir, lfs_off_t off, lfs_tag_t ptag,
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
const struct lfs_mattr *attrs, int attrcount,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_tag_t tmask, lfs_tag_t ttag,
uint16_t begin, uint16_t end, int16_t diff,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
int (*cb)(void *data, lfs_tag_t tag, const void *buffer), void *data) {
// iterate over directory and attrs
while (true) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_tag_t tag;
const void *buffer;
struct lfs_diskoff disk;
if (off+lfs_tag_dsize(ptag) < dir->off) {
off += lfs_tag_dsize(ptag);
int err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, sizeof(tag),
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
dir->pair[0], off, &tag, sizeof(tag));
if (err) {
return err;
}
tag = (lfs_frombe32(tag) ^ ptag) | 0x80000000;
disk.block = dir->pair[0];
disk.off = off+sizeof(lfs_tag_t);
buffer = &disk;
ptag = tag;
} else if (attrcount > 0) {
tag = attrs[0].tag;
buffer = attrs[0].buffer;
attrs += 1;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
attrcount -= 1;
} else {
return 0;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
}
lfs_tag_t mask = LFS_MKTAG(0x7ff, 0, 0);
if ((mask & tmask & tag) != (mask & tmask & ttag)) {
continue;
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
}
// do we need to filter? inlining the filtering logic here allows
// for some minor optimizations
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_id(tmask) != 0) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
// scan for duplicates and update tag based on creates/deletes
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
int filter = lfs_dir_traverse(lfs,
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
dir, off, ptag, attrs, attrcount,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
0, 0, 0, 0, 0,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_dir_traverse_filter, &tag);
if (filter < 0) {
return filter;
}
if (filter) {
continue;
}
// in filter range?
if (!(lfs_tag_id(tag) >= begin && lfs_tag_id(tag) < end)) {
continue;
}
}
// handle special cases for mcu-side operations
if (lfs_tag_type3(tag) == LFS_FROM_NOOP) {
// do nothing
} else if (lfs_tag_type3(tag) == LFS_FROM_MOVE) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
uint16_t fromid = lfs_tag_size(tag);
uint16_t toid = lfs_tag_id(tag);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
int err = lfs_dir_traverse(lfs,
buffer, 0, 0xffffffff, NULL, 0,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
LFS_MKTAG(0x600, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, 0, 0),
fromid, fromid+1, toid-fromid+diff,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
cb, data);
if (err) {
return err;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
} else if (lfs_tag_type3(tag) == LFS_FROM_USERATTRS) {
for (unsigned i = 0; i < lfs_tag_size(tag); i++) {
const struct lfs_attr *a = buffer;
int err = cb(data, LFS_MKTAG(LFS_TYPE_USERATTR + a[i].type,
lfs_tag_id(tag) + diff, a[i].size), a[i].buffer);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
if (err) {
return err;
}
}
} else {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
int err = cb(data, tag + LFS_MKTAG(0, diff, 0), buffer);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
if (err) {
return err;
}
}
}
}
static lfs_stag_t lfs_dir_fetchmatch(lfs_t *lfs,
lfs_mdir_t *dir, const lfs_block_t pair[2],
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_tag_t fmask, lfs_tag_t ftag, uint16_t *id,
int (*cb)(void *data, lfs_tag_t tag, const void *buffer), void *data) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// we can find tag very efficiently during a fetch, since we're already
// scanning the entire directory
lfs_stag_t besttag = -1;
// if either block address is invalid we return LFS_ERR_CORRUPT here,
// otherwise later writes to the pair could fail
if (pair[0] >= lfs->cfg->block_count || pair[1] >= lfs->cfg->block_count) {
return LFS_ERR_CORRUPT;
}
// find the block with the most recent revision
uint32_t revs[2] = {0, 0};
int r = 0;
for (int i = 0; i < 2; i++) {
int err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, sizeof(revs[i]),
pair[i], 0, &revs[i], sizeof(revs[i]));
revs[i] = lfs_fromle32(revs[i]);
if (err && err != LFS_ERR_CORRUPT) {
return err;
}
if (err != LFS_ERR_CORRUPT &&
lfs_scmp(revs[i], revs[(i+1)%2]) > 0) {
r = i;
}
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->pair[0] = pair[(r+0)%2];
dir->pair[1] = pair[(r+1)%2];
dir->rev = revs[(r+0)%2];
dir->off = 0; // nonzero = found some commits
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// now scan tags to fetch the actual dir and find possible match
for (int i = 0; i < 2; i++) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_off_t off = 0;
lfs_tag_t ptag = 0xffffffff;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
uint16_t tempcount = 0;
lfs_block_t temptail[2] = {LFS_BLOCK_NULL, LFS_BLOCK_NULL};
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
bool tempsplit = false;
lfs_stag_t tempbesttag = besttag;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->rev = lfs_tole32(dir->rev);
uint32_t crc = lfs_crc(0xffffffff, &dir->rev, sizeof(dir->rev));
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->rev = lfs_fromle32(dir->rev);
while (true) {
// extract next tag
lfs_tag_t tag;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
off += lfs_tag_dsize(ptag);
int err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, lfs->cfg->block_size,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->pair[0], off, &tag, sizeof(tag));
if (err) {
if (err == LFS_ERR_CORRUPT) {
// can't continue?
dir->erased = false;
break;
}
return err;
}
crc = lfs_crc(crc, &tag, sizeof(tag));
Tweaked tag endianness to catch power-loss after <1 word is written There was an interesting subtlety with the existing layout of tags that could become a problem in the future. Basically, littlefs avoids writing to any region of storage it is not absolutely sure has been erased beforehand. This is a part of limiting the number of assumptions about storage. It's possible a storage technology can't support writes without erases in a way that is undetectable at write time (Maybe changing a bit without an erase decreases the longevity of the information stored on the bit). But the existing layout had a very tiny corner case where this wasn't true. Consider the location of the valid bit in the tag struct: [1|--- 31 ---] ^--- valid bit The responsibility of this bit is to indicate if an attempt has been made to write the following commit. If it is not set (the specific value is dependent on a previous read and identified by the preceeding commit), the assumption is that it is safe to write to the next region because it has been erased previously. If it is set, we check if the next commit is valid, if it isn't (because of CRC failure, likely due to power-loss), we discard the commit. But because an attempt has been made to write to that storage, we must then do a compaction to move to the other block in the metadata-pair. This plan looks good on paper, but what does it look like on storage? The problem is that words in littlefs are in little-endian. So on storage the tag actually looks like this: [- 8 -|- 8 -|- 8 -|1|- 7 -] ^-- valid bit This means that we don't actually set the valid bit before writing the tag! We write the lower bytes first. If we lose power, we may have written 3 bytes without this fact being detectable. We could restructure the tag structure to store the valid bit lower, however because none of the fields are 7 bits, this would make the extraction more costly, and we then lose the ability to check this valid bit with a sign comparison. The simple solution is to just store the tag in big-endian. A small benefit is that this will actually have a negative code cost on big-endian machines. This mixture of endiannesses is frustrating, however it is a pragmatic solution with only a 20-byte code size cost.
2018-10-22 21:42:30 +00:00
tag = lfs_frombe32(tag) ^ ptag;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// next commit not yet programmed or we're not in valid range
if (!lfs_tag_isvalid(tag)) {
dir->erased = (lfs_tag_type1(ptag) == LFS_TYPE_CRC &&
dir->off % lfs->cfg->prog_size == 0);
break;
} else if (off + lfs_tag_dsize(tag) > lfs->cfg->block_size) {
dir->erased = false;
break;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
ptag = tag;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type1(tag) == LFS_TYPE_CRC) {
// check the crc attr
uint32_t dcrc;
err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, lfs->cfg->block_size,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->pair[0], off+sizeof(tag), &dcrc, sizeof(dcrc));
if (err) {
if (err == LFS_ERR_CORRUPT) {
dir->erased = false;
break;
}
return err;
}
dcrc = lfs_fromle32(dcrc);
if (crc != dcrc) {
dir->erased = false;
break;
}
// reset the next bit if we need to
ptag ^= (lfs_tag_t)(lfs_tag_chunk(tag) & 1U) << 31;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// toss our crc into the filesystem seed for
// pseudorandom numbers, note we use another crc here
// as a collection function because it is sufficiently
// random and convenient
lfs->seed = lfs_crc(lfs->seed, &crc, sizeof(crc));
// update with what's found so far
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
besttag = tempbesttag;
dir->off = off + lfs_tag_dsize(tag);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->etag = ptag;
dir->count = tempcount;
dir->tail[0] = temptail[0];
dir->tail[1] = temptail[1];
dir->split = tempsplit;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// reset crc
crc = 0xffffffff;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
continue;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// crc the entry first, hopefully leaving it in the cache
for (lfs_off_t j = sizeof(tag); j < lfs_tag_dsize(tag); j++) {
uint8_t dat;
err = lfs_bd_read(lfs,
NULL, &lfs->rcache, lfs->cfg->block_size,
dir->pair[0], off+j, &dat, 1);
if (err) {
if (err == LFS_ERR_CORRUPT) {
dir->erased = false;
break;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return err;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
crc = lfs_crc(crc, &dat, 1);
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// directory modification tags?
if (lfs_tag_type1(tag) == LFS_TYPE_NAME) {
// increase count of files if necessary
if (lfs_tag_id(tag) >= tempcount) {
tempcount = lfs_tag_id(tag) + 1;
}
} else if (lfs_tag_type1(tag) == LFS_TYPE_SPLICE) {
tempcount += lfs_tag_splice(tag);
if (tag == (LFS_MKTAG(LFS_TYPE_DELETE, 0, 0) |
(LFS_MKTAG(0, 0x3ff, 0) & tempbesttag))) {
tempbesttag |= 0x80000000;
} else if (tempbesttag != -1 &&
lfs_tag_id(tag) <= lfs_tag_id(tempbesttag)) {
tempbesttag += LFS_MKTAG(0, lfs_tag_splice(tag), 0);
}
} else if (lfs_tag_type1(tag) == LFS_TYPE_TAIL) {
tempsplit = (lfs_tag_chunk(tag) & 1);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, lfs->cfg->block_size,
dir->pair[0], off+sizeof(tag), &temptail, 8);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (err) {
if (err == LFS_ERR_CORRUPT) {
dir->erased = false;
break;
}
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_pair_fromle32(temptail);
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// found a match for our fetcher?
if ((fmask & tag) == (fmask & ftag)) {
int res = cb(data, tag, &(struct lfs_diskoff){
dir->pair[0], off+sizeof(tag)});
if (res < 0) {
if (res == LFS_ERR_CORRUPT) {
dir->erased = false;
break;
}
return res;
}
if (res == LFS_CMP_EQ) {
// found a match
tempbesttag = tag;
Fixed lfs_dir_fetchmatch not understanding overwritten tags Sometimes small, single line code change hides behind it a complicated story. This is one of those times. If you look at this diff, you may note that this is a case of lfs_dir_fetchmatch not correctly handling a tag that invalidates a callback used to search for some condition, in this case a search for a parent, which is invalidated by a later dir tag overwritting the previous dir pair. But how can this happen? Dir-pair-tags are only overwritten during relocations (when a block goes bad or exceeds the block_cycles config option for dynamic wear-leveling). Other dir operations create new directory entries. And the only lfs_dir_fetchmatch condition that relies on overwrites (as opposed to proper deletes) is when we need to find a directory's parent, an operation that only occurs during a _different_ relocation. And a false _positive_, can only happen if we don't have a parent. Which is really unlikely when we search for directory parents! This bug and minimal test case was found by Matthew Renzelmann. In a unfortunate series of events, first a file creation causes a directory split to occur. This creates a new, orphaned metadata-pair containing our new file. However, the revision count on this metadata-pair indicates the pair is due for relocation as a part of wear-leveling. Normally, this is fine, even though this metadata-pair has no parent, the lfs_dir_find should return ENOENT and continue without error. However, here we get hit by our fetchmatch bug. A previous, unrelated relocation overwrites a pair which just happens to contain the block allocated for a new metadata-pair. When we search for a parent, lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair and incorrectly tells our orphan it's found its parent. As you can imagine the orphan's dissapointment must be immense. So an unfortunately timed dir split triggers a relocation which incorrectly finds a previously written parent that has been outdated by another relocation. As a solution we can outdate our found tag if it is overwritten by an exact match during lfs_dir_fetchmatch. As a part of this I started adding a new set of tests: tests/test_relocations, for aggressive relocations tests. This is already by appended to by another PR. I suspect relocations is relatively under-tested and is becoming more important due to recent improvements in wear-leveling.
2019-11-26 07:21:42 +00:00
} else if ((LFS_MKTAG(0x7ff, 0x3ff, 0) & tag) ==
(LFS_MKTAG(0x7ff, 0x3ff, 0) & tempbesttag)) {
// found an identical tag, but contents didn't match
// this must mean that our besttag has been overwritten
tempbesttag = -1;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
} else if (res == LFS_CMP_GT &&
lfs_tag_id(tag) <= lfs_tag_id(tempbesttag)) {
// found a greater match, keep track to keep things sorted
tempbesttag = tag | 0x80000000;
}
}
}
// consider what we have good enough
if (dir->off > 0) {
// synthetic move
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (lfs_gstate_hasmovehere(&lfs->gdisk, dir->pair)) {
if (lfs_tag_id(lfs->gdisk.tag) == lfs_tag_id(besttag)) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
besttag |= 0x80000000;
} else if (besttag != -1 &&
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs_tag_id(lfs->gdisk.tag) < lfs_tag_id(besttag)) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
besttag -= LFS_MKTAG(0, 1, 0);
}
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// found tag? or found best id?
if (id) {
*id = lfs_min(lfs_tag_id(besttag), dir->count);
}
if (lfs_tag_isvalid(besttag)) {
return besttag;
} else if (lfs_tag_id(besttag) < dir->count) {
return LFS_ERR_NOENT;
} else {
return 0;
}
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// failed, try the other block?
lfs_pair_swap(dir->pair);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
dir->rev = revs[(r+1)%2];
}
LFS_ERROR("Corrupted dir pair at {0x%"PRIx32", 0x%"PRIx32"}",
dir->pair[0], dir->pair[1]);
return LFS_ERR_CORRUPT;
}
static int lfs_dir_fetch(lfs_t *lfs,
lfs_mdir_t *dir, const lfs_block_t pair[2]) {
// note, mask=-1, tag=-1 can never match a tag since this
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// pattern has the invalid bit set
return (int)lfs_dir_fetchmatch(lfs, dir, pair,
(lfs_tag_t)-1, (lfs_tag_t)-1, NULL, NULL, NULL);
}
static int lfs_dir_getgstate(lfs_t *lfs, const lfs_mdir_t *dir,
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs_gstate_t *gstate) {
lfs_gstate_t temp;
lfs_stag_t res = lfs_dir_get(lfs, dir, LFS_MKTAG(0x7ff, 0, 0),
LFS_MKTAG(LFS_TYPE_MOVESTATE, 0, sizeof(temp)), &temp);
if (res < 0 && res != LFS_ERR_NOENT) {
return res;
}
if (res != LFS_ERR_NOENT) {
// xor together to find resulting gstate
lfs_gstate_fromle32(&temp);
lfs_gstate_xor(gstate, &temp);
}
return 0;
}
static int lfs_dir_getinfo(lfs_t *lfs, lfs_mdir_t *dir,
uint16_t id, struct lfs_info *info) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (id == 0x3ff) {
// special case for root
strcpy(info->name, "/");
info->type = LFS_TYPE_DIR;
return 0;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t tag = lfs_dir_get(lfs, dir, LFS_MKTAG(0x780, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_NAME, id, lfs->name_max+1), info->name);
if (tag < 0) {
return (int)tag;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
info->type = lfs_tag_type3(tag);
struct lfs_ctz ctz;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
tag = lfs_dir_get(lfs, dir, LFS_MKTAG(0x700, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, id, sizeof(ctz)), &ctz);
if (tag < 0) {
return (int)tag;
}
lfs_ctz_fromle32(&ctz);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type3(tag) == LFS_TYPE_CTZSTRUCT) {
info->size = ctz.size;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
} else if (lfs_tag_type3(tag) == LFS_TYPE_INLINESTRUCT) {
info->size = lfs_tag_size(tag);
}
return 0;
}
struct lfs_dir_find_match {
lfs_t *lfs;
const void *name;
lfs_size_t size;
};
static int lfs_dir_find_match(void *data,
lfs_tag_t tag, const void *buffer) {
struct lfs_dir_find_match *name = data;
lfs_t *lfs = name->lfs;
const struct lfs_diskoff *disk = buffer;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// compare with disk
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_size_t diff = lfs_min(name->size, lfs_tag_size(tag));
int res = lfs_bd_cmp(lfs,
NULL, &lfs->rcache, diff,
disk->block, disk->off, name->name, diff);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (res != LFS_CMP_EQ) {
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
return res;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// only equal if our size is still the same
if (name->size != lfs_tag_size(tag)) {
return (name->size < lfs_tag_size(tag)) ? LFS_CMP_LT : LFS_CMP_GT;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// found a match!
return LFS_CMP_EQ;
}
static lfs_stag_t lfs_dir_find(lfs_t *lfs, lfs_mdir_t *dir,
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
const char **path, uint16_t *id) {
// we reduce path to a single name if we can find it
const char *name = *path;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (id) {
*id = 0x3ff;
}
// default to root dir
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t tag = LFS_MKTAG(LFS_TYPE_DIR, 0x3ff, 0);
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
dir->tail[0] = lfs->root[0];
dir->tail[1] = lfs->root[1];
while (true) {
nextname:
// skip slashes
name += strspn(name, "/");
lfs_size_t namelen = strcspn(name, "/");
// skip '.' and root '..'
if ((namelen == 1 && memcmp(name, ".", 1) == 0) ||
(namelen == 2 && memcmp(name, "..", 2) == 0)) {
name += namelen;
goto nextname;
}
// skip if matched by '..' in name
const char *suffix = name + namelen;
lfs_size_t sufflen;
int depth = 1;
while (true) {
suffix += strspn(suffix, "/");
sufflen = strcspn(suffix, "/");
if (sufflen == 0) {
break;
}
if (sufflen == 2 && memcmp(suffix, "..", 2) == 0) {
depth -= 1;
if (depth == 0) {
name = suffix + sufflen;
goto nextname;
}
} else {
depth += 1;
}
suffix += sufflen;
}
// found path
if (name[0] == '\0') {
return tag;
}
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
// update what we've found so far
*path = name;
// only continue if we hit a directory
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type3(tag) != LFS_TYPE_DIR) {
return LFS_ERR_NOTDIR;
}
// grab the entry data
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_id(tag) != 0x3ff) {
lfs_stag_t res = lfs_dir_get(lfs, dir, LFS_MKTAG(0x700, 0x3ff, 0),
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
LFS_MKTAG(LFS_TYPE_STRUCT, lfs_tag_id(tag), 8), dir->tail);
if (res < 0) {
return res;
}
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_pair_fromle32(dir->tail);
}
// find entry matching name
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
while (true) {
tag = lfs_dir_fetchmatch(lfs, dir, dir->tail,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
LFS_MKTAG(0x780, 0, 0),
LFS_MKTAG(LFS_TYPE_NAME, 0, namelen),
// are we last name?
(strchr(name, '/') == NULL) ? id : NULL,
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_dir_find_match, &(struct lfs_dir_find_match){
lfs, name, namelen});
if (tag < 0) {
return tag;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (tag) {
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
break;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (!dir->split) {
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
return LFS_ERR_NOENT;
}
}
// to next name
name += namelen;
}
}
// commit logic
struct lfs_commit {
lfs_block_t block;
lfs_off_t off;
lfs_tag_t ptag;
uint32_t crc;
lfs_off_t begin;
lfs_off_t end;
};
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
static int lfs_dir_commitprog(lfs_t *lfs, struct lfs_commit *commit,
const void *buffer, lfs_size_t size) {
int err = lfs_bd_prog(lfs,
&lfs->pcache, &lfs->rcache, false,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
commit->block, commit->off ,
(const uint8_t*)buffer, size);
if (err) {
return err;
}
commit->crc = lfs_crc(commit->crc, buffer, size);
commit->off += size;
return 0;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
static int lfs_dir_commitattr(lfs_t *lfs, struct lfs_commit *commit,
lfs_tag_t tag, const void *buffer) {
// check if we fit
lfs_size_t dsize = lfs_tag_dsize(tag);
if (commit->off + dsize > commit->end) {
return LFS_ERR_NOSPC;
}
// write out tag
Tweaked tag endianness to catch power-loss after <1 word is written There was an interesting subtlety with the existing layout of tags that could become a problem in the future. Basically, littlefs avoids writing to any region of storage it is not absolutely sure has been erased beforehand. This is a part of limiting the number of assumptions about storage. It's possible a storage technology can't support writes without erases in a way that is undetectable at write time (Maybe changing a bit without an erase decreases the longevity of the information stored on the bit). But the existing layout had a very tiny corner case where this wasn't true. Consider the location of the valid bit in the tag struct: [1|--- 31 ---] ^--- valid bit The responsibility of this bit is to indicate if an attempt has been made to write the following commit. If it is not set (the specific value is dependent on a previous read and identified by the preceeding commit), the assumption is that it is safe to write to the next region because it has been erased previously. If it is set, we check if the next commit is valid, if it isn't (because of CRC failure, likely due to power-loss), we discard the commit. But because an attempt has been made to write to that storage, we must then do a compaction to move to the other block in the metadata-pair. This plan looks good on paper, but what does it look like on storage? The problem is that words in littlefs are in little-endian. So on storage the tag actually looks like this: [- 8 -|- 8 -|- 8 -|1|- 7 -] ^-- valid bit This means that we don't actually set the valid bit before writing the tag! We write the lower bytes first. If we lose power, we may have written 3 bytes without this fact being detectable. We could restructure the tag structure to store the valid bit lower, however because none of the fields are 7 bits, this would make the extraction more costly, and we then lose the ability to check this valid bit with a sign comparison. The simple solution is to just store the tag in big-endian. A small benefit is that this will actually have a negative code cost on big-endian machines. This mixture of endiannesses is frustrating, however it is a pragmatic solution with only a 20-byte code size cost.
2018-10-22 21:42:30 +00:00
lfs_tag_t ntag = lfs_tobe32((tag & 0x7fffffff) ^ commit->ptag);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
int err = lfs_dir_commitprog(lfs, commit, &ntag, sizeof(ntag));
if (err) {
return err;
}
if (!(tag & 0x80000000)) {
// from memory
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
err = lfs_dir_commitprog(lfs, commit, buffer, dsize-sizeof(tag));
if (err) {
return err;
}
} else {
// from disk
const struct lfs_diskoff *disk = buffer;
for (lfs_off_t i = 0; i < dsize-sizeof(tag); i++) {
// rely on caching to make this efficient
uint8_t dat;
err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, dsize-sizeof(tag)-i,
disk->block, disk->off+i, &dat, 1);
if (err) {
return err;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
err = lfs_dir_commitprog(lfs, commit, &dat, 1);
if (err) {
return err;
}
}
}
commit->ptag = tag & 0x7fffffff;
return 0;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
static int lfs_dir_commitcrc(lfs_t *lfs, struct lfs_commit *commit) {
// align to program units
Fixed single unchecked bit during commit verification This bug was exposed by the bad-block tests due to changes to block allocation, but could have been hit before these changes. In flash, when blocks fail, they don't fail in a predictable manner. To account for this, the bad-block tests check a number of failure behaviors. The interesting one here is "LFS_TESTBD_BADBLOCK_ERASENOOP", in which bad blocks can not be erased or programmed, and are stuck with the data written at the time the blocks go bad. This is actually a pretty realistic failure behavior, since flash needs a large voltage to force the electrons of the floating gates. Though realistically, such a failure would like corrupt the data a bit, not leave the underlying data perfectly intact. LFS_TESTBD_BADBLOCK_ERASENOOP is rather interesting to test for because it means bad blocks can end up with perfectly valid CRCs after a failed write, confusing littlefs. --- In this case, we had the perfect series of operations such that a test was repeatedly writing the same sequence of metadata commits to the same block, which eventually goes bad, leaving the block stuck with metadata that occurs later in the sequence. What this means is that after the first commit, the metadata block contained both the first and second commits, even though the loop in the test hadn't reached that point yet. expected actual .----------. .----------. | commit 1 | | commit 1 | | crc 1 | | crc 1 | | | | commit 2 <-- (from previous iteration) | | | crc 2 | '----------' '----------' To protect against this, littlefs normally compares the written CRC against the expected CRC, but because this was the exact same data that it was going to write, this CRCs end up the same. Ah! But doesn't littlefs also encode the state of the next page to keep track of if the next page has been erased or not? Wouldn't that change between iterations? It does! In a single bit in the CRC-tag. But thanks to some incorrect logic attempting to avoid an extra condition in the loop for writing out padding commits, the CRC that littlefs checked against was the CRC immediately before we include the "is-next-page-erased" bit. Changing the verification check to use the same CRC as what is used to verify commits on fetch solves this problem.
2020-11-22 21:07:16 +00:00
const lfs_off_t end = lfs_alignup(commit->off + 2*sizeof(uint32_t),
lfs->cfg->prog_size);
Fixed single unchecked bit during commit verification This bug was exposed by the bad-block tests due to changes to block allocation, but could have been hit before these changes. In flash, when blocks fail, they don't fail in a predictable manner. To account for this, the bad-block tests check a number of failure behaviors. The interesting one here is "LFS_TESTBD_BADBLOCK_ERASENOOP", in which bad blocks can not be erased or programmed, and are stuck with the data written at the time the blocks go bad. This is actually a pretty realistic failure behavior, since flash needs a large voltage to force the electrons of the floating gates. Though realistically, such a failure would like corrupt the data a bit, not leave the underlying data perfectly intact. LFS_TESTBD_BADBLOCK_ERASENOOP is rather interesting to test for because it means bad blocks can end up with perfectly valid CRCs after a failed write, confusing littlefs. --- In this case, we had the perfect series of operations such that a test was repeatedly writing the same sequence of metadata commits to the same block, which eventually goes bad, leaving the block stuck with metadata that occurs later in the sequence. What this means is that after the first commit, the metadata block contained both the first and second commits, even though the loop in the test hadn't reached that point yet. expected actual .----------. .----------. | commit 1 | | commit 1 | | crc 1 | | crc 1 | | | | commit 2 <-- (from previous iteration) | | | crc 2 | '----------' '----------' To protect against this, littlefs normally compares the written CRC against the expected CRC, but because this was the exact same data that it was going to write, this CRCs end up the same. Ah! But doesn't littlefs also encode the state of the next page to keep track of if the next page has been erased or not? Wouldn't that change between iterations? It does! In a single bit in the CRC-tag. But thanks to some incorrect logic attempting to avoid an extra condition in the loop for writing out padding commits, the CRC that littlefs checked against was the CRC immediately before we include the "is-next-page-erased" bit. Changing the verification check to use the same CRC as what is used to verify commits on fetch solves this problem.
2020-11-22 21:07:16 +00:00
lfs_off_t off1 = 0;
uint32_t crc1 = 0;
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// create crc tags to fill up remainder of commit, note that
// padding is not crced, which lets fetches skip padding but
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// makes committing a bit more complicated
while (commit->off < end) {
lfs_off_t off = commit->off + sizeof(lfs_tag_t);
lfs_off_t noff = lfs_min(end - off, 0x3fe) + off;
if (noff < end) {
noff = lfs_min(noff, end - 2*sizeof(uint32_t));
}
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// read erased state from next program unit
lfs_tag_t tag = 0xffffffff;
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
int err = lfs_bd_read(lfs,
NULL, &lfs->rcache, sizeof(tag),
commit->block, noff, &tag, sizeof(tag));
if (err && err != LFS_ERR_CORRUPT) {
return err;
}
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// build crc tag
bool reset = ~lfs_frombe32(tag) >> 31;
tag = LFS_MKTAG(LFS_TYPE_CRC + reset, 0x3ff, noff - off);
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// write out crc
uint32_t footer[2];
footer[0] = lfs_tobe32(tag ^ commit->ptag);
commit->crc = lfs_crc(commit->crc, &footer[0], sizeof(footer[0]));
footer[1] = lfs_tole32(commit->crc);
err = lfs_bd_prog(lfs,
&lfs->pcache, &lfs->rcache, false,
commit->block, commit->off, &footer, sizeof(footer));
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
if (err) {
return err;
}
Fixed single unchecked bit during commit verification This bug was exposed by the bad-block tests due to changes to block allocation, but could have been hit before these changes. In flash, when blocks fail, they don't fail in a predictable manner. To account for this, the bad-block tests check a number of failure behaviors. The interesting one here is "LFS_TESTBD_BADBLOCK_ERASENOOP", in which bad blocks can not be erased or programmed, and are stuck with the data written at the time the blocks go bad. This is actually a pretty realistic failure behavior, since flash needs a large voltage to force the electrons of the floating gates. Though realistically, such a failure would like corrupt the data a bit, not leave the underlying data perfectly intact. LFS_TESTBD_BADBLOCK_ERASENOOP is rather interesting to test for because it means bad blocks can end up with perfectly valid CRCs after a failed write, confusing littlefs. --- In this case, we had the perfect series of operations such that a test was repeatedly writing the same sequence of metadata commits to the same block, which eventually goes bad, leaving the block stuck with metadata that occurs later in the sequence. What this means is that after the first commit, the metadata block contained both the first and second commits, even though the loop in the test hadn't reached that point yet. expected actual .----------. .----------. | commit 1 | | commit 1 | | crc 1 | | crc 1 | | | | commit 2 <-- (from previous iteration) | | | crc 2 | '----------' '----------' To protect against this, littlefs normally compares the written CRC against the expected CRC, but because this was the exact same data that it was going to write, this CRCs end up the same. Ah! But doesn't littlefs also encode the state of the next page to keep track of if the next page has been erased or not? Wouldn't that change between iterations? It does! In a single bit in the CRC-tag. But thanks to some incorrect logic attempting to avoid an extra condition in the loop for writing out padding commits, the CRC that littlefs checked against was the CRC immediately before we include the "is-next-page-erased" bit. Changing the verification check to use the same CRC as what is used to verify commits on fetch solves this problem.
2020-11-22 21:07:16 +00:00
// keep track of non-padding checksum to verify
if (off1 == 0) {
off1 = commit->off + sizeof(uint32_t);
crc1 = commit->crc;
}
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
commit->off += sizeof(tag)+lfs_tag_size(tag);
commit->ptag = tag ^ ((lfs_tag_t)reset << 31);
commit->crc = 0xffffffff; // reset crc for next "commit"
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
}
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// flush buffers
int err = lfs_bd_sync(lfs, &lfs->pcache, &lfs->rcache, false);
if (err) {
return err;
}
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// successful commit, check checksums to make sure
lfs_off_t off = commit->begin;
Fixed single unchecked bit during commit verification This bug was exposed by the bad-block tests due to changes to block allocation, but could have been hit before these changes. In flash, when blocks fail, they don't fail in a predictable manner. To account for this, the bad-block tests check a number of failure behaviors. The interesting one here is "LFS_TESTBD_BADBLOCK_ERASENOOP", in which bad blocks can not be erased or programmed, and are stuck with the data written at the time the blocks go bad. This is actually a pretty realistic failure behavior, since flash needs a large voltage to force the electrons of the floating gates. Though realistically, such a failure would like corrupt the data a bit, not leave the underlying data perfectly intact. LFS_TESTBD_BADBLOCK_ERASENOOP is rather interesting to test for because it means bad blocks can end up with perfectly valid CRCs after a failed write, confusing littlefs. --- In this case, we had the perfect series of operations such that a test was repeatedly writing the same sequence of metadata commits to the same block, which eventually goes bad, leaving the block stuck with metadata that occurs later in the sequence. What this means is that after the first commit, the metadata block contained both the first and second commits, even though the loop in the test hadn't reached that point yet. expected actual .----------. .----------. | commit 1 | | commit 1 | | crc 1 | | crc 1 | | | | commit 2 <-- (from previous iteration) | | | crc 2 | '----------' '----------' To protect against this, littlefs normally compares the written CRC against the expected CRC, but because this was the exact same data that it was going to write, this CRCs end up the same. Ah! But doesn't littlefs also encode the state of the next page to keep track of if the next page has been erased or not? Wouldn't that change between iterations? It does! In a single bit in the CRC-tag. But thanks to some incorrect logic attempting to avoid an extra condition in the loop for writing out padding commits, the CRC that littlefs checked against was the CRC immediately before we include the "is-next-page-erased" bit. Changing the verification check to use the same CRC as what is used to verify commits on fetch solves this problem.
2020-11-22 21:07:16 +00:00
lfs_off_t noff = off1;
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
while (off < end) {
uint32_t crc = 0xffffffff;
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
for (lfs_off_t i = off; i < noff+sizeof(uint32_t); i++) {
// check against written crc, may catch blocks that
// become readonly and match our commit size exactly
if (i == off1 && crc != crc1) {
return LFS_ERR_CORRUPT;
}
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
// leave it up to caching to make this efficient
uint8_t dat;
err = lfs_bd_read(lfs,
NULL, &lfs->rcache, noff+sizeof(uint32_t)-i,
commit->block, i, &dat, 1);
if (err) {
return err;
}
crc = lfs_crc(crc, &dat, 1);
}
// detected write error?
if (crc != 0) {
return LFS_ERR_CORRUPT;
}
// skip padding
off = lfs_min(end - noff, 0x3fe) + noff;
if (off < end) {
off = lfs_min(off, end - 2*sizeof(uint32_t));
}
noff = off + sizeof(uint32_t);
}
return 0;
}
static int lfs_dir_alloc(lfs_t *lfs, lfs_mdir_t *dir) {
Added root entry and expanding superblocks Expanding superblocks has been on my wishlist for a while. The basic idea is that instead of maintaining a fixed offset blocks {0, 1} to the the root directory (1 pointer), we maintain a dynamically sized linked-list of superblocks that point to the actual root. If the number of writes to the root exceeds some value, we increase the size of the superblock linked-list. This can leverage existing metadata-pair operations. The revision count for metadata-pairs provides some knowledge on how much wear we've put on the superblock, and the threaded linked-list can also be reused for this purpose. This means superblock expansion is both optional and cheap to implement. Expanding superblocks helps both extremely small and extremely large filesystem (extreme being relative of course). On the small end, we can actually collapse the superblock into the root directory and drop the hard requirement of 4-blocks for the superblock. On the large end, our superblock will now last longer than the rest of the filesystem. Each time we expand, the number of cycles until the superblock dies is increased by a power. Before we were stuck with this layout: level cycles limit layout 1 E^2 390 MiB s0 -> root Now we expand every time a fixed offset is exceeded: level cycles limit layout 0 E 4 KiB s0+root 1 E^2 390 MiB s0 -> root 2 E^3 37 TiB s0 -> s1 -> root 3 E^4 3.6 EiB s0 -> s1 -> s2 -> root ... Where the cycles are the number of cycles before death, and the limit is the worst-case size a filesystem where early superblock death becomes a concern (all writes to root using this formula: E^|s| = E*B, E = erase cycles = 100000, B = block count, assuming 4096 byte blocks). Note we can also store copies of the superblock entry on the expanded superblocks. This may help filesystem recover tools in the future.
2018-08-06 18:30:51 +00:00
// allocate pair of dir blocks (backwards, so we write block 1 first)
for (int i = 0; i < 2; i++) {
int err = lfs_alloc(lfs, &dir->pair[(i+1)%2]);
if (err) {
return err;
}
}
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
// zero for reproducability in case initial block is unreadable
dir->rev = 0;
// rather than clobbering one of the blocks we just pretend
// the revision may be valid
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
int err = lfs_bd_read(lfs,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
NULL, &lfs->rcache, sizeof(dir->rev),
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
dir->pair[0], 0, &dir->rev, sizeof(dir->rev));
dir->rev = lfs_fromle32(dir->rev);
if (err && err != LFS_ERR_CORRUPT) {
return err;
}
// make sure we don't immediately evict
dir->rev += dir->rev & 1;
// set defaults
dir->off = sizeof(dir->rev);
dir->etag = 0xffffffff;
dir->count = 0;
dir->tail[0] = LFS_BLOCK_NULL;
dir->tail[1] = LFS_BLOCK_NULL;
dir->erased = false;
dir->split = false;
// don't write out yet, let caller take care of that
return 0;
}
static int lfs_dir_drop(lfs_t *lfs, lfs_mdir_t *dir, lfs_mdir_t *tail) {
// steal state
int err = lfs_dir_getgstate(lfs, tail, &lfs->gdelta);
if (err) {
return err;
}
// steal tail
lfs_pair_tole32(tail->tail);
err = lfs_dir_commit(lfs, dir, LFS_MKATTRS(
{LFS_MKTAG(LFS_TYPE_TAIL + tail->split, 0x3ff, 8), tail->tail}));
lfs_pair_fromle32(tail->tail);
if (err) {
return err;
}
return 0;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
static int lfs_dir_split(lfs_t *lfs,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_mdir_t *dir, const struct lfs_mattr *attrs, int attrcount,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_mdir_t *source, uint16_t split, uint16_t end) {
// create tail directory
Fixed lfs_dir_fetchmatch not understanding overwritten tags Sometimes small, single line code change hides behind it a complicated story. This is one of those times. If you look at this diff, you may note that this is a case of lfs_dir_fetchmatch not correctly handling a tag that invalidates a callback used to search for some condition, in this case a search for a parent, which is invalidated by a later dir tag overwritting the previous dir pair. But how can this happen? Dir-pair-tags are only overwritten during relocations (when a block goes bad or exceeds the block_cycles config option for dynamic wear-leveling). Other dir operations create new directory entries. And the only lfs_dir_fetchmatch condition that relies on overwrites (as opposed to proper deletes) is when we need to find a directory's parent, an operation that only occurs during a _different_ relocation. And a false _positive_, can only happen if we don't have a parent. Which is really unlikely when we search for directory parents! This bug and minimal test case was found by Matthew Renzelmann. In a unfortunate series of events, first a file creation causes a directory split to occur. This creates a new, orphaned metadata-pair containing our new file. However, the revision count on this metadata-pair indicates the pair is due for relocation as a part of wear-leveling. Normally, this is fine, even though this metadata-pair has no parent, the lfs_dir_find should return ENOENT and continue without error. However, here we get hit by our fetchmatch bug. A previous, unrelated relocation overwrites a pair which just happens to contain the block allocated for a new metadata-pair. When we search for a parent, lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair and incorrectly tells our orphan it's found its parent. As you can imagine the orphan's dissapointment must be immense. So an unfortunately timed dir split triggers a relocation which incorrectly finds a previously written parent that has been outdated by another relocation. As a solution we can outdate our found tag if it is overwritten by an exact match during lfs_dir_fetchmatch. As a part of this I started adding a new set of tests: tests/test_relocations, for aggressive relocations tests. This is already by appended to by another PR. I suspect relocations is relatively under-tested and is becoming more important due to recent improvements in wear-leveling.
2019-11-26 07:21:42 +00:00
lfs_alloc_ack(lfs);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_mdir_t tail;
int err = lfs_dir_alloc(lfs, &tail);
if (err) {
return err;
}
tail.split = dir->split;
tail.tail[0] = dir->tail[0];
tail.tail[1] = dir->tail[1];
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
err = lfs_dir_compact(lfs, &tail, attrs, attrcount, source, split, end);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
if (err) {
return err;
}
dir->tail[0] = tail.pair[0];
dir->tail[1] = tail.pair[1];
dir->split = true;
// update root if needed
if (lfs_pair_cmp(dir->pair, lfs->root) == 0 && split == 0) {
lfs->root[0] = tail.pair[0];
lfs->root[1] = tail.pair[1];
}
return 0;
}
static int lfs_dir_commit_size(void *p, lfs_tag_t tag, const void *buffer) {
lfs_size_t *size = p;
(void)buffer;
*size += lfs_tag_dsize(tag);
return 0;
}
struct lfs_dir_commit_commit {
lfs_t *lfs;
struct lfs_commit *commit;
};
static int lfs_dir_commit_commit(void *p, lfs_tag_t tag, const void *buffer) {
struct lfs_dir_commit_commit *commit = p;
return lfs_dir_commitattr(commit->lfs, commit->commit, tag, buffer);
}
static int lfs_dir_compact(lfs_t *lfs,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_mdir_t *dir, const struct lfs_mattr *attrs, int attrcount,
lfs_mdir_t *source, uint16_t begin, uint16_t end) {
// save some state in case block is bad
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
const lfs_block_t oldpair[2] = {dir->pair[0], dir->pair[1]};
bool relocated = false;
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
bool tired = false;
// should we split?
while (end - begin > 1) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
// find size
lfs_size_t size = 0;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
int err = lfs_dir_traverse(lfs,
source, 0, 0xffffffff, attrs, attrcount,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
LFS_MKTAG(0x400, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_NAME, 0, 0),
begin, end, -begin,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_dir_commit_size, &size);
if (err) {
return err;
}
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
// space is complicated, we need room for tail, crc, gstate,
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
// cleanup delete, and we cap at half a block to give room
// for metadata updates.
if (end - begin < 0xff &&
size <= lfs_min(lfs->cfg->block_size - 36,
lfs_alignup(lfs->cfg->block_size/2,
lfs->cfg->prog_size))) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
break;
}
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
// can't fit, need to split, we should really be finding the
// largest size that fits with a small binary search, but right now
// it's not worth the code size
uint16_t split = (end - begin) / 2;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
err = lfs_dir_split(lfs, dir, attrs, attrcount,
source, begin+split, end);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
if (err) {
// if we fail to split, we may be able to overcompact, unless
// we're too big for even the full block, in which case our
// only option is to error
if (err == LFS_ERR_NOSPC && size <= lfs->cfg->block_size - 36) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
break;
}
return err;
}
end = begin + split;
}
// increment revision count
dir->rev += 1;
// If our revision count == n * block_cycles, we should force a relocation,
// this is how littlefs wear-levels at the metadata-pair level. Note that we
// actually use (block_cycles+1)|1, this is to avoid two corner cases:
// 1. block_cycles = 1, which would prevent relocations from terminating
// 2. block_cycles = 2n, which, due to aliasing, would only ever relocate
// one metadata block in the pair, effectively making this useless
if (lfs->cfg->block_cycles > 0 &&
(dir->rev % ((lfs->cfg->block_cycles+1)|1) == 0)) {
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
if (lfs_pair_cmp(dir->pair, (const lfs_block_t[2]){0, 1}) == 0) {
// oh no! we're writing too much to the superblock,
// should we expand?
lfs_ssize_t res = lfs_fs_size(lfs);
if (res < 0) {
return res;
}
// do we have extra space? littlefs can't reclaim this space
// by itself, so expand cautiously
if ((lfs_size_t)res < lfs->cfg->block_count/2) {
LFS_DEBUG("Expanding superblock at rev %"PRIu32, dir->rev);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
int err = lfs_dir_split(lfs, dir, attrs, attrcount,
source, begin, end);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
if (err && err != LFS_ERR_NOSPC) {
return err;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
// welp, we tried, if we ran out of space there's not much
// we can do, we'll error later if we've become frozen
if (!err) {
end = begin;
}
}
#ifdef LFS_MIGRATE
} else if (lfs->lfs1) {
// do not proactively relocate blocks during migrations, this
// can cause a number of failure states such: clobbering the
// v1 superblock if we relocate root, and invalidating directory
// pointers if we relocate the head of a directory. On top of
// this, relocations increase the overall complexity of
// lfs_migration, which is already a delicate operation.
#endif
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
} else {
// we're writing too much, time to relocate
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
tired = true;
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
goto relocate;
}
}
// begin loop to commit compaction to blocks until a compact sticks
while (true) {
{
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
// setup commit state
struct lfs_commit commit = {
.block = dir->pair[1],
.off = 0,
.ptag = 0xffffffff,
.crc = 0xffffffff,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
.begin = 0,
.end = lfs->cfg->block_size - 8,
};
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
// erase block to write to
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
int err = lfs_bd_erase(lfs, dir->pair[1]);
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
// write out header
dir->rev = lfs_tole32(dir->rev);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
err = lfs_dir_commitprog(lfs, &commit,
&dir->rev, sizeof(dir->rev));
dir->rev = lfs_fromle32(dir->rev);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
// traverse the directory, this time writing out all unique tags
err = lfs_dir_traverse(lfs,
source, 0, 0xffffffff, attrs, attrcount,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
LFS_MKTAG(0x400, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_NAME, 0, 0),
begin, end, -begin,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_dir_commit_commit, &(struct lfs_dir_commit_commit){
lfs, &commit});
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
return err;
}
// commit tail, which may be new after last size check
if (!lfs_pair_isnull(dir->tail)) {
lfs_pair_tole32(dir->tail);
err = lfs_dir_commitattr(lfs, &commit,
LFS_MKTAG(LFS_TYPE_TAIL + dir->split, 0x3ff, 8),
dir->tail);
lfs_pair_fromle32(dir->tail);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
Modified lfs_dir_compact to avoid redundant erases during split The commit machine in littlefs has three stages: commit, compact, and then split. First we try to append our commit to the metadata log, if that fails we try to compact the metadata log to remove duplicates and make room for the commit, if that still fails we split the metadata into two metadata-pairs and try again. Each stage is less efficient but also less frequent. However, in the case that we're filling up a directory with new files, such as the bootstrap process in setting up a new system, we must pass through all three stages rather quickly in order to get enough metadata-pairs to hold all of our files. This means we'll compact, split, and then need to compact again. This creates more erases than is needed in the optimal case, which can be a big cost on disks with an expensive erase operation. In theory, we can actually avoid this redundant erase by reusing the data we wrote out in the first attempt to compact. In practice, this trick is very complicated to pull off. 1. We may need to cache a half-completed program while we write out the new metadata-pair. We need to write out the second pair first in order to get our new tail before we complete our first metadata-pair. This requires two pcaches, which we don't have The solution here is to just drop our cache and reconstruct what if would have been. This needs to be perfect down to the byte level because we don't have knowledge of where our cache lines are. 2. We may have written out entries that are then moved to the new metadata-pair. The solution here isn't pretty but it works, we just add a delete tag for any entry that was moved over. In the end the solution ends up a bit hacky, with different layers poked through the commit logic in order to manage writes at the byte level from where we manage splits. But it works fairly well and saves erases.
2018-08-21 02:45:11 +00:00
}
}
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// bring over gstate?
lfs_gstate_t delta = {0};
if (!relocated) {
lfs_gstate_xor(&delta, &lfs->gdisk);
lfs_gstate_xor(&delta, &lfs->gstate);
}
lfs_gstate_xor(&delta, &lfs->gdelta);
delta.tag &= ~LFS_MKTAG(0, 0, 0x3ff);
err = lfs_dir_getgstate(lfs, dir, &delta);
if (err) {
return err;
}
if (!lfs_gstate_iszero(&delta)) {
lfs_gstate_tole32(&delta);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
err = lfs_dir_commitattr(lfs, &commit,
LFS_MKTAG(LFS_TYPE_MOVESTATE, 0x3ff,
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
sizeof(delta)), &delta);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
Introduced xored-globals logic to fix fundamental problem with moves This was a big roadblock for a while: with the new feature of inlined files, the existing move logic was fundamentally flawed. To pull off atomic moves between two different metadata-pairs, littlefs uses a simple, if a bit clumsy trick. 1. Marks entry as "moving" 2. Copies entry to new metadata-pair 3. Deletes old entry If power is lost before the move operation is completed, we will find the "moving" tag. This means there may or may not be an incomplete move on the filesystem. In this case, we simply search for the moved entry, if we find it, we remove the old entry, otherwise we just remove the "moving" tag. This worked perfectly, until we introduced inlined files. See, unlike the existing directory and ctz entries, inlined files have no guarantee they are unique. There is nothing we can search for that will allow us to find a moved file unless we assign entries globally-unique ids. (note that moves are fundamentally rename operations, so searching for names does not make sense). --- Solving this problem required completely restructuring how littlefs handled moves and pulled out a really old idea that had been left in the cutting room floor back when littlefs was going through many designs: xored-globals. The problem xored-globals solves is the need to maintain some global state via commits to these distributed, independent metadata-pairs. The idea is that we can use some sort of symmetric operation, such as xor, to introduces deltas of the global state that can be committed atomically along with any other info to these metadata-pairs. This means that to figure out our global state, we xor together the global delta stored in every metadata-pair. Which means any commit can update the global state atomically, opening up a whole new set atomic possibilities. There is a couple of downsides. These globals may end up with deltas on every single metadata-pair, effectively duplicating the data for each block. Additionally, these globals need to have multiple copies in RAM. This means and globals need to be a bounded size and very small, since even small globals will have a large footprint. --- On top of xored-globals, it's trivial to fix our move logic. Here we've added an indirect delete tag which allows us to atomically specify a delete of any entry on the filesystem. Our move operation is now: 1. Copy entry to new metadata-pair and atomically xor globals to indirectly delete our original entry. 2. Delete the original entry and xor globals to remove the indirect delete. Extra exciting is that this now takes our relatively clumsy move operation into a sexy guaranteed O(1) move operation with no searching necessary (though we do need to xor globals during mount). Also reintroduced entry struct, now with a specific purpose to describe the metadata-pair + id combo needed by indirect deletes to locate an entry.
2018-05-29 17:35:23 +00:00
}
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// complete commit with crc
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
err = lfs_dir_commitcrc(lfs, &commit);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
// successful compaction, swap dir pair to indicate most recent
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
LFS_ASSERT(commit.off % lfs->cfg->prog_size == 0);
lfs_pair_swap(dir->pair);
dir->count = end - begin;
dir->off = commit.off;
dir->etag = commit.ptag;
2020-02-09 14:52:20 +00:00
// update gstate
lfs->gdelta = (lfs_gstate_t){0};
if (!relocated) {
lfs->gdisk = lfs->gstate;
}
}
break;
relocate:
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
// commit was corrupted, drop caches and prepare to relocate block
relocated = true;
lfs_cache_drop(lfs, &lfs->pcache);
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
if (!tired) {
LFS_DEBUG("Bad block at 0x%"PRIx32, dir->pair[1]);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
}
// can't relocate superblock, filesystem is now frozen
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
if (lfs_pair_cmp(dir->pair, (const lfs_block_t[2]){0, 1}) == 0) {
LFS_WARN("Superblock 0x%"PRIx32" has become unwritable",
dir->pair[1]);
return LFS_ERR_NOSPC;
}
// relocate half of pair
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
int err = lfs_alloc(lfs, &dir->pair[1]);
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
if (err && (err != LFS_ERR_NOSPC || !tired)) {
return err;
}
Restructured block devices again for better test exploitation Also finished migrating tests with test_relocations and test_exhaustion. The issue I was running into when migrating these tests was a lack of flexibility with what you could do with the block devices. It was possible to hack in some hooks for things like bad blocks and power loss, but it wasn't clean or easily extendable. The solution here was to just put all of these test extensions into a third block device, testbd, that uses the other two example block devices internally. testbd has several useful features for testing. Note this makes it a pretty terrible block device _example_ since these hooks look more complicated than a block device needs to be. - testbd can simulate different erase values, supporting 1s, 0s, other byte patterns, or no erases at all (which can cause surprising bugs). This actually depends on the simulated erase values in ramdb and filebd. I did try to move this out of rambd/filebd, but it's not possible to simulate erases in testbd without buffering entire blocks and creating an excessive amount of extra write operations. - testbd also helps simulate power-loss by containing a "power cycles" counter that is decremented every write operation until it calls exit. This is notably faster than the previous gdb approach, which is valuable since the reentrant tests tend to take a while to resolve. - testbd also tracks wear, which can be manually set and read. This is very useful for testing things like bad block handling, wear leveling, or even changing the effective size of the block device at runtime.
2020-01-16 12:30:40 +00:00
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
tired = false;
continue;
}
if (relocated) {
// update references if we relocated
LFS_DEBUG("Relocating {0x%"PRIx32", 0x%"PRIx32"} "
"-> {0x%"PRIx32", 0x%"PRIx32"}",
oldpair[0], oldpair[1], dir->pair[0], dir->pair[1]);
int err = lfs_fs_relocate(lfs, oldpair, dir->pair);
if (err) {
return err;
}
}
return 0;
}
static int lfs_dir_commit(lfs_t *lfs, lfs_mdir_t *dir,
const struct lfs_mattr *attrs, int attrcount) {
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
// check for any inline files that aren't RAM backed and
// forcefully evict them, needed for filesystem consistency
for (lfs_file_t *f = (lfs_file_t*)lfs->mlist; f; f = f->next) {
if (dir != &f->m && lfs_pair_cmp(f->m.pair, dir->pair) == 0 &&
f->type == LFS_TYPE_REG && (f->flags & LFS_F_INLINE) &&
f->ctz.size > lfs->cfg->cache_size) {
int err = lfs_file_outline(lfs, f);
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
if (err) {
return err;
}
err = lfs_file_flush(lfs, f);
if (err) {
return err;
}
}
}
// calculate changes to the directory
lfs_mdir_t olddir = *dir;
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
bool hasdelete = false;
for (int i = 0; i < attrcount; i++) {
if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_CREATE) {
dir->count += 1;
} else if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_DELETE) {
LFS_ASSERT(dir->count > 0);
dir->count -= 1;
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
hasdelete = true;
} else if (lfs_tag_type1(attrs[i].tag) == LFS_TYPE_TAIL) {
dir->tail[0] = ((lfs_block_t*)attrs[i].buffer)[0];
dir->tail[1] = ((lfs_block_t*)attrs[i].buffer)[1];
dir->split = (lfs_tag_chunk(attrs[i].tag) & 1);
lfs_pair_fromle32(dir->tail);
}
}
// should we actually drop the directory block?
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
if (hasdelete && dir->count == 0) {
lfs_mdir_t pdir;
int err = lfs_fs_pred(lfs, dir->pair, &pdir);
if (err && err != LFS_ERR_NOENT) {
2020-02-09 14:52:20 +00:00
*dir = olddir;
return err;
}
if (err != LFS_ERR_NOENT && pdir.split) {
err = lfs_dir_drop(lfs, &pdir, dir);
if (err) {
*dir = olddir;
return err;
}
}
}
if (dir->erased || dir->count >= 0xff) {
// try to commit
struct lfs_commit commit = {
.block = dir->pair[0],
.off = dir->off,
.ptag = dir->etag,
.crc = 0xffffffff,
.begin = dir->off,
.end = lfs->cfg->block_size - 8,
};
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
// traverse attrs that need to be written out
lfs_pair_tole32(dir->tail);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
int err = lfs_dir_traverse(lfs,
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
dir, dir->off, dir->etag, attrs, attrcount,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
0, 0, 0, 0, 0,
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
lfs_dir_commit_commit, &(struct lfs_dir_commit_commit){
lfs, &commit});
lfs_pair_fromle32(dir->tail);
if (err) {
if (err == LFS_ERR_NOSPC || err == LFS_ERR_CORRUPT) {
goto compact;
}
2020-02-09 14:52:20 +00:00
*dir = olddir;
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
return err;
}
// commit any global diffs if we have any
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs_gstate_t delta = {0};
lfs_gstate_xor(&delta, &lfs->gstate);
lfs_gstate_xor(&delta, &lfs->gdisk);
lfs_gstate_xor(&delta, &lfs->gdelta);
delta.tag &= ~LFS_MKTAG(0, 0, 0x3ff);
if (!lfs_gstate_iszero(&delta)) {
err = lfs_dir_getgstate(lfs, dir, &delta);
if (err) {
2020-02-09 14:52:20 +00:00
*dir = olddir;
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs_gstate_tole32(&delta);
err = lfs_dir_commitattr(lfs, &commit,
LFS_MKTAG(LFS_TYPE_MOVESTATE, 0x3ff,
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
sizeof(delta)), &delta);
if (err) {
if (err == LFS_ERR_NOSPC || err == LFS_ERR_CORRUPT) {
goto compact;
}
2020-02-09 14:52:20 +00:00
*dir = olddir;
return err;
}
}
// finalize commit with the crc
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
err = lfs_dir_commitcrc(lfs, &commit);
if (err) {
if (err == LFS_ERR_NOSPC || err == LFS_ERR_CORRUPT) {
goto compact;
}
2020-02-09 14:52:20 +00:00
*dir = olddir;
return err;
}
// successful commit, update dir
Added a better solution for large prog sizes A current limitation of the lfs tag is the 10-bit (1024) length field. This field is used to indicate padding for commits and effectively limits the size of commits to 1KiB. Because commits must be prog size aligned, this is a problem on devices with prog size > 1024. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ] This can be increased to 12-bit (4096), but for NAND devices this is still to small to completely solve the issue. The previous workaround was to just create unaligned commits. This can occur naturally if littlefs is used on portable media as the prog size does not have to be consistent on different drivers. If littlefs sees an unaligned commit, it treats the dir as unerased and must compact the dir if it creates any new commits. Unfortunately this isn't great. It effectively means that every small commit forced an erase on devices with prog size > 1024. This is pretty terrible. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit |------------------- wasted ---------------------] A different solution, implemented here, is to use multiple crc tags to pad the commit until the remaining space fits in the padding. This effectively looks like multiple empty commits and has a small runtime cost to parse these tags, but otherwise does no harm. [---- 6KiB erase block ----] [-- 2KiB prog size --|-- 2KiB prog size --|-- 2KiB prog size --] [ 1KiB commit | noop | 1KiB commit | noop | 1KiB commit | noop ] It was a bit tricky to implement, but now we can effectively support unlimited prog sizes since there's no limit to the number of commits in a block. found by kazink and joicetm
2019-07-24 19:24:29 +00:00
LFS_ASSERT(commit.off % lfs->cfg->prog_size == 0);
dir->off = commit.off;
dir->etag = commit.ptag;
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
// and update gstate
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->gdisk = lfs->gstate;
lfs->gdelta = (lfs_gstate_t){0};
} else {
compact:
// fall back to compaction
lfs_cache_drop(lfs, &lfs->pcache);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
int err = lfs_dir_compact(lfs, dir, attrs, attrcount,
dir, 0, dir->count);
if (err) {
2020-02-09 14:52:20 +00:00
*dir = olddir;
return err;
}
}
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
// this complicated bit of logic is for fixing up any active
// metadata-pairs that we may have affected
//
// note we have to make two passes since the mdir passed to
// lfs_dir_commit could also be in this list, and even then
// we need to copy the pair so they don't get clobbered if we refetch
// our mdir.
for (struct lfs_mlist *d = lfs->mlist; d; d = d->next) {
if (&d->m != dir && lfs_pair_cmp(d->m.pair, olddir.pair) == 0) {
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
d->m = *dir;
for (int i = 0; i < attrcount; i++) {
if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_DELETE &&
d->id == lfs_tag_id(attrs[i].tag)) {
d->m.pair[0] = LFS_BLOCK_NULL;
d->m.pair[1] = LFS_BLOCK_NULL;
} else if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_DELETE &&
d->id > lfs_tag_id(attrs[i].tag)) {
d->id -= 1;
if (d->type == LFS_TYPE_DIR) {
((lfs_dir_t*)d)->pos -= 1;
}
} else if (lfs_tag_type3(attrs[i].tag) == LFS_TYPE_CREATE &&
d->id >= lfs_tag_id(attrs[i].tag)) {
d->id += 1;
if (d->type == LFS_TYPE_DIR) {
((lfs_dir_t*)d)->pos += 1;
}
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
}
}
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
}
}
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
for (struct lfs_mlist *d = lfs->mlist; d; d = d->next) {
if (lfs_pair_cmp(d->m.pair, olddir.pair) == 0) {
while (d->id >= d->m.count && d->m.split) {
// we split and id is on tail now
d->id -= d->m.count;
int err = lfs_dir_fetch(lfs, &d->m, d->m.tail);
if (err) {
return err;
}
}
}
}
return 0;
}
/// Top level directory operations ///
int lfs_mkdir(lfs_t *lfs, const char *path) {
LFS_TRACE("lfs_mkdir(%p, \"%s\")", (void*)lfs, path);
// deorphan if we haven't yet, needed at most once after poweron
int err = lfs_fs_forceconsistency(lfs);
if (err) {
LFS_TRACE("lfs_mkdir -> %d", err);
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
struct lfs_mlist cwd;
cwd.next = lfs->mlist;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
uint16_t id;
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_dir_find(lfs, &cwd.m, &path, &id);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (!(err == LFS_ERR_NOENT && id != 0x3ff)) {
LFS_TRACE("lfs_mkdir -> %d", (err < 0) ? err : LFS_ERR_EXIST);
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
return (err < 0) ? err : LFS_ERR_EXIST;
}
// check that name fits
lfs_size_t nlen = strlen(path);
if (nlen > lfs->name_max) {
LFS_TRACE("lfs_mkdir -> %d", LFS_ERR_NAMETOOLONG);
return LFS_ERR_NAMETOOLONG;
}
// build up new directory
lfs_alloc_ack(lfs);
lfs_mdir_t dir;
err = lfs_dir_alloc(lfs, &dir);
if (err) {
LFS_TRACE("lfs_mkdir -> %d", err);
return err;
}
// find end of list
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs_mdir_t pred = cwd.m;
while (pred.split) {
err = lfs_dir_fetch(lfs, &pred, pred.tail);
if (err) {
LFS_TRACE("lfs_mkdir -> %d", err);
return err;
}
}
// setup dir
lfs_pair_tole32(pred.tail);
err = lfs_dir_commit(lfs, &dir, LFS_MKATTRS(
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), pred.tail}));
lfs_pair_fromle32(pred.tail);
if (err) {
LFS_TRACE("lfs_mkdir -> %d", err);
return err;
}
// current block end of list?
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (cwd.m.split) {
// update tails, this creates a desync
lfs_fs_preporphans(lfs, +1);
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// it's possible our predecessor has to be relocated, and if
// our parent is our predecessor's predecessor, this could have
// caused our parent to go out of date, fortunately we can hook
// ourselves into littlefs to catch this
cwd.type = 0;
cwd.id = 0;
lfs->mlist = &cwd;
lfs_pair_tole32(dir.pair);
err = lfs_dir_commit(lfs, &pred, LFS_MKATTRS(
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), dir.pair}));
lfs_pair_fromle32(dir.pair);
if (err) {
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->mlist = cwd.next;
LFS_TRACE("lfs_mkdir -> %d", err);
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->mlist = cwd.next;
lfs_fs_preporphans(lfs, -1);
}
// now insert into our parent block
lfs_pair_tole32(dir.pair);
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_dir_commit(lfs, &cwd.m, LFS_MKATTRS(
{LFS_MKTAG(LFS_TYPE_CREATE, id, 0), NULL},
{LFS_MKTAG(LFS_TYPE_DIR, id, nlen), path},
{LFS_MKTAG(LFS_TYPE_DIRSTRUCT, id, 8), dir.pair},
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
{LFS_MKTAG_IF(!cwd.m.split,
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
LFS_TYPE_SOFTTAIL, 0x3ff, 8), dir.pair}));
lfs_pair_fromle32(dir.pair);
if (err) {
LFS_TRACE("lfs_mkdir -> %d", err);
return err;
}
LFS_TRACE("lfs_mkdir -> %d", 0);
return 0;
}
int lfs_dir_open(lfs_t *lfs, lfs_dir_t *dir, const char *path) {
LFS_TRACE("lfs_dir_open(%p, %p, \"%s\")", (void*)lfs, (void*)dir, path);
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t tag = lfs_dir_find(lfs, &dir->m, &path, NULL);
if (tag < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_dir_open -> %"PRId32, tag);
return tag;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type3(tag) != LFS_TYPE_DIR) {
LFS_TRACE("lfs_dir_open -> %d", LFS_ERR_NOTDIR);
return LFS_ERR_NOTDIR;
}
lfs_block_t pair[2];
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_id(tag) == 0x3ff) {
// handle root dir separately
pair[0] = lfs->root[0];
pair[1] = lfs->root[1];
} else {
// get dir pair from parent
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t res = lfs_dir_get(lfs, &dir->m, LFS_MKTAG(0x700, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, lfs_tag_id(tag), 8), pair);
if (res < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_dir_open -> %"PRId32, res);
return res;
}
lfs_pair_fromle32(pair);
}
// fetch first pair
int err = lfs_dir_fetch(lfs, &dir->m, pair);
if (err) {
LFS_TRACE("lfs_dir_open -> %d", err);
return err;
}
// setup entry
dir->head[0] = dir->m.pair[0];
dir->head[1] = dir->m.pair[1];
dir->id = 0;
dir->pos = 0;
// add to list of mdirs
dir->type = LFS_TYPE_DIR;
dir->next = (lfs_dir_t*)lfs->mlist;
lfs->mlist = (struct lfs_mlist*)dir;
LFS_TRACE("lfs_dir_open -> %d", 0);
return 0;
}
int lfs_dir_close(lfs_t *lfs, lfs_dir_t *dir) {
LFS_TRACE("lfs_dir_close(%p, %p)", (void*)lfs, (void*)dir);
// remove from list of mdirs
for (struct lfs_mlist **p = &lfs->mlist; *p; p = &(*p)->next) {
if (*p == (struct lfs_mlist*)dir) {
*p = (*p)->next;
break;
}
}
LFS_TRACE("lfs_dir_close -> %d", 0);
return 0;
}
int lfs_dir_read(lfs_t *lfs, lfs_dir_t *dir, struct lfs_info *info) {
LFS_TRACE("lfs_dir_read(%p, %p, %p)",
(void*)lfs, (void*)dir, (void*)info);
memset(info, 0, sizeof(*info));
// special offset for '.' and '..'
if (dir->pos == 0) {
info->type = LFS_TYPE_DIR;
strcpy(info->name, ".");
dir->pos += 1;
LFS_TRACE("lfs_dir_read -> %d", true);
return true;
} else if (dir->pos == 1) {
info->type = LFS_TYPE_DIR;
strcpy(info->name, "..");
dir->pos += 1;
LFS_TRACE("lfs_dir_read -> %d", true);
return true;
}
while (true) {
if (dir->id == dir->m.count) {
if (!dir->m.split) {
LFS_TRACE("lfs_dir_read -> %d", false);
return false;
}
int err = lfs_dir_fetch(lfs, &dir->m, dir->m.tail);
if (err) {
LFS_TRACE("lfs_dir_read -> %d", err);
return err;
}
dir->id = 0;
}
int err = lfs_dir_getinfo(lfs, &dir->m, dir->id, info);
if (err && err != LFS_ERR_NOENT) {
LFS_TRACE("lfs_dir_read -> %d", err);
return err;
}
dir->id += 1;
if (err != LFS_ERR_NOENT) {
break;
}
}
dir->pos += 1;
LFS_TRACE("lfs_dir_read -> %d", true);
return true;
}
int lfs_dir_seek(lfs_t *lfs, lfs_dir_t *dir, lfs_off_t off) {
LFS_TRACE("lfs_dir_seek(%p, %p, %"PRIu32")",
(void*)lfs, (void*)dir, off);
// simply walk from head dir
int err = lfs_dir_rewind(lfs, dir);
if (err) {
LFS_TRACE("lfs_dir_seek -> %d", err);
return err;
}
// first two for ./..
dir->pos = lfs_min(2, off);
off -= dir->pos;
// skip superblock entry
dir->id = (off > 0 && lfs_pair_cmp(dir->head, lfs->root) == 0);
while (off > 0) {
int diff = lfs_min(dir->m.count - dir->id, off);
dir->id += diff;
dir->pos += diff;
off -= diff;
if (dir->id == dir->m.count) {
if (!dir->m.split) {
LFS_TRACE("lfs_dir_seek -> %d", LFS_ERR_INVAL);
return LFS_ERR_INVAL;
}
err = lfs_dir_fetch(lfs, &dir->m, dir->m.tail);
if (err) {
LFS_TRACE("lfs_dir_seek -> %d", err);
return err;
}
dir->id = 0;
}
}
LFS_TRACE("lfs_dir_seek -> %d", 0);
return 0;
}
lfs_soff_t lfs_dir_tell(lfs_t *lfs, lfs_dir_t *dir) {
LFS_TRACE("lfs_dir_tell(%p, %p)", (void*)lfs, (void*)dir);
(void)lfs;
LFS_TRACE("lfs_dir_tell -> %"PRId32, dir->pos);
return dir->pos;
}
int lfs_dir_rewind(lfs_t *lfs, lfs_dir_t *dir) {
LFS_TRACE("lfs_dir_rewind(%p, %p)", (void*)lfs, (void*)dir);
// reload the head dir
int err = lfs_dir_fetch(lfs, &dir->m, dir->head);
if (err) {
LFS_TRACE("lfs_dir_rewind -> %d", err);
return err;
}
dir->id = 0;
dir->pos = 0;
LFS_TRACE("lfs_dir_rewind -> %d", 0);
return 0;
}
/// File index list operations ///
static int lfs_ctz_index(lfs_t *lfs, lfs_off_t *off) {
lfs_off_t size = *off;
lfs_off_t b = lfs->cfg->block_size - 2*4;
lfs_off_t i = size / b;
if (i == 0) {
return 0;
}
i = (size - 4*(lfs_popc(i-1)+2)) / b;
*off = size - b*i - 4*lfs_popc(i);
return i;
}
static int lfs_ctz_find(lfs_t *lfs,
const lfs_cache_t *pcache, lfs_cache_t *rcache,
lfs_block_t head, lfs_size_t size,
lfs_size_t pos, lfs_block_t *block, lfs_off_t *off) {
if (size == 0) {
*block = LFS_BLOCK_NULL;
*off = 0;
return 0;
}
lfs_off_t current = lfs_ctz_index(lfs, &(lfs_off_t){size-1});
lfs_off_t target = lfs_ctz_index(lfs, &pos);
while (current > target) {
lfs_size_t skip = lfs_min(
lfs_npw2(current-target+1) - 1,
lfs_ctz(current));
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
int err = lfs_bd_read(lfs,
pcache, rcache, sizeof(head),
head, 4*skip, &head, sizeof(head));
head = lfs_fromle32(head);
if (err) {
return err;
}
current -= 1 << skip;
}
*block = head;
*off = pos;
return 0;
}
static int lfs_ctz_extend(lfs_t *lfs,
lfs_cache_t *pcache, lfs_cache_t *rcache,
lfs_block_t head, lfs_size_t size,
lfs_block_t *block, lfs_off_t *off) {
while (true) {
// go ahead and grab a block
lfs_block_t nblock;
int err = lfs_alloc(lfs, &nblock);
if (err) {
return err;
}
{
err = lfs_bd_erase(lfs, nblock);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
if (size == 0) {
*block = nblock;
*off = 0;
return 0;
}
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
lfs_size_t noff = size - 1;
lfs_off_t index = lfs_ctz_index(lfs, &noff);
noff = noff + 1;
// just copy out the last block if it is incomplete
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
if (noff != lfs->cfg->block_size) {
for (lfs_off_t i = 0; i < noff; i++) {
uint8_t data;
err = lfs_bd_read(lfs,
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
NULL, rcache, noff-i,
head, i, &data, 1);
if (err) {
return err;
}
err = lfs_bd_prog(lfs,
pcache, rcache, true,
nblock, i, &data, 1);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
}
*block = nblock;
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
*off = noff;
return 0;
}
// append block
index += 1;
lfs_size_t skips = lfs_ctz(index) + 1;
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
lfs_block_t nhead = head;
for (lfs_off_t i = 0; i < skips; i++) {
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
nhead = lfs_tole32(nhead);
err = lfs_bd_prog(lfs, pcache, rcache, true,
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
nblock, 4*i, &nhead, 4);
nhead = lfs_fromle32(nhead);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
if (i != skips-1) {
err = lfs_bd_read(lfs,
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
NULL, rcache, sizeof(nhead),
nhead, 4*i, &nhead, sizeof(nhead));
nhead = lfs_fromle32(nhead);
if (err) {
return err;
}
}
}
*block = nblock;
*off = 4*skips;
return 0;
}
relocate:
LFS_DEBUG("Bad block at 0x%"PRIx32, nblock);
// just clear cache and try a new block
lfs_cache_drop(lfs, pcache);
}
}
static int lfs_ctz_traverse(lfs_t *lfs,
const lfs_cache_t *pcache, lfs_cache_t *rcache,
lfs_block_t head, lfs_size_t size,
int (*cb)(void*, lfs_block_t), void *data) {
if (size == 0) {
return 0;
}
lfs_off_t index = lfs_ctz_index(lfs, &(lfs_off_t){size-1});
while (true) {
int err = cb(data, head);
if (err) {
return err;
}
if (index == 0) {
return 0;
}
lfs_block_t heads[2];
int count = 2 - (index & 1);
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
err = lfs_bd_read(lfs,
pcache, rcache, count*sizeof(head),
head, 0, &heads, count*sizeof(head));
heads[0] = lfs_fromle32(heads[0]);
heads[1] = lfs_fromle32(heads[1]);
if (err) {
return err;
}
for (int i = 0; i < count-1; i++) {
err = cb(data, heads[i]);
if (err) {
return err;
}
}
head = heads[count-1];
index -= count;
}
}
/// Top level file operations ///
int lfs_file_opencfg(lfs_t *lfs, lfs_file_t *file,
const char *path, int flags,
const struct lfs_file_config *cfg) {
LFS_TRACE("lfs_file_opencfg(%p, %p, \"%s\", %x, %p {"
".buffer=%p, .attrs=%p, .attr_count=%"PRIu32"})",
(void*)lfs, (void*)file, path, flags,
(void*)cfg, cfg->buffer, (void*)cfg->attrs, cfg->attr_count);
// deorphan if we haven't yet, needed at most once after poweron
if ((flags & 3) != LFS_O_RDONLY) {
int err = lfs_fs_forceconsistency(lfs);
if (err) {
LFS_TRACE("lfs_file_opencfg -> %d", err);
return err;
}
}
// setup simple file details
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
int err;
file->cfg = cfg;
file->flags = flags | LFS_F_OPENED;
file->pos = 0;
file->off = 0;
file->cache.buffer = NULL;
// allocate entry for file if it doesn't exist
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t tag = lfs_dir_find(lfs, &file->m, &path, &file->id);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (tag < 0 && !(tag == LFS_ERR_NOENT && file->id != 0x3ff)) {
err = tag;
goto cleanup;
}
// get id, add to list of mdirs to catch update changes
file->type = LFS_TYPE_REG;
file->next = (lfs_file_t*)lfs->mlist;
lfs->mlist = (struct lfs_mlist*)file;
if (tag == LFS_ERR_NOENT) {
if (!(flags & LFS_O_CREAT)) {
err = LFS_ERR_NOENT;
goto cleanup;
}
// check that name fits
lfs_size_t nlen = strlen(path);
if (nlen > lfs->name_max) {
err = LFS_ERR_NAMETOOLONG;
goto cleanup;
}
// get next slot and create entry to remember name
err = lfs_dir_commit(lfs, &file->m, LFS_MKATTRS(
2020-04-03 01:05:55 +00:00
{LFS_MKTAG(LFS_TYPE_CREATE, file->id, 0), NULL},
{LFS_MKTAG(LFS_TYPE_REG, file->id, nlen), path},
2020-04-03 01:05:55 +00:00
{LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0), NULL}));
if (err) {
err = LFS_ERR_NAMETOOLONG;
goto cleanup;
}
tag = LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, 0);
} else if (flags & LFS_O_EXCL) {
err = LFS_ERR_EXIST;
goto cleanup;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
} else if (lfs_tag_type3(tag) != LFS_TYPE_REG) {
err = LFS_ERR_ISDIR;
goto cleanup;
} else if (flags & LFS_O_TRUNC) {
// truncate if requested
tag = LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0);
file->flags |= LFS_F_DIRTY;
} else {
// try to load what's on disk, if it's inlined we'll fix it later
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
tag = lfs_dir_get(lfs, &file->m, LFS_MKTAG(0x700, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, file->id, 8), &file->ctz);
if (tag < 0) {
err = tag;
goto cleanup;
}
lfs_ctz_fromle32(&file->ctz);
}
// fetch attrs
for (unsigned i = 0; i < file->cfg->attr_count; i++) {
if ((file->flags & 3) != LFS_O_WRONLY) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t res = lfs_dir_get(lfs, &file->m,
LFS_MKTAG(0x7ff, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_USERATTR + file->cfg->attrs[i].type,
file->id, file->cfg->attrs[i].size),
file->cfg->attrs[i].buffer);
if (res < 0 && res != LFS_ERR_NOENT) {
err = res;
goto cleanup;
}
}
if ((file->flags & 3) != LFS_O_RDONLY) {
if (file->cfg->attrs[i].size > lfs->attr_max) {
err = LFS_ERR_NOSPC;
goto cleanup;
}
file->flags |= LFS_F_DIRTY;
}
}
// allocate buffer if needed
if (file->cfg->buffer) {
file->cache.buffer = file->cfg->buffer;
} else {
file->cache.buffer = lfs_malloc(lfs->cfg->cache_size);
if (!file->cache.buffer) {
err = LFS_ERR_NOMEM;
goto cleanup;
}
}
// zero to avoid information leak
lfs_cache_zero(lfs, &file->cache);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type3(tag) == LFS_TYPE_INLINESTRUCT) {
// load inline files
file->ctz.head = LFS_BLOCK_INLINE;
file->ctz.size = lfs_tag_size(tag);
file->flags |= LFS_F_INLINE;
file->cache.block = file->ctz.head;
file->cache.off = 0;
file->cache.size = lfs->cfg->cache_size;
// don't always read (may be new/trunc file)
if (file->ctz.size > 0) {
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t res = lfs_dir_get(lfs, &file->m,
LFS_MKTAG(0x700, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, file->id,
lfs_min(file->cache.size, 0x3fe)),
file->cache.buffer);
if (res < 0) {
err = res;
goto cleanup;
}
}
}
LFS_TRACE("lfs_file_opencfg -> %d", 0);
return 0;
cleanup:
// clean up lingering resources
file->flags |= LFS_F_ERRED;
lfs_file_close(lfs, file);
LFS_TRACE("lfs_file_opencfg -> %d", err);
return err;
}
int lfs_file_open(lfs_t *lfs, lfs_file_t *file,
const char *path, int flags) {
LFS_TRACE("lfs_file_open(%p, %p, \"%s\", %x)",
(void*)lfs, (void*)file, path, flags);
static const struct lfs_file_config defaults = {0};
int err = lfs_file_opencfg(lfs, file, path, flags, &defaults);
LFS_TRACE("lfs_file_open -> %d", err);
return err;
}
int lfs_file_close(lfs_t *lfs, lfs_file_t *file) {
LFS_TRACE("lfs_file_close(%p, %p)", (void*)lfs, (void*)file);
LFS_ASSERT(file->flags & LFS_F_OPENED);
int err = lfs_file_sync(lfs, file);
// remove from list of mdirs
for (struct lfs_mlist **p = &lfs->mlist; *p; p = &(*p)->next) {
if (*p == (struct lfs_mlist*)file) {
*p = (*p)->next;
break;
}
}
// clean up memory
if (!file->cfg->buffer) {
lfs_free(file->cache.buffer);
}
file->flags &= ~LFS_F_OPENED;
LFS_TRACE("lfs_file_close -> %d", err);
return err;
}
static int lfs_file_relocate(lfs_t *lfs, lfs_file_t *file) {
LFS_ASSERT(file->flags & LFS_F_OPENED);
while (true) {
// just relocate what exists into new block
lfs_block_t nblock;
int err = lfs_alloc(lfs, &nblock);
if (err) {
return err;
}
err = lfs_bd_erase(lfs, nblock);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
// either read from dirty cache or disk
for (lfs_off_t i = 0; i < file->off; i++) {
uint8_t data;
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
if (file->flags & LFS_F_INLINE) {
err = lfs_dir_getread(lfs, &file->m,
// note we evict inline files before they can be dirty
NULL, &file->cache, file->off-i,
LFS_MKTAG(0xfff, 0x1ff, 0),
LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0),
i, &data, 1);
if (err) {
return err;
}
} else {
err = lfs_bd_read(lfs,
&file->cache, &lfs->rcache, file->off-i,
file->block, i, &data, 1);
if (err) {
return err;
}
}
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
err = lfs_bd_prog(lfs,
&lfs->pcache, &lfs->rcache, true,
nblock, i, &data, 1);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
}
// copy over new state of file
memcpy(file->cache.buffer, lfs->pcache.buffer, lfs->cfg->cache_size);
file->cache.block = lfs->pcache.block;
file->cache.off = lfs->pcache.off;
file->cache.size = lfs->pcache.size;
lfs_cache_zero(lfs, &lfs->pcache);
file->block = nblock;
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
file->flags |= LFS_F_WRITING;
return 0;
relocate:
LFS_DEBUG("Bad block at 0x%"PRIx32, nblock);
// just clear cache and try a new block
lfs_cache_drop(lfs, &lfs->pcache);
}
}
static int lfs_file_outline(lfs_t *lfs, lfs_file_t *file) {
file->off = file->pos;
lfs_alloc_ack(lfs);
int err = lfs_file_relocate(lfs, file);
if (err) {
return err;
}
file->flags &= ~LFS_F_INLINE;
return 0;
}
static int lfs_file_flush(lfs_t *lfs, lfs_file_t *file) {
LFS_ASSERT(file->flags & LFS_F_OPENED);
if (file->flags & LFS_F_READING) {
if (!(file->flags & LFS_F_INLINE)) {
lfs_cache_drop(lfs, &file->cache);
}
file->flags &= ~LFS_F_READING;
}
if (file->flags & LFS_F_WRITING) {
lfs_off_t pos = file->pos;
if (!(file->flags & LFS_F_INLINE)) {
// copy over anything after current branch
lfs_file_t orig = {
.ctz.head = file->ctz.head,
.ctz.size = file->ctz.size,
.flags = LFS_O_RDONLY | LFS_F_OPENED,
.pos = file->pos,
.cache = lfs->rcache,
};
lfs_cache_drop(lfs, &lfs->rcache);
while (file->pos < file->ctz.size) {
// copy over a byte at a time, leave it up to caching
// to make this efficient
uint8_t data;
lfs_ssize_t res = lfs_file_read(lfs, &orig, &data, 1);
if (res < 0) {
return res;
}
res = lfs_file_write(lfs, file, &data, 1);
if (res < 0) {
return res;
}
// keep our reference to the rcache in sync
if (lfs->rcache.block != LFS_BLOCK_NULL) {
lfs_cache_drop(lfs, &orig.cache);
lfs_cache_drop(lfs, &lfs->rcache);
}
}
// write out what we have
while (true) {
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
int err = lfs_bd_flush(lfs, &file->cache, &lfs->rcache, true);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
return err;
}
break;
relocate:
LFS_DEBUG("Bad block at 0x%"PRIx32, file->block);
err = lfs_file_relocate(lfs, file);
if (err) {
return err;
}
}
} else {
file->pos = lfs_max(file->pos, file->ctz.size);
}
// actual file updates
file->ctz.head = file->block;
file->ctz.size = file->pos;
file->flags &= ~LFS_F_WRITING;
file->flags |= LFS_F_DIRTY;
file->pos = pos;
}
return 0;
}
int lfs_file_sync(lfs_t *lfs, lfs_file_t *file) {
LFS_TRACE("lfs_file_sync(%p, %p)", (void*)lfs, (void*)file);
LFS_ASSERT(file->flags & LFS_F_OPENED);
if (file->flags & LFS_F_ERRED) {
// it's not safe to do anything if our file errored
LFS_TRACE("lfs_file_sync -> %d", 0);
return 0;
}
int err = lfs_file_flush(lfs, file);
if (err) {
file->flags |= LFS_F_ERRED;
LFS_TRACE("lfs_file_sync -> %d", err);
return err;
}
if ((file->flags & LFS_F_DIRTY) &&
!lfs_pair_isnull(file->m.pair)) {
// update dir entry
uint16_t type;
const void *buffer;
lfs_size_t size;
struct lfs_ctz ctz;
if (file->flags & LFS_F_INLINE) {
// inline the whole file
type = LFS_TYPE_INLINESTRUCT;
buffer = file->cache.buffer;
size = file->ctz.size;
} else {
// update the ctz reference
type = LFS_TYPE_CTZSTRUCT;
// copy ctz so alloc will work during a relocate
ctz = file->ctz;
lfs_ctz_tole32(&ctz);
buffer = &ctz;
size = sizeof(ctz);
}
// commit file data and attributes
err = lfs_dir_commit(lfs, &file->m, LFS_MKATTRS(
{LFS_MKTAG(type, file->id, size), buffer},
{LFS_MKTAG(LFS_FROM_USERATTRS, file->id,
file->cfg->attr_count), file->cfg->attrs}));
if (err) {
file->flags |= LFS_F_ERRED;
LFS_TRACE("lfs_file_sync -> %d", err);
return err;
}
file->flags &= ~LFS_F_DIRTY;
}
LFS_TRACE("lfs_file_sync -> %d", 0);
return 0;
}
lfs_ssize_t lfs_file_read(lfs_t *lfs, lfs_file_t *file,
void *buffer, lfs_size_t size) {
LFS_TRACE("lfs_file_read(%p, %p, %p, %"PRIu32")",
(void*)lfs, (void*)file, buffer, size);
LFS_ASSERT(file->flags & LFS_F_OPENED);
LFS_ASSERT((file->flags & 3) != LFS_O_WRONLY);
uint8_t *data = buffer;
lfs_size_t nsize = size;
if (file->flags & LFS_F_WRITING) {
// flush out any writes
int err = lfs_file_flush(lfs, file);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_read -> %d", err);
return err;
}
}
if (file->pos >= file->ctz.size) {
// eof if past end
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_read -> %d", 0);
return 0;
}
size = lfs_min(size, file->ctz.size - file->pos);
nsize = size;
while (nsize > 0) {
// check if we need a new block
if (!(file->flags & LFS_F_READING) ||
file->off == lfs->cfg->block_size) {
if (!(file->flags & LFS_F_INLINE)) {
int err = lfs_ctz_find(lfs, NULL, &file->cache,
file->ctz.head, file->ctz.size,
file->pos, &file->block, &file->off);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_read -> %d", err);
return err;
}
} else {
file->block = LFS_BLOCK_INLINE;
file->off = file->pos;
}
file->flags |= LFS_F_READING;
}
// read as much as we can in current block
lfs_size_t diff = lfs_min(nsize, lfs->cfg->block_size - file->off);
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
if (file->flags & LFS_F_INLINE) {
int err = lfs_dir_getread(lfs, &file->m,
NULL, &file->cache, lfs->cfg->block_size,
LFS_MKTAG(0xfff, 0x1ff, 0),
LFS_MKTAG(LFS_TYPE_INLINESTRUCT, file->id, 0),
file->off, data, diff);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_read -> %d", err);
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
return err;
}
} else {
int err = lfs_bd_read(lfs,
NULL, &file->cache, lfs->cfg->block_size,
file->block, file->off, data, diff);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_read -> %d", err);
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
return err;
}
}
file->pos += diff;
file->off += diff;
data += diff;
nsize -= diff;
}
LFS_TRACE("lfs_file_read -> %"PRId32, size);
return size;
}
lfs_ssize_t lfs_file_write(lfs_t *lfs, lfs_file_t *file,
const void *buffer, lfs_size_t size) {
LFS_TRACE("lfs_file_write(%p, %p, %p, %"PRIu32")",
(void*)lfs, (void*)file, buffer, size);
LFS_ASSERT(file->flags & LFS_F_OPENED);
LFS_ASSERT((file->flags & 3) != LFS_O_RDONLY);
const uint8_t *data = buffer;
lfs_size_t nsize = size;
if (file->flags & LFS_F_READING) {
// drop any reads
int err = lfs_file_flush(lfs, file);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_write -> %d", err);
return err;
}
}
if ((file->flags & LFS_O_APPEND) && file->pos < file->ctz.size) {
file->pos = file->ctz.size;
}
if (file->pos + size > lfs->file_max) {
// Larger than file limit?
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_write -> %d", LFS_ERR_FBIG);
return LFS_ERR_FBIG;
}
if (!(file->flags & LFS_F_WRITING) && file->pos > file->ctz.size) {
// fill with zeros
lfs_off_t pos = file->pos;
file->pos = file->ctz.size;
while (file->pos < pos) {
lfs_ssize_t res = lfs_file_write(lfs, file, &(uint8_t){0}, 1);
if (res < 0) {
LFS_TRACE("lfs_file_write -> %"PRId32, res);
return res;
}
}
}
Added disk-backed limits on the name/attrs/inline sizes Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existing in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This changes makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
2018-04-01 20:36:29 +00:00
if ((file->flags & LFS_F_INLINE) &&
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
lfs_max(file->pos+nsize, file->ctz.size) >
lfs_min(0x3fe, lfs_min(
lfs->cfg->cache_size, lfs->cfg->block_size/8))) {
// inline file doesn't fit anymore
int err = lfs_file_outline(lfs, file);
if (err) {
file->flags |= LFS_F_ERRED;
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_write -> %d", err);
return err;
}
}
while (nsize > 0) {
// check if we need a new block
if (!(file->flags & LFS_F_WRITING) ||
file->off == lfs->cfg->block_size) {
if (!(file->flags & LFS_F_INLINE)) {
if (!(file->flags & LFS_F_WRITING) && file->pos > 0) {
// find out which block we're extending from
int err = lfs_ctz_find(lfs, NULL, &file->cache,
file->ctz.head, file->ctz.size,
file->pos-1, &file->block, &file->off);
if (err) {
file->flags |= LFS_F_ERRED;
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_write -> %d", err);
return err;
}
// mark cache as dirty since we may have read data into it
lfs_cache_zero(lfs, &file->cache);
}
// extend file with new blocks
lfs_alloc_ack(lfs);
int err = lfs_ctz_extend(lfs, &file->cache, &lfs->rcache,
file->block, file->pos,
&file->block, &file->off);
if (err) {
file->flags |= LFS_F_ERRED;
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_write -> %d", err);
return err;
}
} else {
file->block = LFS_BLOCK_INLINE;
file->off = file->pos;
}
file->flags |= LFS_F_WRITING;
}
// program as much as we can in current block
lfs_size_t diff = lfs_min(nsize, lfs->cfg->block_size - file->off);
while (true) {
Revisited caching rules to optimize bus transactions The littlefs driver has always had this really weird quirk: larger cache sizes can significantly harm performance. This has probably been one of the most surprising pieces of configuraing and optimizing littlefs. The reason is that littlefs's caches are kinda dumb (this is somewhat intentional, as dumb caches take up much less code space than smart caches). When littlefs needs to read data, it will load the entire cache line. This means that even when we only need a small 4 byte piece of data, we may need to read a full 512 byte cache. And since microcontrollers may be reading from storage over relatively slow bus protocols, the time to send data over the bus may dominate other operations. Now that we have separate configuration options for "cache_size" and "read_size", we can start making littlefs's caches a bit smarter. They aren't going to be perfect, because code size is still a priority, but there are some small improvements we can do: 1. Program caches write to prog_size aligned units, but eagerly cache as much as possible. There's no downside to using the full cache in program operations. 2. Add a hint parameter to cached reads. This internal API allows callers to tell the cache how much data they expect to need. This avoids excess bus traffic, and now we can even bypass the cache if the caller provides enough of a buffer. We can still fall back to reading full cache-lines in the cases where we don't know how much data we need by providing the block size as the hint. We do this for directory fetches and for file reads. This has immediate improvements for both metadata-log traversal and CTZ skip-list traversal, since these both only need to read 4-byte pointers and can always bypass the cache, allowing reuse elsewhere.
2018-08-20 19:47:52 +00:00
int err = lfs_bd_prog(lfs, &file->cache, &lfs->rcache, true,
file->block, file->off, data, diff);
if (err) {
if (err == LFS_ERR_CORRUPT) {
goto relocate;
}
file->flags |= LFS_F_ERRED;
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_write -> %d", err);
return err;
}
break;
relocate:
err = lfs_file_relocate(lfs, file);
if (err) {
file->flags |= LFS_F_ERRED;
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_write -> %d", err);
return err;
}
}
file->pos += diff;
file->off += diff;
data += diff;
nsize -= diff;
lfs_alloc_ack(lfs);
}
file->flags &= ~LFS_F_ERRED;
LFS_TRACE("lfs_file_write -> %"PRId32, size);
return size;
}
lfs_soff_t lfs_file_seek(lfs_t *lfs, lfs_file_t *file,
lfs_soff_t off, int whence) {
LFS_TRACE("lfs_file_seek(%p, %p, %"PRId32", %d)",
(void*)lfs, (void*)file, off, whence);
LFS_ASSERT(file->flags & LFS_F_OPENED);
// write out everything beforehand, may be noop if rdonly
int err = lfs_file_flush(lfs, file);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_seek -> %d", err);
return err;
}
// find new pos
lfs_off_t npos = file->pos;
if (whence == LFS_SEEK_SET) {
npos = off;
} else if (whence == LFS_SEEK_CUR) {
npos = file->pos + off;
} else if (whence == LFS_SEEK_END) {
npos = file->ctz.size + off;
}
if (npos > lfs->file_max) {
// file position out of range
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_seek -> %d", LFS_ERR_INVAL);
return LFS_ERR_INVAL;
}
// update pos
file->pos = npos;
LFS_TRACE("lfs_file_seek -> %"PRId32, npos);
return npos;
}
int lfs_file_truncate(lfs_t *lfs, lfs_file_t *file, lfs_off_t size) {
LFS_TRACE("lfs_file_truncate(%p, %p, %"PRIu32")",
(void*)lfs, (void*)file, size);
LFS_ASSERT(file->flags & LFS_F_OPENED);
LFS_ASSERT((file->flags & 3) != LFS_O_RDONLY);
if (size > LFS_FILE_MAX) {
LFS_TRACE("lfs_file_truncate -> %d", LFS_ERR_INVAL);
return LFS_ERR_INVAL;
}
lfs_off_t pos = file->pos;
lfs_off_t oldsize = lfs_file_size(lfs, file);
if (size < oldsize) {
// need to flush since directly changing metadata
int err = lfs_file_flush(lfs, file);
if (err) {
LFS_TRACE("lfs_file_truncate -> %d", err);
return err;
}
// lookup new head in ctz skip list
err = lfs_ctz_find(lfs, NULL, &file->cache,
file->ctz.head, file->ctz.size,
size, &file->block, &file->off);
if (err) {
LFS_TRACE("lfs_file_truncate -> %d", err);
return err;
}
file->ctz.head = file->block;
file->ctz.size = size;
file->flags |= LFS_F_DIRTY | LFS_F_READING;
} else if (size > oldsize) {
// flush+seek if not already at end
if (file->pos != oldsize) {
lfs_soff_t res = lfs_file_seek(lfs, file, 0, LFS_SEEK_END);
if (res < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_truncate -> %"PRId32, res);
return (int)res;
}
}
// fill with zeros
while (file->pos < size) {
lfs_ssize_t res = lfs_file_write(lfs, file, &(uint8_t){0}, 1);
if (res < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_truncate -> %"PRId32, res);
return (int)res;
}
}
}
// restore pos
lfs_soff_t res = lfs_file_seek(lfs, file, pos, LFS_SEEK_SET);
if (res < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_truncate -> %"PRId32, res);
return (int)res;
}
LFS_TRACE("lfs_file_truncate -> %d", 0);
return 0;
}
lfs_soff_t lfs_file_tell(lfs_t *lfs, lfs_file_t *file) {
LFS_TRACE("lfs_file_tell(%p, %p)", (void*)lfs, (void*)file);
LFS_ASSERT(file->flags & LFS_F_OPENED);
(void)lfs;
LFS_TRACE("lfs_file_tell -> %"PRId32, file->pos);
return file->pos;
}
int lfs_file_rewind(lfs_t *lfs, lfs_file_t *file) {
LFS_TRACE("lfs_file_rewind(%p, %p)", (void*)lfs, (void*)file);
lfs_soff_t res = lfs_file_seek(lfs, file, 0, LFS_SEEK_SET);
if (res < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_file_rewind -> %"PRId32, res);
return (int)res;
}
LFS_TRACE("lfs_file_rewind -> %d", 0);
return 0;
}
lfs_soff_t lfs_file_size(lfs_t *lfs, lfs_file_t *file) {
LFS_TRACE("lfs_file_size(%p, %p)", (void*)lfs, (void*)file);
LFS_ASSERT(file->flags & LFS_F_OPENED);
(void)lfs;
if (file->flags & LFS_F_WRITING) {
LFS_TRACE("lfs_file_size -> %"PRId32,
lfs_max(file->pos, file->ctz.size));
return lfs_max(file->pos, file->ctz.size);
} else {
LFS_TRACE("lfs_file_size -> %"PRId32, file->ctz.size);
return file->ctz.size;
}
}
2018-01-30 19:07:37 +00:00
/// General fs operations ///
int lfs_stat(lfs_t *lfs, const char *path, struct lfs_info *info) {
LFS_TRACE("lfs_stat(%p, \"%s\", %p)", (void*)lfs, path, (void*)info);
lfs_mdir_t cwd;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
if (tag < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_stat -> %"PRId32, tag);
return (int)tag;
}
int err = lfs_dir_getinfo(lfs, &cwd, lfs_tag_id(tag), info);
LFS_TRACE("lfs_stat -> %d", err);
return err;
}
int lfs_remove(lfs_t *lfs, const char *path) {
LFS_TRACE("lfs_remove(%p, \"%s\")", (void*)lfs, path);
// deorphan if we haven't yet, needed at most once after poweron
int err = lfs_fs_forceconsistency(lfs);
if (err) {
LFS_TRACE("lfs_remove -> %d", err);
return err;
}
lfs_mdir_t cwd;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
if (tag < 0 || lfs_tag_id(tag) == 0x3ff) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_remove -> %"PRId32, (tag < 0) ? tag : LFS_ERR_INVAL);
return (tag < 0) ? (int)tag : LFS_ERR_INVAL;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
struct lfs_mlist dir;
dir.next = lfs->mlist;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type3(tag) == LFS_TYPE_DIR) {
// must be empty before removal
lfs_block_t pair[2];
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t res = lfs_dir_get(lfs, &cwd, LFS_MKTAG(0x700, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, lfs_tag_id(tag), 8), pair);
if (res < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_remove -> %"PRId32, res);
return (int)res;
}
lfs_pair_fromle32(pair);
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_dir_fetch(lfs, &dir.m, pair);
if (err) {
LFS_TRACE("lfs_remove -> %d", err);
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (dir.m.count > 0 || dir.m.split) {
LFS_TRACE("lfs_remove -> %d", LFS_ERR_NOTEMPTY);
return LFS_ERR_NOTEMPTY;
}
// mark fs as orphaned
lfs_fs_preporphans(lfs, +1);
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// I know it's crazy but yes, dir can be changed by our parent's
// commit (if predecessor is child)
dir.type = 0;
dir.id = 0;
lfs->mlist = &dir;
}
// delete the entry
err = lfs_dir_commit(lfs, &cwd, LFS_MKATTRS(
2020-04-03 01:05:55 +00:00
{LFS_MKTAG(LFS_TYPE_DELETE, lfs_tag_id(tag), 0), NULL}));
if (err) {
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->mlist = dir.next;
LFS_TRACE("lfs_remove -> %d", err);
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->mlist = dir.next;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type3(tag) == LFS_TYPE_DIR) {
// fix orphan
lfs_fs_preporphans(lfs, -1);
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_fs_pred(lfs, dir.m.pair, &cwd);
if (err) {
LFS_TRACE("lfs_remove -> %d", err);
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_dir_drop(lfs, &cwd, &dir.m);
if (err) {
LFS_TRACE("lfs_remove -> %d", err);
return err;
}
}
LFS_TRACE("lfs_remove -> %d", 0);
return 0;
}
int lfs_rename(lfs_t *lfs, const char *oldpath, const char *newpath) {
LFS_TRACE("lfs_rename(%p, \"%s\", \"%s\")", (void*)lfs, oldpath, newpath);
// deorphan if we haven't yet, needed at most once after poweron
int err = lfs_fs_forceconsistency(lfs);
if (err) {
LFS_TRACE("lfs_rename -> %d", err);
return err;
}
// find old entry
lfs_mdir_t oldcwd;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t oldtag = lfs_dir_find(lfs, &oldcwd, &oldpath, NULL);
if (oldtag < 0 || lfs_tag_id(oldtag) == 0x3ff) {
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
LFS_TRACE("lfs_rename -> %"PRId32,
(oldtag < 0) ? oldtag : LFS_ERR_INVAL);
return (oldtag < 0) ? (int)oldtag : LFS_ERR_INVAL;
}
// find new entry
lfs_mdir_t newcwd;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
uint16_t newid;
lfs_stag_t prevtag = lfs_dir_find(lfs, &newcwd, &newpath, &newid);
if ((prevtag < 0 || lfs_tag_id(prevtag) == 0x3ff) &&
!(prevtag == LFS_ERR_NOENT && newid != 0x3ff)) {
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
LFS_TRACE("lfs_rename -> %"PRId32,
(prevtag < 0) ? prevtag : LFS_ERR_INVAL);
return (prevtag < 0) ? (int)prevtag : LFS_ERR_INVAL;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// if we're in the same pair there's a few special cases...
bool samepair = (lfs_pair_cmp(oldcwd.pair, newcwd.pair) == 0);
uint16_t newoldid = lfs_tag_id(oldtag);
struct lfs_mlist prevdir;
prevdir.next = lfs->mlist;
if (prevtag == LFS_ERR_NOENT) {
// check that name fits
lfs_size_t nlen = strlen(newpath);
if (nlen > lfs->name_max) {
LFS_TRACE("lfs_rename -> %d", LFS_ERR_NAMETOOLONG);
return LFS_ERR_NAMETOOLONG;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// there is a small chance we are being renamed in the same
// directory/ to an id less than our old id, the global update
// to handle this is a bit messy
if (samepair && newid <= newoldid) {
newoldid += 1;
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
} else if (lfs_tag_type3(prevtag) != lfs_tag_type3(oldtag)) {
LFS_TRACE("lfs_rename -> %d", LFS_ERR_ISDIR);
return LFS_ERR_ISDIR;
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
} else if (samepair && newid == newoldid) {
// we're renaming to ourselves??
LFS_TRACE("lfs_rename -> %d", 0);
return 0;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
} else if (lfs_tag_type3(prevtag) == LFS_TYPE_DIR) {
// must be empty before removal
lfs_block_t prevpair[2];
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t res = lfs_dir_get(lfs, &newcwd, LFS_MKTAG(0x700, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, newid, 8), prevpair);
if (res < 0) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_rename -> %"PRId32, res);
return (int)res;
}
lfs_pair_fromle32(prevpair);
// must be empty before removal
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_dir_fetch(lfs, &prevdir.m, prevpair);
if (err) {
LFS_TRACE("lfs_rename -> %d", err);
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (prevdir.m.count > 0 || prevdir.m.split) {
LFS_TRACE("lfs_rename -> %d", LFS_ERR_NOTEMPTY);
return LFS_ERR_NOTEMPTY;
}
// mark fs as orphaned
lfs_fs_preporphans(lfs, +1);
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// I know it's crazy but yes, dir can be changed by our parent's
// commit (if predecessor is child)
prevdir.type = 0;
prevdir.id = 0;
lfs->mlist = &prevdir;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (!samepair) {
lfs_fs_prepmove(lfs, newoldid, oldcwd.pair);
Switched to traversal-based compact logic This simplifies some of the interactions between reading and writing inside the commit logic. Unfortunately this change didn't decrease code size as was initially hoped, but it does offer a nice runtime improvement for the common case and should improve debugability. Before, the compact logic required three iterations: 1. iterate through all the ids in a directory 2. scan attrs bound to each id in the directory 3. lookup attrs in the in-progress commit The code for this, while terse and complicated, did have some nice side effect. The directory lookup logic could be reused for looking up in the in-progress commit, and iterating through each id allows us to know exactly how many ids we can fit during a compact. Giving us a O(n^3) compact and O(n^3) split. However, this was complicated by a few things. First, this compact logic doesn't handle deleted attrs. To work around this, I added a marker for the last commit (or first based on your perspective) which would indicate if a delete should be copied over. This worked but was a bit hacky and meant deletes weren't cleaned up on the first compact. Second, we can't actually figure out our compacted size until we compact. This worked ok except for the fact that splits will always have a failed compact. This means we waste an erase which could very expensive. It is possible to work around this by keeping our work, but with only a single prog cache this was very tricky and also somewhat hacky. Third, the interactions between reading and writing to the same block were tricky and error-prone. They should mostly be working now, but seeing this requirement go away does not make me sad. The new compact logic fixes these issues by moving the complexity into a general-purpose lfs_dir_traverse function which has much fewer side effects on the system. We can even use it for dry-runs to precompute our estimated size. How does it work? 1. iterate through all attr in the directory 2. for each attr, scan the rest of the directory to figure out the attr's history, this will change the attr based on dir modifications and may even exit early if the attr was deleted. The end result is a traversal function that gives us the resulting state of each attr in only O(n^2). To make this complete, we allow a bounded recursion into mcu-side move attrs, although this ends up being O(n^3) unlike moves in the original solution (however moves are less common. This gives us a nice traversal function we can use for compacts and moves, handles deletes, and is overall simpler to reason about. Two minor hiccups: 1. We need to handle create attrs specially, since this algorithm doesn't care or id order, which can cause problems since attr insertion are order sensitive. We can fix this by simply looking up each create (since there is only one per file) in order at the beginning of our traversal. This is oddly complimentary to the move logic, which also handles create attrs separately. 2. We no longer know exactly how many ids we can write to a dir during splits. However, since we can do a dry-run traversal, we can use that to simply binary search for the mid-point. This gives us a O(n^2) compact and O(n^2 log n) split, which is a nice minor improvement (remember n is bounded by block size).
2018-12-27 02:27:34 +00:00
}
// move over all attributes
err = lfs_dir_commit(lfs, &newcwd, LFS_MKATTRS(
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
{LFS_MKTAG_IF(prevtag != LFS_ERR_NOENT,
2020-04-03 01:05:55 +00:00
LFS_TYPE_DELETE, newid, 0), NULL},
{LFS_MKTAG(LFS_TYPE_CREATE, newid, 0), NULL},
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
{LFS_MKTAG(lfs_tag_type3(oldtag), newid, strlen(newpath)), newpath},
{LFS_MKTAG(LFS_FROM_MOVE, newid, lfs_tag_id(oldtag)), &oldcwd},
{LFS_MKTAG_IF(samepair,
2020-04-03 01:05:55 +00:00
LFS_TYPE_DELETE, newoldid, 0), NULL}));
if (err) {
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->mlist = prevdir.next;
LFS_TRACE("lfs_rename -> %d", err);
return err;
}
// let commit clean up after move (if we're different! otherwise move
// logic already fixed it for us)
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
if (!samepair && lfs_gstate_hasmove(&lfs->gstate)) {
// prep gstate and delete move id
lfs_fs_prepmove(lfs, 0x3ff, NULL);
err = lfs_dir_commit(lfs, &oldcwd, LFS_MKATTRS(
2020-04-03 01:05:55 +00:00
{LFS_MKTAG(LFS_TYPE_DELETE, lfs_tag_id(oldtag), 0), NULL}));
if (err) {
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->mlist = prevdir.next;
LFS_TRACE("lfs_rename -> %d", err);
return err;
}
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->mlist = prevdir.next;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (prevtag != LFS_ERR_NOENT && lfs_tag_type3(prevtag) == LFS_TYPE_DIR) {
// fix orphan
lfs_fs_preporphans(lfs, -1);
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_fs_pred(lfs, prevdir.m.pair, &newcwd);
if (err) {
LFS_TRACE("lfs_rename -> %d", err);
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_dir_drop(lfs, &newcwd, &prevdir.m);
if (err) {
LFS_TRACE("lfs_rename -> %d", err);
return err;
}
}
LFS_TRACE("lfs_rename -> %d", 0);
return 0;
}
lfs_ssize_t lfs_getattr(lfs_t *lfs, const char *path,
uint8_t type, void *buffer, lfs_size_t size) {
LFS_TRACE("lfs_getattr(%p, \"%s\", %"PRIu8", %p, %"PRIu32")",
(void*)lfs, path, type, buffer, size);
lfs_mdir_t cwd;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
if (tag < 0) {
LFS_TRACE("lfs_getattr -> %"PRId32, tag);
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
return tag;
}
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
uint16_t id = lfs_tag_id(tag);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (id == 0x3ff) {
// special case for root
id = 0;
int err = lfs_dir_fetch(lfs, &cwd, lfs->root);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_getattr -> %d", err);
return err;
}
}
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
tag = lfs_dir_get(lfs, &cwd, LFS_MKTAG(0x7ff, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_USERATTR + type,
id, lfs_min(size, lfs->attr_max)),
buffer);
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
if (tag < 0) {
if (tag == LFS_ERR_NOENT) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_getattr -> %d", LFS_ERR_NOATTR);
return LFS_ERR_NOATTR;
}
LFS_TRACE("lfs_getattr -> %"PRId32, tag);
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
return tag;
}
size = lfs_tag_size(tag);
LFS_TRACE("lfs_getattr -> %"PRId32, size);
return size;
}
static int lfs_commitattr(lfs_t *lfs, const char *path,
uint8_t type, const void *buffer, lfs_size_t size) {
lfs_mdir_t cwd;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t tag = lfs_dir_find(lfs, &cwd, &path, NULL);
if (tag < 0) {
return tag;
}
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
uint16_t id = lfs_tag_id(tag);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (id == 0x3ff) {
// special case for root
id = 0;
int err = lfs_dir_fetch(lfs, &cwd, lfs->root);
if (err) {
return err;
}
}
return lfs_dir_commit(lfs, &cwd, LFS_MKATTRS(
{LFS_MKTAG(LFS_TYPE_USERATTR + type, id, size), buffer}));
}
int lfs_setattr(lfs_t *lfs, const char *path,
uint8_t type, const void *buffer, lfs_size_t size) {
LFS_TRACE("lfs_setattr(%p, \"%s\", %"PRIu8", %p, %"PRIu32")",
(void*)lfs, path, type, buffer, size);
if (size > lfs->attr_max) {
LFS_TRACE("lfs_setattr -> %d", LFS_ERR_NOSPC);
return LFS_ERR_NOSPC;
}
int err = lfs_commitattr(lfs, path, type, buffer, size);
LFS_TRACE("lfs_setattr -> %d", err);
return err;
}
int lfs_removeattr(lfs_t *lfs, const char *path, uint8_t type) {
LFS_TRACE("lfs_removeattr(%p, \"%s\", %"PRIu8")", (void*)lfs, path, type);
int err = lfs_commitattr(lfs, path, type, NULL, 0x3ff);
LFS_TRACE("lfs_removeattr -> %d", err);
return err;
}
/// Filesystem operations ///
static int lfs_init(lfs_t *lfs, const struct lfs_config *cfg) {
lfs->cfg = cfg;
int err = 0;
// validate that the lfs-cfg sizes were initiated properly before
// performing any arithmetic logics with them
LFS_ASSERT(lfs->cfg->read_size != 0);
LFS_ASSERT(lfs->cfg->prog_size != 0);
LFS_ASSERT(lfs->cfg->cache_size != 0);
// check that block size is a multiple of cache size is a multiple
// of prog and read sizes
LFS_ASSERT(lfs->cfg->cache_size % lfs->cfg->read_size == 0);
LFS_ASSERT(lfs->cfg->cache_size % lfs->cfg->prog_size == 0);
LFS_ASSERT(lfs->cfg->block_size % lfs->cfg->cache_size == 0);
// check that the block size is large enough to fit ctz pointers
LFS_ASSERT(4*lfs_npw2(0xffffffff / (lfs->cfg->block_size-2*4))
<= lfs->cfg->block_size);
// block_cycles = 0 is no longer supported.
//
// block_cycles is the number of erase cycles before littlefs evicts
// metadata logs as a part of wear leveling. Suggested values are in the
// range of 100-1000, or set block_cycles to -1 to disable block-level
// wear-leveling.
LFS_ASSERT(lfs->cfg->block_cycles != 0);
// setup read cache
if (lfs->cfg->read_buffer) {
lfs->rcache.buffer = lfs->cfg->read_buffer;
} else {
lfs->rcache.buffer = lfs_malloc(lfs->cfg->cache_size);
if (!lfs->rcache.buffer) {
err = LFS_ERR_NOMEM;
goto cleanup;
}
}
// setup program cache
if (lfs->cfg->prog_buffer) {
lfs->pcache.buffer = lfs->cfg->prog_buffer;
} else {
lfs->pcache.buffer = lfs_malloc(lfs->cfg->cache_size);
if (!lfs->pcache.buffer) {
err = LFS_ERR_NOMEM;
goto cleanup;
}
}
// zero to avoid information leaks
lfs_cache_zero(lfs, &lfs->rcache);
lfs_cache_zero(lfs, &lfs->pcache);
// setup lookahead, must be multiple of 64-bits, 32-bit aligned
LFS_ASSERT(lfs->cfg->lookahead_size > 0);
LFS_ASSERT(lfs->cfg->lookahead_size % 8 == 0 &&
(uintptr_t)lfs->cfg->lookahead_buffer % 4 == 0);
if (lfs->cfg->lookahead_buffer) {
lfs->free.buffer = lfs->cfg->lookahead_buffer;
} else {
lfs->free.buffer = lfs_malloc(lfs->cfg->lookahead_size);
if (!lfs->free.buffer) {
err = LFS_ERR_NOMEM;
goto cleanup;
}
}
Added disk-backed limits on the name/attrs/inline sizes Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existing in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This changes makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
2018-04-01 20:36:29 +00:00
// check that the size limits are sane
LFS_ASSERT(lfs->cfg->name_max <= LFS_NAME_MAX);
lfs->name_max = lfs->cfg->name_max;
if (!lfs->name_max) {
lfs->name_max = LFS_NAME_MAX;
}
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
LFS_ASSERT(lfs->cfg->file_max <= LFS_FILE_MAX);
lfs->file_max = lfs->cfg->file_max;
if (!lfs->file_max) {
lfs->file_max = LFS_FILE_MAX;
Added disk-backed limits on the name/attrs/inline sizes Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existing in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This changes makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
2018-04-01 20:36:29 +00:00
}
LFS_ASSERT(lfs->cfg->attr_max <= LFS_ATTR_MAX);
lfs->attr_max = lfs->cfg->attr_max;
if (!lfs->attr_max) {
lfs->attr_max = LFS_ATTR_MAX;
Added disk-backed limits on the name/attrs/inline sizes Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existing in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This changes makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
2018-04-01 20:36:29 +00:00
}
// setup default state
lfs->root[0] = LFS_BLOCK_NULL;
lfs->root[1] = LFS_BLOCK_NULL;
lfs->mlist = NULL;
lfs->seed = 0;
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->gdisk = (lfs_gstate_t){0};
lfs->gstate = (lfs_gstate_t){0};
lfs->gdelta = (lfs_gstate_t){0};
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
#ifdef LFS_MIGRATE
lfs->lfs1 = NULL;
#endif
return 0;
cleanup:
lfs_deinit(lfs);
return err;
}
static int lfs_deinit(lfs_t *lfs) {
// free allocated memory
if (!lfs->cfg->read_buffer) {
lfs_free(lfs->rcache.buffer);
}
if (!lfs->cfg->prog_buffer) {
lfs_free(lfs->pcache.buffer);
}
2017-04-29 17:50:23 +00:00
if (!lfs->cfg->lookahead_buffer) {
lfs_free(lfs->free.buffer);
2017-04-29 17:50:23 +00:00
}
return 0;
}
int lfs_format(lfs_t *lfs, const struct lfs_config *cfg) {
LFS_TRACE("lfs_format(%p, %p {.context=%p, "
".read=%p, .prog=%p, .erase=%p, .sync=%p, "
".read_size=%"PRIu32", .prog_size=%"PRIu32", "
".block_size=%"PRIu32", .block_count=%"PRIu32", "
".block_cycles=%"PRIu32", .cache_size=%"PRIu32", "
".lookahead_size=%"PRIu32", .read_buffer=%p, "
".prog_buffer=%p, .lookahead_buffer=%p, "
".name_max=%"PRIu32", .file_max=%"PRIu32", "
".attr_max=%"PRIu32"})",
(void*)lfs, (void*)cfg, cfg->context,
(void*)(uintptr_t)cfg->read, (void*)(uintptr_t)cfg->prog,
(void*)(uintptr_t)cfg->erase, (void*)(uintptr_t)cfg->sync,
cfg->read_size, cfg->prog_size, cfg->block_size, cfg->block_count,
cfg->block_cycles, cfg->cache_size, cfg->lookahead_size,
cfg->read_buffer, cfg->prog_buffer, cfg->lookahead_buffer,
cfg->name_max, cfg->file_max, cfg->attr_max);
int err = 0;
{
err = lfs_init(lfs, cfg);
if (err) {
LFS_TRACE("lfs_format -> %d", err);
return err;
}
// create free lookahead
memset(lfs->free.buffer, 0, lfs->cfg->lookahead_size);
lfs->free.off = 0;
lfs->free.size = lfs_min(8*lfs->cfg->lookahead_size,
lfs->cfg->block_count);
lfs->free.i = 0;
lfs_alloc_ack(lfs);
// create root dir
lfs_mdir_t root;
err = lfs_dir_alloc(lfs, &root);
if (err) {
goto cleanup;
}
// write one superblock
lfs_superblock_t superblock = {
.version = LFS_DISK_VERSION,
.block_size = lfs->cfg->block_size,
.block_count = lfs->cfg->block_count,
.name_max = lfs->name_max,
.file_max = lfs->file_max,
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
.attr_max = lfs->attr_max,
};
lfs_superblock_tole32(&superblock);
err = lfs_dir_commit(lfs, &root, LFS_MKATTRS(
2020-04-03 01:05:55 +00:00
{LFS_MKTAG(LFS_TYPE_CREATE, 0, 0), NULL},
{LFS_MKTAG(LFS_TYPE_SUPERBLOCK, 0, 8), "littlefs"},
{LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, sizeof(superblock)),
&superblock}));
if (err) {
goto cleanup;
}
// sanity check that fetch works
err = lfs_dir_fetch(lfs, &root, (const lfs_block_t[2]){0, 1});
if (err) {
goto cleanup;
}
// force compaction to prevent accidentally mounting any
// older version of littlefs that may live on disk
root.erased = false;
err = lfs_dir_commit(lfs, &root, NULL, 0);
if (err) {
goto cleanup;
}
}
cleanup:
lfs_deinit(lfs);
LFS_TRACE("lfs_format -> %d", err);
return err;
}
int lfs_mount(lfs_t *lfs, const struct lfs_config *cfg) {
LFS_TRACE("lfs_mount(%p, %p {.context=%p, "
".read=%p, .prog=%p, .erase=%p, .sync=%p, "
".read_size=%"PRIu32", .prog_size=%"PRIu32", "
".block_size=%"PRIu32", .block_count=%"PRIu32", "
".block_cycles=%"PRIu32", .cache_size=%"PRIu32", "
".lookahead_size=%"PRIu32", .read_buffer=%p, "
".prog_buffer=%p, .lookahead_buffer=%p, "
".name_max=%"PRIu32", .file_max=%"PRIu32", "
".attr_max=%"PRIu32"})",
(void*)lfs, (void*)cfg, cfg->context,
(void*)(uintptr_t)cfg->read, (void*)(uintptr_t)cfg->prog,
(void*)(uintptr_t)cfg->erase, (void*)(uintptr_t)cfg->sync,
cfg->read_size, cfg->prog_size, cfg->block_size, cfg->block_count,
cfg->block_cycles, cfg->cache_size, cfg->lookahead_size,
cfg->read_buffer, cfg->prog_buffer, cfg->lookahead_buffer,
cfg->name_max, cfg->file_max, cfg->attr_max);
int err = lfs_init(lfs, cfg);
if (err) {
LFS_TRACE("lfs_mount -> %d", err);
return err;
}
// scan directory blocks for superblock and any global updates
lfs_mdir_t dir = {.tail = {0, 1}};
lfs_block_t cycle = 0;
while (!lfs_pair_isnull(dir.tail)) {
if (cycle >= lfs->cfg->block_count/2) {
// loop detected
err = LFS_ERR_CORRUPT;
goto cleanup;
}
cycle += 1;
// fetch next block in tail list
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t tag = lfs_dir_fetchmatch(lfs, &dir, dir.tail,
LFS_MKTAG(0x7ff, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_SUPERBLOCK, 0, 8),
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
NULL,
lfs_dir_find_match, &(struct lfs_dir_find_match){
lfs, "littlefs", 8});
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
if (tag < 0) {
err = tag;
goto cleanup;
}
// has superblock?
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
if (tag && !lfs_tag_isdelete(tag)) {
// update root
lfs->root[0] = dir.pair[0];
lfs->root[1] = dir.pair[1];
// grab superblock
lfs_superblock_t superblock;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
tag = lfs_dir_get(lfs, &dir, LFS_MKTAG(0x7ff, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, sizeof(superblock)),
&superblock);
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
if (tag < 0) {
err = tag;
goto cleanup;
}
lfs_superblock_fromle32(&superblock);
// check version
uint16_t major_version = (0xffff & (superblock.version >> 16));
uint16_t minor_version = (0xffff & (superblock.version >> 0));
if ((major_version != LFS_DISK_VERSION_MAJOR ||
minor_version > LFS_DISK_VERSION_MINOR)) {
LFS_ERROR("Invalid version v%"PRIu16".%"PRIu16,
major_version, minor_version);
err = LFS_ERR_INVAL;
goto cleanup;
}
Added root entry and expanding superblocks Expanding superblocks has been on my wishlist for a while. The basic idea is that instead of maintaining a fixed offset blocks {0, 1} to the the root directory (1 pointer), we maintain a dynamically sized linked-list of superblocks that point to the actual root. If the number of writes to the root exceeds some value, we increase the size of the superblock linked-list. This can leverage existing metadata-pair operations. The revision count for metadata-pairs provides some knowledge on how much wear we've put on the superblock, and the threaded linked-list can also be reused for this purpose. This means superblock expansion is both optional and cheap to implement. Expanding superblocks helps both extremely small and extremely large filesystem (extreme being relative of course). On the small end, we can actually collapse the superblock into the root directory and drop the hard requirement of 4-blocks for the superblock. On the large end, our superblock will now last longer than the rest of the filesystem. Each time we expand, the number of cycles until the superblock dies is increased by a power. Before we were stuck with this layout: level cycles limit layout 1 E^2 390 MiB s0 -> root Now we expand every time a fixed offset is exceeded: level cycles limit layout 0 E 4 KiB s0+root 1 E^2 390 MiB s0 -> root 2 E^3 37 TiB s0 -> s1 -> root 3 E^4 3.6 EiB s0 -> s1 -> s2 -> root ... Where the cycles are the number of cycles before death, and the limit is the worst-case size a filesystem where early superblock death becomes a concern (all writes to root using this formula: E^|s| = E*B, E = erase cycles = 100000, B = block count, assuming 4096 byte blocks). Note we can also store copies of the superblock entry on the expanded superblocks. This may help filesystem recover tools in the future.
2018-08-06 18:30:51 +00:00
// check superblock configuration
if (superblock.name_max) {
if (superblock.name_max > lfs->name_max) {
LFS_ERROR("Unsupported name_max (%"PRIu32" > %"PRIu32")",
superblock.name_max, lfs->name_max);
err = LFS_ERR_INVAL;
goto cleanup;
}
Added disk-backed limits on the name/attrs/inline sizes Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existing in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This changes makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
2018-04-01 20:36:29 +00:00
lfs->name_max = superblock.name_max;
}
Added disk-backed limits on the name/attrs/inline sizes Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existing in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This changes makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
2018-04-01 20:36:29 +00:00
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
if (superblock.file_max) {
if (superblock.file_max > lfs->file_max) {
LFS_ERROR("Unsupported file_max (%"PRIu32" > %"PRIu32")",
superblock.file_max, lfs->file_max);
err = LFS_ERR_INVAL;
goto cleanup;
}
Added disk-backed limits on the name/attrs/inline sizes Being a portable, microcontroller-scale embedded filesystem, littlefs is presented with a relatively unique challenge. The amount of RAM available is on completely different scales from machine to machine, and what is normally a reasonable RAM assumption may break completely on an embedded system. A great example of this is file names. On almost every PC these days, the limit for a file name is 255 bytes. It's a very convenient limit for a number of reasons. However, on microcontrollers, allocating 255 bytes of RAM to do a file search can be unreasonable. The simplest solution (and one that has existing in littlefs for a while), is to let this limit be redefined to a smaller value on devices that need to save RAM. However, this presents an interesting portability issue. If these devices are plugged into a PC with relatively infinite RAM, nothing stops the PC from writing files with full 255-byte file names, which can't be read on the small device. One solution here is to store this limit on the superblock during format time. When mounting a disk, the filesystem implementation is responsible for checking this limit in the superblock. If it's larger than what can be read, raise an error. If it's smaller, respect the limit on the superblock and raise an error if the user attempts to exceed it. In this commit, this strategy is adopted for file names, inline files, and the size of all attributes, since these could impact the memory consumption of the filesystem. (Recording the attribute's limit is iffy, but is the only other arbitrary limit and could be used for disabling support of custom attributes). Note! This changes makes it very important to configure littlefs correctly at format time. If littlefs is formatted on a PC without changing the limits appropriately, it will be rejected by a smaller device.
2018-04-01 20:36:29 +00:00
Added support for RAM-independent reading of inline files One of the new features in LittleFS is "inline files", which is the inlining of small files in the parent directory. Inline files have a big limitation in that they no longer have a dedicated scratch area to write out data before commit-time. This is fine as long as inline files are small enough to fit in RAM. However, this dependency on RAM creates an uncomfortable situation for portability, with larger devices able to create larger files than smaller devices. This problem is especially important on embedded systems, where RAM is at a premium. Recently, I realized this RAM requirement is necessary for _writing_ inline files, but not for _reading_ inline files. By allowing fetches of specific slices of inline files it's possible to read inline files without the RAM to back it. However however, this creates a conflict with COW semantics. Normally, when a file is open twice, it is referenced by a COW data structure that can be updated independently. Inlines files that fit in RAM also allows independent updates, but the moment an inline file can't fit in RAM, any updates to that directory block could corrupt open files referencing the inline file. The fact that this behaviour is only inconsistent for inline files created on a different device with more RAM creates a potential nightmare for user experience. Fortunately, there is a workaround for this. When we are commiting to a directory, any open files needs to live in a COW structure or in RAM. While we could move large inline files to COW structures at open time, this would break the separation of read/write operations and could lead to write errors at read time (ie ENOSPC). But since this is only an issue for commits, we can defer the move to a COW structure to any commits to that directory. This means when committing to a directory we need to find any _open_ large inline files and evict them from the directory, leaving the file with a new COW structure even if it was opened read only. While complicated, the end result is inline files that can use the MAX RAM that is available, but can be read with MIN RAM, even with multiple write operations happening to the underlying directory block. This prevents users from needing to learn the idiosyncrasies of inline files to use the filesystem portably.
2019-01-13 17:08:42 +00:00
lfs->file_max = superblock.file_max;
}
if (superblock.attr_max) {
if (superblock.attr_max > lfs->attr_max) {
LFS_ERROR("Unsupported attr_max (%"PRIu32" > %"PRIu32")",
superblock.attr_max, lfs->attr_max);
err = LFS_ERR_INVAL;
goto cleanup;
}
lfs->attr_max = superblock.attr_max;
}
}
// has gstate?
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
err = lfs_dir_getgstate(lfs, &dir, &lfs->gstate);
if (err) {
goto cleanup;
}
}
// found superblock?
if (lfs_pair_isnull(lfs->root)) {
err = LFS_ERR_INVAL;
goto cleanup;
}
// update littlefs with gstate
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (!lfs_gstate_iszero(&lfs->gstate)) {
LFS_DEBUG("Found pending gstate 0x%08"PRIx32"%08"PRIx32"%08"PRIx32,
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->gstate.tag,
lfs->gstate.pair[0],
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->gstate.pair[1]);
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->gstate.tag += !lfs_tag_isvalid(lfs->gstate.tag);
lfs->gdisk = lfs->gstate;
// setup free lookahead, to distribute allocations uniformly across
// boots, we start the allocator at a random location
lfs->free.off = lfs->seed % lfs->cfg->block_count;
lfs_alloc_drop(lfs);
LFS_TRACE("lfs_mount -> %d", 0);
return 0;
cleanup:
lfs_unmount(lfs);
LFS_TRACE("lfs_mount -> %d", err);
return err;
}
int lfs_unmount(lfs_t *lfs) {
LFS_TRACE("lfs_unmount(%p)", (void*)lfs);
int err = lfs_deinit(lfs);
LFS_TRACE("lfs_unmount -> %d", err);
return err;
}
/// Filesystem filesystem operations ///
int lfs_fs_traverseraw(lfs_t *lfs,
int (*cb)(void *data, lfs_block_t block), void *data,
bool includeorphans) {
// iterate over metadata pairs
lfs_mdir_t dir = {.tail = {0, 1}};
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
#ifdef LFS_MIGRATE
// also consider v1 blocks during migration
if (lfs->lfs1) {
int err = lfs1_traverse(lfs, cb, data);
if (err) {
return err;
}
dir.tail[0] = lfs->root[0];
dir.tail[1] = lfs->root[1];
}
#endif
lfs_block_t cycle = 0;
while (!lfs_pair_isnull(dir.tail)) {
if (cycle >= lfs->cfg->block_count/2) {
// loop detected
return LFS_ERR_CORRUPT;
}
cycle += 1;
for (int i = 0; i < 2; i++) {
int err = cb(data, dir.tail[i]);
if (err) {
return err;
}
}
// iterate through ids in directory
int err = lfs_dir_fetch(lfs, &dir, dir.tail);
if (err) {
return err;
}
for (uint16_t id = 0; id < dir.count; id++) {
struct lfs_ctz ctz;
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t tag = lfs_dir_get(lfs, &dir, LFS_MKTAG(0x700, 0x3ff, 0),
LFS_MKTAG(LFS_TYPE_STRUCT, id, sizeof(ctz)), &ctz);
if (tag < 0) {
if (tag == LFS_ERR_NOENT) {
continue;
}
return tag;
}
lfs_ctz_fromle32(&ctz);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (lfs_tag_type3(tag) == LFS_TYPE_CTZSTRUCT) {
err = lfs_ctz_traverse(lfs, NULL, &lfs->rcache,
ctz.head, ctz.size, cb, data);
if (err) {
return err;
}
} else if (includeorphans &&
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
lfs_tag_type3(tag) == LFS_TYPE_DIRSTRUCT) {
for (int i = 0; i < 2; i++) {
err = cb(data, (&ctz.head)[i]);
if (err) {
return err;
}
}
}
}
}
// iterate over any open files
for (lfs_file_t *f = (lfs_file_t*)lfs->mlist; f; f = f->next) {
if (f->type != LFS_TYPE_REG) {
continue;
}
if ((f->flags & LFS_F_DIRTY) && !(f->flags & LFS_F_INLINE)) {
int err = lfs_ctz_traverse(lfs, &f->cache, &lfs->rcache,
f->ctz.head, f->ctz.size, cb, data);
if (err) {
return err;
}
}
if ((f->flags & LFS_F_WRITING) && !(f->flags & LFS_F_INLINE)) {
int err = lfs_ctz_traverse(lfs, &f->cache, &lfs->rcache,
f->block, f->pos, cb, data);
if (err) {
return err;
}
}
}
return 0;
}
int lfs_fs_traverse(lfs_t *lfs,
int (*cb)(void *data, lfs_block_t block), void *data) {
LFS_TRACE("lfs_fs_traverse(%p, %p, %p)",
(void*)lfs, (void*)(uintptr_t)cb, data);
int err = lfs_fs_traverseraw(lfs, cb, data, true);
LFS_TRACE("lfs_fs_traverse -> %d", 0);
return err;
}
static int lfs_fs_pred(lfs_t *lfs,
const lfs_block_t pair[2], lfs_mdir_t *pdir) {
// iterate over all directory directory entries
pdir->tail[0] = 0;
pdir->tail[1] = 1;
lfs_block_t cycle = 0;
while (!lfs_pair_isnull(pdir->tail)) {
if (cycle >= lfs->cfg->block_count/2) {
// loop detected
return LFS_ERR_CORRUPT;
}
cycle += 1;
if (lfs_pair_cmp(pdir->tail, pair) == 0) {
return 0;
}
int err = lfs_dir_fetch(lfs, pdir, pdir->tail);
if (err) {
return err;
}
}
return LFS_ERR_NOENT;
}
Introduced xored-globals logic to fix fundamental problem with moves This was a big roadblock for a while: with the new feature of inlined files, the existing move logic was fundamentally flawed. To pull off atomic moves between two different metadata-pairs, littlefs uses a simple, if a bit clumsy trick. 1. Marks entry as "moving" 2. Copies entry to new metadata-pair 3. Deletes old entry If power is lost before the move operation is completed, we will find the "moving" tag. This means there may or may not be an incomplete move on the filesystem. In this case, we simply search for the moved entry, if we find it, we remove the old entry, otherwise we just remove the "moving" tag. This worked perfectly, until we introduced inlined files. See, unlike the existing directory and ctz entries, inlined files have no guarantee they are unique. There is nothing we can search for that will allow us to find a moved file unless we assign entries globally-unique ids. (note that moves are fundamentally rename operations, so searching for names does not make sense). --- Solving this problem required completely restructuring how littlefs handled moves and pulled out a really old idea that had been left in the cutting room floor back when littlefs was going through many designs: xored-globals. The problem xored-globals solves is the need to maintain some global state via commits to these distributed, independent metadata-pairs. The idea is that we can use some sort of symmetric operation, such as xor, to introduces deltas of the global state that can be committed atomically along with any other info to these metadata-pairs. This means that to figure out our global state, we xor together the global delta stored in every metadata-pair. Which means any commit can update the global state atomically, opening up a whole new set atomic possibilities. There is a couple of downsides. These globals may end up with deltas on every single metadata-pair, effectively duplicating the data for each block. Additionally, these globals need to have multiple copies in RAM. This means and globals need to be a bounded size and very small, since even small globals will have a large footprint. --- On top of xored-globals, it's trivial to fix our move logic. Here we've added an indirect delete tag which allows us to atomically specify a delete of any entry on the filesystem. Our move operation is now: 1. Copy entry to new metadata-pair and atomically xor globals to indirectly delete our original entry. 2. Delete the original entry and xor globals to remove the indirect delete. Extra exciting is that this now takes our relatively clumsy move operation into a sexy guaranteed O(1) move operation with no searching necessary (though we do need to xor globals during mount). Also reintroduced entry struct, now with a specific purpose to describe the metadata-pair + id combo needed by indirect deletes to locate an entry.
2018-05-29 17:35:23 +00:00
struct lfs_fs_parent_match {
lfs_t *lfs;
const lfs_block_t pair[2];
};
static int lfs_fs_parent_match(void *data,
lfs_tag_t tag, const void *buffer) {
struct lfs_fs_parent_match *find = data;
lfs_t *lfs = find->lfs;
const struct lfs_diskoff *disk = buffer;
(void)tag;
lfs_block_t child[2];
int err = lfs_bd_read(lfs,
&lfs->pcache, &lfs->rcache, lfs->cfg->block_size,
disk->block, disk->off, &child, sizeof(child));
if (err) {
return err;
}
lfs_pair_fromle32(child);
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
return (lfs_pair_cmp(child, find->pair) == 0) ? LFS_CMP_EQ : LFS_CMP_LT;
}
static lfs_stag_t lfs_fs_parent(lfs_t *lfs, const lfs_block_t pair[2],
lfs_mdir_t *parent) {
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
// use fetchmatch with callback to find pairs
parent->tail[0] = 0;
parent->tail[1] = 1;
lfs_block_t cycle = 0;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
while (!lfs_pair_isnull(parent->tail)) {
if (cycle >= lfs->cfg->block_count/2) {
// loop detected
return LFS_ERR_CORRUPT;
}
cycle += 1;
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_stag_t tag = lfs_dir_fetchmatch(lfs, parent, parent->tail,
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
LFS_MKTAG(0x7ff, 0, 0x3ff),
LFS_MKTAG(LFS_TYPE_DIRSTRUCT, 0, 8),
NULL,
Switched to strongly ordered directories Instead of storing files in an arbitrary order, we now store files in ascending lexicographical order by filename. Although a big change, this actually has little impact on how littlefs works internally. We need to support file insertion, and compare file names to find our position. But since we already need to scan the entire directory block, this adds relatively little overhead. What this does allow, is the potential to add B-tree support in the future in a backwards compatible manner. How could you add B-trees to littlefs? 1. Add an optional "child" tag with a pointer that allows you to skip to a position in the metadata-pair list that composes the directory 2. When splitting a metadata-pair (sound familiar?), we either insert a second child tag in our parent, or we create a new root containing the child tags. 3. Each layer needs a bit stored in the tail-pointer to indicate if we're going to the next layer. This can be created trivially when we create a new root. 4. During lookup we keep two pointers containing the bounds of our search. We may need to iterate through multiple metadata-pairs in our linked-list, but this gives us a O(log n) lookup cost in a balanced tree. 5. During deletion we also delete any children pointers. Note that children pointers must come before the actual file entry. This gives us a B-tree implementation that is compatible with the current directory layout (assuming the files are ordered). This means that B-trees could be supported by a host PC and ignored on a small device. And during power-loss, we never end up with a broken filesystem, just a less-than-optimal tree. Note that we don't handle removes, so it's possible for a tree to become unbalanced. But worst case that's the same as the current linked-list implementation. All we need to do now is keep directories ordered. If we decide to drop B-tree support in the future or the B-tree implementation turns out inherently flawed, we can just drop the ordered requirement without breaking compatibility and recover the code cost.
2018-10-04 19:49:34 +00:00
lfs_fs_parent_match, &(struct lfs_fs_parent_match){
lfs, {pair[0], pair[1]}});
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
if (tag && tag != LFS_ERR_NOENT) {
Added root entry and expanding superblocks Expanding superblocks has been on my wishlist for a while. The basic idea is that instead of maintaining a fixed offset blocks {0, 1} to the the root directory (1 pointer), we maintain a dynamically sized linked-list of superblocks that point to the actual root. If the number of writes to the root exceeds some value, we increase the size of the superblock linked-list. This can leverage existing metadata-pair operations. The revision count for metadata-pairs provides some knowledge on how much wear we've put on the superblock, and the threaded linked-list can also be reused for this purpose. This means superblock expansion is both optional and cheap to implement. Expanding superblocks helps both extremely small and extremely large filesystem (extreme being relative of course). On the small end, we can actually collapse the superblock into the root directory and drop the hard requirement of 4-blocks for the superblock. On the large end, our superblock will now last longer than the rest of the filesystem. Each time we expand, the number of cycles until the superblock dies is increased by a power. Before we were stuck with this layout: level cycles limit layout 1 E^2 390 MiB s0 -> root Now we expand every time a fixed offset is exceeded: level cycles limit layout 0 E 4 KiB s0+root 1 E^2 390 MiB s0 -> root 2 E^3 37 TiB s0 -> s1 -> root 3 E^4 3.6 EiB s0 -> s1 -> s2 -> root ... Where the cycles are the number of cycles before death, and the limit is the worst-case size a filesystem where early superblock death becomes a concern (all writes to root using this formula: E^|s| = E*B, E = erase cycles = 100000, B = block count, assuming 4096 byte blocks). Note we can also store copies of the superblock entry on the expanded superblocks. This may help filesystem recover tools in the future.
2018-08-06 18:30:51 +00:00
return tag;
}
}
return LFS_ERR_NOENT;
}
static int lfs_fs_relocate(lfs_t *lfs,
const lfs_block_t oldpair[2], lfs_block_t newpair[2]) {
// update internal root
if (lfs_pair_cmp(oldpair, lfs->root) == 0) {
lfs->root[0] = newpair[0];
lfs->root[1] = newpair[1];
}
// update internally tracked dirs
for (struct lfs_mlist *d = lfs->mlist; d; d = d->next) {
if (lfs_pair_cmp(oldpair, d->m.pair) == 0) {
d->m.pair[0] = newpair[0];
d->m.pair[1] = newpair[1];
}
if (d->type == LFS_TYPE_DIR &&
lfs_pair_cmp(oldpair, ((lfs_dir_t*)d)->head) == 0) {
((lfs_dir_t*)d)->head[0] = newpair[0];
((lfs_dir_t*)d)->head[1] = newpair[1];
}
}
// find parent
lfs_mdir_t parent;
lfs_stag_t tag = lfs_fs_parent(lfs, oldpair, &parent);
if (tag < 0 && tag != LFS_ERR_NOENT) {
return tag;
}
if (tag != LFS_ERR_NOENT) {
// update disk, this creates a desync
lfs_fs_preporphans(lfs, +1);
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
// fix pending move in this pair? this looks like an optimization but
// is in fact _required_ since relocating may outdate the move.
uint16_t moveid = 0x3ff;
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (lfs_gstate_hasmovehere(&lfs->gstate, parent.pair)) {
moveid = lfs_tag_id(lfs->gstate.tag);
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
LFS_DEBUG("Fixing move while relocating "
"{0x%"PRIx32", 0x%"PRIx32"} 0x%"PRIx16"\n",
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
parent.pair[0], parent.pair[1], moveid);
lfs_fs_prepmove(lfs, 0x3ff, NULL);
if (moveid < lfs_tag_id(tag)) {
tag -= LFS_MKTAG(0, 1, 0);
}
}
lfs_pair_tole32(newpair);
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
int err = lfs_dir_commit(lfs, &parent, LFS_MKATTRS(
{LFS_MKTAG_IF(moveid != 0x3ff,
2020-04-03 01:05:55 +00:00
LFS_TYPE_DELETE, moveid, 0), NULL},
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
{tag, newpair}));
lfs_pair_fromle32(newpair);
if (err) {
return err;
}
// next step, clean up orphans
lfs_fs_preporphans(lfs, -1);
}
// find pred
int err = lfs_fs_pred(lfs, oldpair, &parent);
if (err && err != LFS_ERR_NOENT) {
return err;
}
// if we can't find dir, it must be new
if (err != LFS_ERR_NOENT) {
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
// fix pending move in this pair? this looks like an optimization but
// is in fact _required_ since relocating may outdate the move.
uint16_t moveid = 0x3ff;
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (lfs_gstate_hasmovehere(&lfs->gstate, parent.pair)) {
moveid = lfs_tag_id(lfs->gstate.tag);
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
LFS_DEBUG("Fixing move while relocating "
"{0x%"PRIx32", 0x%"PRIx32"} 0x%"PRIx16"\n",
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
parent.pair[0], parent.pair[1], moveid);
lfs_fs_prepmove(lfs, 0x3ff, NULL);
}
// replace bad pair, either we clean up desync, or no desync occured
lfs_pair_tole32(newpair);
err = lfs_dir_commit(lfs, &parent, LFS_MKATTRS(
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
{LFS_MKTAG_IF(moveid != 0x3ff,
2020-04-03 01:05:55 +00:00
LFS_TYPE_DELETE, moveid, 0), NULL},
{LFS_MKTAG(LFS_TYPE_TAIL + parent.split, 0x3ff, 8), newpair}));
lfs_pair_fromle32(newpair);
if (err) {
return err;
}
}
return 0;
}
static void lfs_fs_preporphans(lfs_t *lfs, int8_t orphans) {
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
LFS_ASSERT(lfs_tag_size(lfs->gstate.tag) > 0 || orphans >= 0);
lfs->gstate.tag += orphans;
lfs->gstate.tag = ((lfs->gstate.tag & ~LFS_MKTAG(0x800, 0, 0)) |
((uint32_t)lfs_gstate_hasorphans(&lfs->gstate) << 31));
}
static void lfs_fs_prepmove(lfs_t *lfs,
uint16_t id, const lfs_block_t pair[2]) {
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->gstate.tag = ((lfs->gstate.tag & ~LFS_MKTAG(0x7ff, 0x3ff, 0)) |
((id != 0x3ff) ? LFS_MKTAG(LFS_TYPE_DELETE, id, 0) : 0));
lfs->gstate.pair[0] = (id != 0x3ff) ? pair[0] : 0;
lfs->gstate.pair[1] = (id != 0x3ff) ? pair[1] : 0;
}
static int lfs_fs_demove(lfs_t *lfs) {
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
if (!lfs_gstate_hasmove(&lfs->gdisk)) {
return 0;
}
// Fix bad moves
LFS_DEBUG("Fixing move {0x%"PRIx32", 0x%"PRIx32"} 0x%"PRIx16,
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs->gdisk.pair[0],
lfs->gdisk.pair[1],
lfs_tag_id(lfs->gdisk.tag));
// fetch and delete the moved entry
lfs_mdir_t movedir;
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
int err = lfs_dir_fetch(lfs, &movedir, lfs->gdisk.pair);
if (err) {
return err;
}
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
// prep gstate and delete move id
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
uint16_t moveid = lfs_tag_id(lfs->gdisk.tag);
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
lfs_fs_prepmove(lfs, 0x3ff, NULL);
err = lfs_dir_commit(lfs, &movedir, LFS_MKATTRS(
2020-04-03 01:05:55 +00:00
{LFS_MKTAG(LFS_TYPE_DELETE, moveid, 0), NULL}));
if (err) {
return err;
}
return 0;
}
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
static int lfs_fs_deorphan(lfs_t *lfs) {
if (!lfs_gstate_hasorphans(&lfs->gstate)) {
return 0;
}
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
// Fix any orphans
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
lfs_mdir_t pdir = {.split = true, .tail = {0, 1}};
lfs_mdir_t dir;
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
// iterate over all directory directory entries
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
while (!lfs_pair_isnull(pdir.tail)) {
int err = lfs_dir_fetch(lfs, &dir, pdir.tail);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (err) {
return err;
}
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
// check head blocks for orphans
if (!pdir.split) {
// check if we have a parent
lfs_mdir_t parent;
lfs_stag_t tag = lfs_fs_parent(lfs, pdir.tail, &parent);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (tag < 0 && tag != LFS_ERR_NOENT) {
return tag;
}
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (tag == LFS_ERR_NOENT) {
// we are an orphan
LFS_DEBUG("Fixing orphan {0x%"PRIx32", 0x%"PRIx32"}",
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
pdir.tail[0], pdir.tail[1]);
err = lfs_dir_drop(lfs, &pdir, &dir);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (err) {
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// refetch tail
continue;
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
}
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
lfs_block_t pair[2];
Cleaned up tag encoding, now with clear chunk field Before, the tag format's type field was limited to 9-bits. This sounds like a lot, but this field needed to encode up to 256 user-specified types. This limited the flexibility of the encoded types. As time went on, more bits in the type field were repurposed for various things, leaving a rather fragile type field. Here we make the jump to full 11-bit type fields. This comes at the cost of a smaller length field, however the use of the length field was always going to come with a RAM limitation. Rather than putting pressure on RAM for inline files, the new type field lets us encode a chunk number, splitting up inline files into multiple updatable units. This actually pushes the theoretical inline max from 8KiB to 256KiB! (Note that we only allow a single 1KiB chunk for now, chunky inline files is just a theoretical future improvement). Here is the new 32-bit tag format, note that there are multiple levels of types which break down into more info: [---- 32 ----] [1|-- 11 --|-- 10 --|-- 10 --] ^. ^ . ^ ^- entry length |. | . \------------ file id chunk info |. \-----.------------------ type info (type3) \.-----------.------------------ valid bit [-3-|-- 8 --] ^ ^- chunk info \------- type info (type1) Additionally, I've split the CREATE tag into separate SPLICE and NAME tags. This simplified the new compact logic a bit. For now, littlefs still follows the rule that a NAME tag precedes any other tags related to a file, but this can change in the future.
2018-12-29 13:53:12 +00:00
lfs_stag_t res = lfs_dir_get(lfs, &parent,
LFS_MKTAG(0x7ff, 0x3ff, 0), tag, pair);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (res < 0) {
return res;
}
lfs_pair_fromle32(pair);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (!lfs_pair_sync(pair, pdir.tail)) {
// we have desynced
LFS_DEBUG("Fixing half-orphan {0x%"PRIx32", 0x%"PRIx32"} "
"-> {0x%"PRIx32", 0x%"PRIx32"}",
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
pdir.tail[0], pdir.tail[1], pair[0], pair[1]);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
lfs_pair_tole32(pair);
err = lfs_dir_commit(lfs, &pdir, LFS_MKATTRS(
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), pair}));
lfs_pair_fromle32(pair);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (err) {
return err;
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
// refetch tail
continue;
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
}
}
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
pdir = dir;
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
}
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
// mark orphans as fixed
lfs_fs_preporphans(lfs, -lfs_gstate_getorphans(&lfs->gstate));
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
return 0;
}
static int lfs_fs_forceconsistency(lfs_t *lfs) {
int err = lfs_fs_demove(lfs);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (err) {
return err;
}
err = lfs_fs_deorphan(lfs);
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
if (err) {
return err;
}
Added building blocks for dynamic wear-leveling Initially, littlefs relied entirely on bad-block detection for wear-leveling. Conceptually, at the end of a devices lifespan, all blocks would be worn evenly, even if they weren't worn out at the same time. However, this doesn't work for all devices, rather than causing corruption during writes, wear reduces a devices "sticking power", causing bits to flip over time. This means for many devices, true wear-leveling (dynamic or static) is required. Fortunately, way back at the beginning, littlefs was designed to do full dynamic wear-leveling, only dropping it when making the retrospectively short-sighted realization that bad-block detection is theoretically sufficient. We can enable dynamic wear-leveling with only a few tweaks to littlefs. These can be implemented without breaking backwards compatibility. 1. Evict metadata-pairs after a certain number of writes. Eviction in this case is identical to a relocation to recover from a bad block. We move our data and stick the old block back into our pool of blocks. For knowing when to evict, we already have a revision count for each metadata-pair which gives us enough information. We add the configuration option block_cycles and evict when our revision count is a multiple of this value. 2. Now all blocks participate in COW behaviour. However we don't store the state of our allocator, so every boot cycle we reuse the first blocks on storage. This is very bad on a microcontroller, where we may reboot often. We need a way to spread our usage across the disk. To pull this off, we can simply randomize which block we start our allocator at. But we need a random number generator that is different on each boot. Fortunately we have a great source of entropy, our filesystem. So we seed our block allocator with a simple hash of the CRCs on our metadata-pairs. This can be done for free since we already need to scan the metadata-pairs during mount. What we end up with is a uniform distribution of wear on storage. The wear is not perfect, if a block is used for metadata it gets more wear, and the randomization may not be exact. But we can never actually get perfect wear-leveling, since we're already resigned to dynamic wear-leveling at the file level. With the addition of metadata logging, we end up with a really interesting two-stage wear-leveling algorithm. At the low-level, metadata is statically wear-leveled. At the high-level, blocks are dynamically wear-leveled. --- This specific commit implements the first step, eviction of metadata pairs. Entertwining this into the already complicated compact logic was a bit annoying, however we can combine the logic for superblock expansion with the logic for metadata-pair eviction.
2018-08-08 21:34:56 +00:00
return 0;
}
static int lfs_fs_size_count(void *p, lfs_block_t block) {
(void)block;
lfs_size_t *size = p;
*size += 1;
return 0;
}
lfs_ssize_t lfs_fs_size(lfs_t *lfs) {
LFS_TRACE("lfs_fs_size(%p)", (void*)lfs);
lfs_size_t size = 0;
int err = lfs_fs_traverseraw(lfs, lfs_fs_size_count, &size, false);
if (err) {
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_fs_size -> %d", err);
return err;
}
2019-11-22 04:29:57 +00:00
LFS_TRACE("lfs_fs_size -> %d", err);
return size;
}
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
#ifdef LFS_MIGRATE
////// Migration from littelfs v1 below this //////
/// Version info ///
// Software library version
// Major (top-nibble), incremented on backwards incompatible changes
// Minor (bottom-nibble), incremented on feature additions
#define LFS1_VERSION 0x00010007
#define LFS1_VERSION_MAJOR (0xffff & (LFS1_VERSION >> 16))
#define LFS1_VERSION_MINOR (0xffff & (LFS1_VERSION >> 0))
// Version of On-disk data structures
// Major (top-nibble), incremented on backwards incompatible changes
// Minor (bottom-nibble), incremented on feature additions
#define LFS1_DISK_VERSION 0x00010001
#define LFS1_DISK_VERSION_MAJOR (0xffff & (LFS1_DISK_VERSION >> 16))
#define LFS1_DISK_VERSION_MINOR (0xffff & (LFS1_DISK_VERSION >> 0))
/// v1 Definitions ///
// File types
enum lfs1_type {
LFS1_TYPE_REG = 0x11,
LFS1_TYPE_DIR = 0x22,
LFS1_TYPE_SUPERBLOCK = 0x2e,
};
typedef struct lfs1 {
lfs_block_t root[2];
} lfs1_t;
typedef struct lfs1_entry {
lfs_off_t off;
struct lfs1_disk_entry {
uint8_t type;
uint8_t elen;
uint8_t alen;
uint8_t nlen;
union {
struct {
lfs_block_t head;
lfs_size_t size;
} file;
lfs_block_t dir[2];
} u;
} d;
} lfs1_entry_t;
typedef struct lfs1_dir {
struct lfs1_dir *next;
lfs_block_t pair[2];
lfs_off_t off;
lfs_block_t head[2];
lfs_off_t pos;
struct lfs1_disk_dir {
uint32_t rev;
lfs_size_t size;
lfs_block_t tail[2];
} d;
} lfs1_dir_t;
typedef struct lfs1_superblock {
lfs_off_t off;
struct lfs1_disk_superblock {
uint8_t type;
uint8_t elen;
uint8_t alen;
uint8_t nlen;
lfs_block_t root[2];
uint32_t block_size;
uint32_t block_count;
uint32_t version;
char magic[8];
} d;
} lfs1_superblock_t;
/// Low-level wrappers v1->v2 ///
static void lfs1_crc(uint32_t *crc, const void *buffer, size_t size) {
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
*crc = lfs_crc(*crc, buffer, size);
}
static int lfs1_bd_read(lfs_t *lfs, lfs_block_t block,
lfs_off_t off, void *buffer, lfs_size_t size) {
// if we ever do more than writes to alternating pairs,
// this may need to consider pcache
return lfs_bd_read(lfs, &lfs->pcache, &lfs->rcache, size,
block, off, buffer, size);
}
static int lfs1_bd_crc(lfs_t *lfs, lfs_block_t block,
lfs_off_t off, lfs_size_t size, uint32_t *crc) {
for (lfs_off_t i = 0; i < size; i++) {
uint8_t c;
int err = lfs1_bd_read(lfs, block, off+i, &c, 1);
if (err) {
return err;
}
lfs1_crc(crc, &c, 1);
}
return 0;
}
/// Endian swapping functions ///
static void lfs1_dir_fromle32(struct lfs1_disk_dir *d) {
d->rev = lfs_fromle32(d->rev);
d->size = lfs_fromle32(d->size);
d->tail[0] = lfs_fromle32(d->tail[0]);
d->tail[1] = lfs_fromle32(d->tail[1]);
}
static void lfs1_dir_tole32(struct lfs1_disk_dir *d) {
d->rev = lfs_tole32(d->rev);
d->size = lfs_tole32(d->size);
d->tail[0] = lfs_tole32(d->tail[0]);
d->tail[1] = lfs_tole32(d->tail[1]);
}
static void lfs1_entry_fromle32(struct lfs1_disk_entry *d) {
d->u.dir[0] = lfs_fromle32(d->u.dir[0]);
d->u.dir[1] = lfs_fromle32(d->u.dir[1]);
}
static void lfs1_entry_tole32(struct lfs1_disk_entry *d) {
d->u.dir[0] = lfs_tole32(d->u.dir[0]);
d->u.dir[1] = lfs_tole32(d->u.dir[1]);
}
static void lfs1_superblock_fromle32(struct lfs1_disk_superblock *d) {
d->root[0] = lfs_fromle32(d->root[0]);
d->root[1] = lfs_fromle32(d->root[1]);
d->block_size = lfs_fromle32(d->block_size);
d->block_count = lfs_fromle32(d->block_count);
d->version = lfs_fromle32(d->version);
}
///// Metadata pair and directory operations ///
static inline lfs_size_t lfs1_entry_size(const lfs1_entry_t *entry) {
return 4 + entry->d.elen + entry->d.alen + entry->d.nlen;
}
static int lfs1_dir_fetch(lfs_t *lfs,
lfs1_dir_t *dir, const lfs_block_t pair[2]) {
// copy out pair, otherwise may be aliasing dir
const lfs_block_t tpair[2] = {pair[0], pair[1]};
bool valid = false;
// check both blocks for the most recent revision
for (int i = 0; i < 2; i++) {
struct lfs1_disk_dir test;
int err = lfs1_bd_read(lfs, tpair[i], 0, &test, sizeof(test));
lfs1_dir_fromle32(&test);
if (err) {
if (err == LFS_ERR_CORRUPT) {
continue;
}
return err;
}
if (valid && lfs_scmp(test.rev, dir->d.rev) < 0) {
continue;
}
if ((0x7fffffff & test.size) < sizeof(test)+4 ||
(0x7fffffff & test.size) > lfs->cfg->block_size) {
continue;
}
uint32_t crc = 0xffffffff;
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
lfs1_dir_tole32(&test);
lfs1_crc(&crc, &test, sizeof(test));
lfs1_dir_fromle32(&test);
err = lfs1_bd_crc(lfs, tpair[i], sizeof(test),
(0x7fffffff & test.size) - sizeof(test), &crc);
if (err) {
if (err == LFS_ERR_CORRUPT) {
continue;
}
return err;
}
if (crc != 0) {
continue;
}
valid = true;
// setup dir in case it's valid
dir->pair[0] = tpair[(i+0) % 2];
dir->pair[1] = tpair[(i+1) % 2];
dir->off = sizeof(dir->d);
dir->d = test;
}
if (!valid) {
LFS_ERROR("Corrupted dir pair at {0x%"PRIx32", 0x%"PRIx32"}",
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
tpair[0], tpair[1]);
return LFS_ERR_CORRUPT;
}
return 0;
}
static int lfs1_dir_next(lfs_t *lfs, lfs1_dir_t *dir, lfs1_entry_t *entry) {
while (dir->off + sizeof(entry->d) > (0x7fffffff & dir->d.size)-4) {
if (!(0x80000000 & dir->d.size)) {
entry->off = dir->off;
return LFS_ERR_NOENT;
}
int err = lfs1_dir_fetch(lfs, dir, dir->d.tail);
if (err) {
return err;
}
dir->off = sizeof(dir->d);
dir->pos += sizeof(dir->d) + 4;
}
int err = lfs1_bd_read(lfs, dir->pair[0], dir->off,
&entry->d, sizeof(entry->d));
lfs1_entry_fromle32(&entry->d);
if (err) {
return err;
}
entry->off = dir->off;
dir->off += lfs1_entry_size(entry);
dir->pos += lfs1_entry_size(entry);
return 0;
}
/// littlefs v1 specific operations ///
int lfs1_traverse(lfs_t *lfs, int (*cb)(void*, lfs_block_t), void *data) {
if (lfs_pair_isnull(lfs->lfs1->root)) {
return 0;
}
// iterate over metadata pairs
lfs1_dir_t dir;
lfs1_entry_t entry;
lfs_block_t cwd[2] = {0, 1};
while (true) {
for (int i = 0; i < 2; i++) {
int err = cb(data, cwd[i]);
if (err) {
return err;
}
}
int err = lfs1_dir_fetch(lfs, &dir, cwd);
if (err) {
return err;
}
// iterate over contents
while (dir.off + sizeof(entry.d) <= (0x7fffffff & dir.d.size)-4) {
err = lfs1_bd_read(lfs, dir.pair[0], dir.off,
&entry.d, sizeof(entry.d));
lfs1_entry_fromle32(&entry.d);
if (err) {
return err;
}
dir.off += lfs1_entry_size(&entry);
if ((0x70 & entry.d.type) == (0x70 & LFS1_TYPE_REG)) {
err = lfs_ctz_traverse(lfs, NULL, &lfs->rcache,
entry.d.u.file.head, entry.d.u.file.size, cb, data);
if (err) {
return err;
}
}
}
// we also need to check if we contain a threaded v2 directory
lfs_mdir_t dir2 = {.split=true, .tail={cwd[0], cwd[1]}};
while (dir2.split) {
err = lfs_dir_fetch(lfs, &dir2, dir2.tail);
if (err) {
break;
}
for (int i = 0; i < 2; i++) {
err = cb(data, dir2.pair[i]);
if (err) {
return err;
}
}
}
cwd[0] = dir.d.tail[0];
cwd[1] = dir.d.tail[1];
if (lfs_pair_isnull(cwd)) {
break;
}
}
return 0;
}
static int lfs1_moved(lfs_t *lfs, const void *e) {
if (lfs_pair_isnull(lfs->lfs1->root)) {
return 0;
}
// skip superblock
lfs1_dir_t cwd;
int err = lfs1_dir_fetch(lfs, &cwd, (const lfs_block_t[2]){0, 1});
if (err) {
return err;
}
// iterate over all directory directory entries
lfs1_entry_t entry;
while (!lfs_pair_isnull(cwd.d.tail)) {
err = lfs1_dir_fetch(lfs, &cwd, cwd.d.tail);
if (err) {
return err;
}
while (true) {
err = lfs1_dir_next(lfs, &cwd, &entry);
if (err && err != LFS_ERR_NOENT) {
return err;
}
if (err == LFS_ERR_NOENT) {
break;
}
if (!(0x80 & entry.d.type) &&
memcmp(&entry.d.u, e, sizeof(entry.d.u)) == 0) {
return true;
}
}
}
return false;
}
/// Filesystem operations ///
static int lfs1_mount(lfs_t *lfs, struct lfs1 *lfs1,
const struct lfs_config *cfg) {
int err = 0;
{
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
err = lfs_init(lfs, cfg);
if (err) {
return err;
}
lfs->lfs1 = lfs1;
lfs->lfs1->root[0] = LFS_BLOCK_NULL;
lfs->lfs1->root[1] = LFS_BLOCK_NULL;
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
// setup free lookahead
lfs->free.off = 0;
lfs->free.size = 0;
lfs->free.i = 0;
lfs_alloc_ack(lfs);
// load superblock
lfs1_dir_t dir;
lfs1_superblock_t superblock;
err = lfs1_dir_fetch(lfs, &dir, (const lfs_block_t[2]){0, 1});
if (err && err != LFS_ERR_CORRUPT) {
goto cleanup;
}
if (!err) {
err = lfs1_bd_read(lfs, dir.pair[0], sizeof(dir.d),
&superblock.d, sizeof(superblock.d));
lfs1_superblock_fromle32(&superblock.d);
if (err) {
goto cleanup;
}
lfs->lfs1->root[0] = superblock.d.root[0];
lfs->lfs1->root[1] = superblock.d.root[1];
}
if (err || memcmp(superblock.d.magic, "littlefs", 8) != 0) {
LFS_ERROR("Invalid superblock at {0x%"PRIx32", 0x%"PRIx32"}",
0, 1);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
err = LFS_ERR_CORRUPT;
goto cleanup;
}
uint16_t major_version = (0xffff & (superblock.d.version >> 16));
uint16_t minor_version = (0xffff & (superblock.d.version >> 0));
if ((major_version != LFS1_DISK_VERSION_MAJOR ||
minor_version > LFS1_DISK_VERSION_MINOR)) {
LFS_ERROR("Invalid version v%d.%d", major_version, minor_version);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
err = LFS_ERR_INVAL;
goto cleanup;
}
return 0;
}
cleanup:
lfs_deinit(lfs);
return err;
}
static int lfs1_unmount(lfs_t *lfs) {
return lfs_deinit(lfs);
}
/// v1 migration ///
int lfs_migrate(lfs_t *lfs, const struct lfs_config *cfg) {
LFS_TRACE("lfs_migrate(%p, %p {.context=%p, "
".read=%p, .prog=%p, .erase=%p, .sync=%p, "
".read_size=%"PRIu32", .prog_size=%"PRIu32", "
".block_size=%"PRIu32", .block_count=%"PRIu32", "
".block_cycles=%"PRIu32", .cache_size=%"PRIu32", "
".lookahead_size=%"PRIu32", .read_buffer=%p, "
".prog_buffer=%p, .lookahead_buffer=%p, "
".name_max=%"PRIu32", .file_max=%"PRIu32", "
".attr_max=%"PRIu32"})",
(void*)lfs, (void*)cfg, cfg->context,
(void*)(uintptr_t)cfg->read, (void*)(uintptr_t)cfg->prog,
(void*)(uintptr_t)cfg->erase, (void*)(uintptr_t)cfg->sync,
cfg->read_size, cfg->prog_size, cfg->block_size, cfg->block_count,
cfg->block_cycles, cfg->cache_size, cfg->lookahead_size,
cfg->read_buffer, cfg->prog_buffer, cfg->lookahead_buffer,
cfg->name_max, cfg->file_max, cfg->attr_max);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
struct lfs1 lfs1;
int err = lfs1_mount(lfs, &lfs1, cfg);
if (err) {
LFS_TRACE("lfs_migrate -> %d", err);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
return err;
}
{
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
// iterate through each directory, copying over entries
// into new directory
lfs1_dir_t dir1;
lfs_mdir_t dir2;
dir1.d.tail[0] = lfs->lfs1->root[0];
dir1.d.tail[1] = lfs->lfs1->root[1];
while (!lfs_pair_isnull(dir1.d.tail)) {
// iterate old dir
err = lfs1_dir_fetch(lfs, &dir1, dir1.d.tail);
if (err) {
goto cleanup;
}
// create new dir and bind as temporary pretend root
err = lfs_dir_alloc(lfs, &dir2);
if (err) {
goto cleanup;
}
dir2.rev = dir1.d.rev;
dir1.head[0] = dir1.pair[0];
dir1.head[1] = dir1.pair[1];
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
lfs->root[0] = dir2.pair[0];
lfs->root[1] = dir2.pair[1];
err = lfs_dir_commit(lfs, &dir2, NULL, 0);
if (err) {
goto cleanup;
}
while (true) {
lfs1_entry_t entry1;
err = lfs1_dir_next(lfs, &dir1, &entry1);
if (err && err != LFS_ERR_NOENT) {
goto cleanup;
}
if (err == LFS_ERR_NOENT) {
break;
}
// check that entry has not been moved
if (entry1.d.type & 0x80) {
int moved = lfs1_moved(lfs, &entry1.d.u);
if (moved < 0) {
err = moved;
goto cleanup;
}
if (moved) {
continue;
}
entry1.d.type &= ~0x80;
}
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
// also fetch name
char name[LFS_NAME_MAX+1];
memset(name, 0, sizeof(name));
err = lfs1_bd_read(lfs, dir1.pair[0],
entry1.off + 4+entry1.d.elen+entry1.d.alen,
name, entry1.d.nlen);
if (err) {
goto cleanup;
}
bool isdir = (entry1.d.type == LFS1_TYPE_DIR);
// create entry in new dir
err = lfs_dir_fetch(lfs, &dir2, lfs->root);
if (err) {
goto cleanup;
}
uint16_t id;
err = lfs_dir_find(lfs, &dir2, &(const char*){name}, &id);
if (!(err == LFS_ERR_NOENT && id != 0x3ff)) {
err = (err < 0) ? err : LFS_ERR_EXIST;
goto cleanup;
}
lfs1_entry_tole32(&entry1.d);
err = lfs_dir_commit(lfs, &dir2, LFS_MKATTRS(
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
{LFS_MKTAG(LFS_TYPE_CREATE, id, 0)},
{LFS_MKTAG_IF_ELSE(isdir,
LFS_TYPE_DIR, id, entry1.d.nlen,
LFS_TYPE_REG, id, entry1.d.nlen),
name},
{LFS_MKTAG_IF_ELSE(isdir,
LFS_TYPE_DIRSTRUCT, id, sizeof(entry1.d.u),
LFS_TYPE_CTZSTRUCT, id, sizeof(entry1.d.u)),
&entry1.d.u}));
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
lfs1_entry_fromle32(&entry1.d);
if (err) {
goto cleanup;
}
}
if (!lfs_pair_isnull(dir1.d.tail)) {
// find last block and update tail to thread into fs
err = lfs_dir_fetch(lfs, &dir2, lfs->root);
if (err) {
goto cleanup;
}
while (dir2.split) {
err = lfs_dir_fetch(lfs, &dir2, dir2.tail);
if (err) {
goto cleanup;
}
}
lfs_pair_tole32(dir2.pair);
err = lfs_dir_commit(lfs, &dir2, LFS_MKATTRS(
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
{LFS_MKTAG(LFS_TYPE_SOFTTAIL, 0x3ff, 8), dir1.d.tail}));
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
lfs_pair_fromle32(dir2.pair);
if (err) {
goto cleanup;
}
}
// Copy over first block to thread into fs. Unfortunately
// if this fails there is not much we can do.
LFS_DEBUG("Migrating {0x%"PRIx32", 0x%"PRIx32"} "
"-> {0x%"PRIx32", 0x%"PRIx32"}",
lfs->root[0], lfs->root[1], dir1.head[0], dir1.head[1]);
err = lfs_bd_erase(lfs, dir1.head[1]);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
if (err) {
goto cleanup;
}
err = lfs_dir_fetch(lfs, &dir2, lfs->root);
if (err) {
goto cleanup;
}
for (lfs_off_t i = 0; i < dir2.off; i++) {
uint8_t dat;
err = lfs_bd_read(lfs,
NULL, &lfs->rcache, dir2.off,
dir2.pair[0], i, &dat, 1);
if (err) {
goto cleanup;
}
err = lfs_bd_prog(lfs,
&lfs->pcache, &lfs->rcache, true,
dir1.head[1], i, &dat, 1);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
if (err) {
goto cleanup;
}
}
err = lfs_bd_flush(lfs, &lfs->pcache, &lfs->rcache, true);
if (err) {
goto cleanup;
}
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
}
// Create new superblock. This marks a successful migration!
err = lfs1_dir_fetch(lfs, &dir1, (const lfs_block_t[2]){0, 1});
if (err) {
goto cleanup;
}
dir2.pair[0] = dir1.pair[0];
dir2.pair[1] = dir1.pair[1];
dir2.rev = dir1.d.rev;
dir2.off = sizeof(dir2.rev);
dir2.etag = 0xffffffff;
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
dir2.count = 0;
dir2.tail[0] = lfs->lfs1->root[0];
dir2.tail[1] = lfs->lfs1->root[1];
dir2.erased = false;
dir2.split = true;
lfs_superblock_t superblock = {
.version = LFS_DISK_VERSION,
.block_size = lfs->cfg->block_size,
.block_count = lfs->cfg->block_count,
.name_max = lfs->name_max,
.file_max = lfs->file_max,
.attr_max = lfs->attr_max,
};
lfs_superblock_tole32(&superblock);
err = lfs_dir_commit(lfs, &dir2, LFS_MKATTRS(
Fixed issues with neighbor updates during moves The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit. - The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename. - The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination). - The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids. The way the lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk. These problems were made worst by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly. This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has less assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir. I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate. Cases where we may need to fix moves: - In lfs_rename when we move a file/dir - In lfs_demove if we lose power - In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated) - In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated) Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits And of couse, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now. found with initial fix by eastmoutain
2020-01-20 23:35:45 +00:00
{LFS_MKTAG(LFS_TYPE_CREATE, 0, 0)},
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
{LFS_MKTAG(LFS_TYPE_SUPERBLOCK, 0, 8), "littlefs"},
{LFS_MKTAG(LFS_TYPE_INLINESTRUCT, 0, sizeof(superblock)),
&superblock}));
if (err) {
goto cleanup;
}
// sanity check that fetch works
err = lfs_dir_fetch(lfs, &dir2, (const lfs_block_t[2]){0, 1});
if (err) {
goto cleanup;
}
// force compaction to prevent accidentally mounting v1
dir2.erased = false;
err = lfs_dir_commit(lfs, &dir2, NULL, 0);
if (err) {
goto cleanup;
}
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
}
cleanup:
lfs1_unmount(lfs);
LFS_TRACE("lfs_migrate -> %d", err);
Added migration from littlefs v1 This is the help the introduction of littlefs v2, which is disk incompatible with littlefs v1. While v2 can't mount v1, what we can do is provide an optional migration, which can convert v1 into v2 partially in-place. At worse, we only need to carry over the readonly operations on v1, which are much less complicated than the write operations, so the extra code cost may be as low as 25% of the v1 code size. Also, because v2 contains only metadata changes, it's possible to avoid copying file data during the update. Enabling the migration requires two steps 1. Defining LFS_MIGRATE 2. Call lfs_migrate (only available with the above macro) Each macro multiplies the number of configurations needed to be tested, so I've been avoiding macro controlled features since there's still work to be done around testing the single configuration that's already available. However, here the cost would be too high if we included migration code in the standard build. We can't use the lfs_migrate function for link time gc because of a dependency between the allocator and v1 data structures. So how does lfs_migrate work? It turned out to be a bit complicated, but the answer is a multistep process that relies on mounting v1 readonly and building the metadata skeleton needed by v2. 1. For each directory, create a v2 directory 2. Copy over v1 entries into v2 directory, including the soft-tail entry 3. Move head block of v2 directory into the unused metadata block in v1 directory. This results in both a v1 and v2 directory sharing the same metadata pair. 4. Finally, create a new superblock in the unused metadata block of the v1 superblock. Just like with normal metadata updates, the completion of the write to the second metadata block marks a succesful migration that can be mounted with littlefs v2. And all of this can occur atomically, enabling complete fallback if power is lost of an error occurs. Note there are several limitations with this solution. 1. While migration doesn't duplicate file data, it does temporarily duplicate all metadata. This can cause a device to run out of space if storage is tight and the filesystem as many files. If the device was created with >~2x the expected storage, it should be fine. 2. The current implementation is not able to recover if the metadata pairs develop bad blocks. It may be possilbe to workaround this, but it creates the problem that directories may change location during the migration. The other solutions I've looked at are complicated and require superlinear runtime. Currently I don't think it's worth fixing this limitation. 3. Enabling the migration requires additional code size. Currently this looks like it's roughly 11% at least on x86. And, if any failure does occur, no harm is done to the original v1 filesystem on disk.
2019-02-23 03:34:03 +00:00
return err;
}
#endif