third_party_littlefs/tests/test_orphans.toml

121 lines
4.1 KiB
TOML
Raw Permalink Normal View History

[[case]] # orphan test
in = "lfs.c"
if = 'LFS_PROG_SIZE <= 0x3fe' # only works with one crc per commit
code = '''
lfs_format(&lfs, &cfg) => 0;
lfs_mount(&lfs, &cfg) => 0;
lfs_mkdir(&lfs, "parent") => 0;
lfs_mkdir(&lfs, "parent/orphan") => 0;
lfs_mkdir(&lfs, "parent/child") => 0;
lfs_remove(&lfs, "parent/orphan") => 0;
lfs_unmount(&lfs) => 0;
// corrupt the child's most recent commit, this should be the update
// to the linked-list entry, which should orphan the orphan. Note this
// makes a lot of assumptions about the remove operation.
lfs_mount(&lfs, &cfg) => 0;
lfs_dir_open(&lfs, &dir, "parent/child") => 0;
lfs_block_t block = dir.m.pair[0];
lfs_dir_close(&lfs, &dir) => 0;
lfs_unmount(&lfs) => 0;
uint8_t bbuffer[LFS_BLOCK_SIZE];
cfg.read(&cfg, block, 0, bbuffer, LFS_BLOCK_SIZE) => 0;
int off = LFS_BLOCK_SIZE-1;
while (off >= 0 && bbuffer[off] == LFS_ERASE_VALUE) {
off -= 1;
}
memset(&bbuffer[off-3], LFS_BLOCK_SIZE, 3);
cfg.erase(&cfg, block) => 0;
cfg.prog(&cfg, block, 0, bbuffer, LFS_BLOCK_SIZE) => 0;
cfg.sync(&cfg) => 0;
lfs_mount(&lfs, &cfg) => 0;
lfs_stat(&lfs, "parent/orphan", &info) => LFS_ERR_NOENT;
lfs_stat(&lfs, "parent/child", &info) => 0;
lfs_fs_size(&lfs) => 8;
lfs_unmount(&lfs) => 0;
lfs_mount(&lfs, &cfg) => 0;
lfs_stat(&lfs, "parent/orphan", &info) => LFS_ERR_NOENT;
lfs_stat(&lfs, "parent/child", &info) => 0;
lfs_fs_size(&lfs) => 8;
// this mkdir should both create a dir and deorphan, so size
// should be unchanged
lfs_mkdir(&lfs, "parent/otherchild") => 0;
lfs_stat(&lfs, "parent/orphan", &info) => LFS_ERR_NOENT;
lfs_stat(&lfs, "parent/child", &info) => 0;
lfs_stat(&lfs, "parent/otherchild", &info) => 0;
lfs_fs_size(&lfs) => 8;
lfs_unmount(&lfs) => 0;
lfs_mount(&lfs, &cfg) => 0;
lfs_stat(&lfs, "parent/orphan", &info) => LFS_ERR_NOENT;
lfs_stat(&lfs, "parent/child", &info) => 0;
lfs_stat(&lfs, "parent/otherchild", &info) => 0;
lfs_fs_size(&lfs) => 8;
lfs_unmount(&lfs) => 0;
'''
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
[[case]] # reentrant testing for orphans, basically just spam mkdir/remove
reentrant = true
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
# TODO fix this case, caused by non-DAG trees
if = '!(DEPTH == 3 && LFS_CACHE_SIZE != 64)'
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
define = [
Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducability issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducability issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring whats on disk if the read fails, but end up using unitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducability, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count was different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occured because it expanded the superblock early. 2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actually CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5. Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there's still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.
2020-01-29 07:45:19 +00:00
{FILES=6, DEPTH=1, CYCLES=20},
{FILES=26, DEPTH=1, CYCLES=20},
{FILES=3, DEPTH=3, CYCLES=20},
Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point were both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta. .--------. ->| parent |-. | gstate | | .-| a |-' | '--------' | X <- child is orphaned | .--------. '>| child |-> | gstate | | a | '--------' What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break the in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.
2020-01-22 04:18:19 +00:00
]
code = '''
err = lfs_mount(&lfs, &cfg);
if (err) {
lfs_format(&lfs, &cfg) => 0;
lfs_mount(&lfs, &cfg) => 0;
}
srand(1);
const char alpha[] = "abcdefghijklmnopqrstuvwxyz";
for (int i = 0; i < CYCLES; i++) {
// create random path
char full_path[256];
for (int d = 0; d < DEPTH; d++) {
sprintf(&full_path[2*d], "/%c", alpha[rand() % FILES]);
}
// if it does not exist, we create it, else we destroy
int res = lfs_stat(&lfs, full_path, &info);
if (res == LFS_ERR_NOENT) {
// create each directory in turn, ignore if dir already exists
for (int d = 0; d < DEPTH; d++) {
strcpy(path, full_path);
path[2*d+2] = '\0';
err = lfs_mkdir(&lfs, path);
assert(!err || err == LFS_ERR_EXIST);
}
for (int d = 0; d < DEPTH; d++) {
strcpy(path, full_path);
path[2*d+2] = '\0';
lfs_stat(&lfs, path, &info) => 0;
assert(strcmp(info.name, &path[2*d+1]) == 0);
assert(info.type == LFS_TYPE_DIR);
}
} else {
// is valid dir?
assert(strcmp(info.name, &full_path[2*(DEPTH-1)+1]) == 0);
assert(info.type == LFS_TYPE_DIR);
// try to delete path in reverse order, ignore if dir is not empty
for (int d = DEPTH-1; d >= 0; d--) {
strcpy(path, full_path);
path[2*d+2] = '\0';
err = lfs_remove(&lfs, path);
assert(!err || err == LFS_ERR_NOTEMPTY);
}
lfs_stat(&lfs, full_path, &info) => LFS_ERR_NOENT;
}
}
lfs_unmount(&lfs) => 0;
'''