Currently this is just lfs.c and lfs_util.c. Previously this included
the block devices, but this meant all of the scripts needed to
explicitly deselect the block devices to avoid reporting build
size/coverage info on them.
Note that test.py still explicitly adds the block devices for compiling
tests, which is their main purpose. Humorously this means the block
devices will probably be compiled into most builds in this repo anyways.
This was lost in the Travis -> GitHub transition. In serializing some
of the jobs, I missed that we need to clean between tests with
different geometry configurations. Otherwise we end up running outdated
binaries, which explains some of the weird test behavior we were
seeing.
Also tweaked a few script things:
- Better subprocess error reporting (dump stderr on failure); see the sketch below
- Fixed a BUILDDIR rule issue in test.py
- Changed test-not-run status to None instead of undefined
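The subprocess change amounts to capturing stderr and only showing it
when something goes wrong. A minimal sketch of the idea (the helper
name and details here are illustrative, not the exact code in the
scripts):

    import sys
    import subprocess as sp

    def run(cmd, **args):
        # capture output so failures can be reported with context
        proc = sp.Popen(cmd, stdout=sp.PIPE, stderr=sp.PIPE,
            universal_newlines=True, **args)
        stdout, stderr = proc.communicate()
        if proc.returncode != 0:
            # dump stderr on failure instead of silently discarding it
            sys.stderr.write(stderr)
            raise sp.CalledProcessError(
                proc.returncode, cmd, stdout, stderr)
        return stdout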
Now littlefs's Makefile can work with a custom build directory
for compilation output. Just set the BUILDDIR variable and the Makefile
will take care of the rest.
make BUILDDIR=build size
This makes it very easy to compare builds with different compile-time
configurations or different cross-compilers.
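For example, comparing a host build against a cross-compiled build is
just two invocations with different build directories (the
cross-compiler here is only an example):

    make BUILDDIR=build-host size
    make BUILDDIR=build-arm CC=arm-none-eabi-gcc size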
This meant most of code.py's build isolation was no longer needed, so
I revisited the scripts and cleaned/tweaked a number of things.
Also brought code.py in line with coverage.py, fixing some of the
inconsistencies that were created while developing these scripts.
One change to note was removing the inline measuring logic; I realized
this feature is unnecessary thanks to GCC's -fkeep-static-functions and
-fno-inline flags.
Since we already have fairly complicated scripts, I figured it wouldn't
be too hard to use the gcov tools and directly parse their output. Boy
was I wrong.
The gcov intermediary format is a bit of a mess. In version 5.4, a
text-based intermediary format is written to a single .gcov file per
executable. This changed sometime before version 7.5, when it started
writing separate .gcov files per .o file. And in version 9 this
intermediary format was entirely replaced with an incompatible JSON
format!
Ironically, this means the internal-only .gcda/.gcno binary format has
actually been more stable than the intermediary format.
Also there's no way to avoid temporary .gcov files generated in the
project root, which risks messing with how test.py runs parallel tests.
Fortunately this looks like it will be fixed in gcov version 9.
---
Ended up switching to lcov, which was the right way to go. lcov handles
all of the gcov parsing, provides an easily parsable output, and even
provides a set of higher-level commands to manage coverage collection
from different runs.
Since this is all provided by lcov, I was able to simplify coverage.py
quite a bit. Now it just parses the .info files output by lcov.
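The .info files are simple line-oriented records (SF: introduces a
source file, DA: carries per-line execution counts), which is what
makes this approach workable. A minimal sketch of the parsing,
assuming only line coverage is needed (the actual script does more):

    import collections

    def parse_info(path):
        # hits[source_file][line] = execution count
        hits = collections.defaultdict(dict)
        file = None
        for line in open(path):
            line = line.strip()
            if line.startswith('SF:'):
                file = line[3:]
            elif line.startswith('DA:') and file:
                lineno, count = line[3:].split(',')[:2]
                hits[file][int(lineno)] = (
                    hits[file].get(int(lineno), 0) + int(count))
            elif line == 'end_of_record':
                file = None
        return hits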
Now coverage information can be collected if you provide the
--coverage flag to test.py. Internally this uses GCC's gcov
instrumentation along with a new script, coverage.py, to parse *.gcov
files.
The main use for this is finding coverage info during CI runs. There's
a risk that the instrumentation may make it more difficult to debug, so
I decided not to enable coverage collection by default.
Mostly taken from .travis.yml; the biggest changes were around how to
get the status updates to work.
We can't use a token on PRs the same way we could in Travis, so instead
we use a second workflow that checks every pull request for "status"
artifacts and creates the actual statuses in the "workflow_run" event,
where we have full access to repo secrets.
Inspired by Linux's Bloat-O-Meter, code_size.py wraps nm to provide
function-level code size, and supports detailed comparison between
different builds.
One difference is that code_size.py invokes littlefs's build system
similarly to test.py, creating a duplicate build in the "sizes"
directory. This makes it easy to monitor a cross-compiled build size
while simultaneously testing on the host machine.
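The nm half of this is small; a rough sketch of the idea, assuming GNU
nm (the actual script also drives the build and handles the diffing):

    import subprocess as sp

    def code_sizes(obj):
        # nm -S --size-sort prints "addr size type name" per symbol
        out = sp.check_output(
            ['nm', '-S', '--size-sort', obj],
            universal_newlines=True)
        sizes = {}
        for line in out.splitlines():
            fields = line.split()
            if len(fields) == 4 and fields[2] in 'tT':
                # t/T marks symbols in the text (code) section
                sizes[fields[3]] = int(fields[1], 16)
        return sizes

Comparing two builds is then just a dict diff over the shared symbol
names.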
Not sure how this went unnoticed; I guess this is the first bug that
needed in-depth inspection after a last-minute argument cleanup in the
debug scripts.
- Standardized littlefs debug statements to use hex prefixes and
brackets for printing pairs.
- Removed the entry behavior for readtree and made -t the default.
This is because 1. the CTZ skip-list parsing was broken, which is not
surprising, and 2. the entry parsing was more complicated than useful.
This functionality may be better implemented as a proper filesystem
read script, complete with directory tree dumping.
- Changed test.py's --gdb argument to take [init, main, assert];
this matches the names of the stages in C's startup.
- Added printing of tail to all mdir dumps in readtree/readmdir.
- Added a print for if any mdirs are corrupted in readtree.
- Added debug script side-effects to .gitignore.
- Changed readmdir.py to print the metadata pair and revision count,
which is useful when debugging commit issues.
- Added truncated data view to readtree.py by default. This does mean
readtree.py must read all files on the filesystem to show the
truncated data; hopefully this does not end up being a problem.
- Made the overall representation hopefully more readable, including
moving the superblock under the root dir, userattrs under files, and
fixing a gstate rendering issue.
- Added rendering of soft-tails as dotted-arrows, hopefully this isn't
too noisy.
- Fixed explode_asserts.py off-by-1 in #line mapping caused by a strip
call in the assert generation eating newlines. The script matches
line numbers between the original+modified files by emitting assert
statements that use the same number of lines. An off-by-1 here causes
the entire file to map lines incorrectly, which can be very annoying.
- Added caching to Travis install dirs, because otherwise
pip3 install fails randomly
- Increased size of littlefs-fuse disk because test script has
a larger footprint now
- Skip a couple of reentrant tests under byte-level writes because
the tests just take too long and cause Travis to bail due to no
output for 10m
- Fixed various Valgrind errors
- Suppressed uninit checks for tests where LFS_BLOCK_ERASE_VALUE == -1.
In this case rambd goes uninitialized, which is fine for rambd's
purposes. Note I couldn't figure out how to limit this suppression
to only the malloc in rambd; this doesn't seem possible with Valgrind.
- Fixed memory leaks in exhaustion tests
- Fixed off-by-1 string null-terminator issue in paths tests
- Fixed lfs_file_sync issue revealed by fixing memory leaks
in exhaustion tests. Getting ENOSPC during a file write puts the file
in a bad state where littlefs doesn't know how to write it out safely.
In this case, lfs_file_sync and lfs_file_close return 0 without
writing out state so that device-side resources can still be cleaned
up. To recover from ENOSPC, the file needs to be reopened and the
writes recreated. Not sure if there is a better way to handle this.
- Added some quality-of-life improvements to Valgrind testing
- Fit Valgrind messages into truncated output when not in verbose mode
- Turned on origin tracking
RAM-backed testing is faster than file-backed testing. This is why
test.py uses rambd by default.
So why add support for tmpfs-backed disks if we can already run tests in
RAM? For reentrant testing.
Under reentrant testing we simulate power-loss by forcefully exiting the
test program at specific times. To make this power-loss meaningful, we need to
persist the disk across these power-losses. However, it's interesting
to note this persistence doesn't actually need to be backed by real
storage.
It may be possible to rearchitect the tests to simulate power-loss a
different way, by say, using coroutines or setjmp/longjmp to leave
behind ongoing filesystem operations without terminating the program
completely. But at this point, I think it's best to work with what we
have.
And simply putting the test disks into a tmpfs mount-point seems to
work just fine.
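In script terms this is just a matter of where the disk files live. A
minimal sketch, assuming a tmpfs mount at /dev/shm (the exact path
varies by system):

    import tempfile

    # disks placed on tmpfs persist across simulated power-losses
    # (process exits) while living entirely in RAM
    fd, disk = tempfile.mkstemp(dir='/dev/shm', suffix='.disk')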
Note this does force serialization of the tests, which isn't required
otherwise. Currently they are only serialized due to limitations in
test.py. If a future change wants to parallelize the tests, it may need
to rework RAM-backed reentrant tests.
Moved .travis.yml over to use the new test framework. A part of this
involved testing all of the configurations run on the old framework
and deciding which to carry over. The new framework duplicates some of
the cases tested by the configurations, so some configurations could be
dropped.
The .travis.yml includes some extreme ones, such as no inline files,
relocations every cycle, no intrinsics, power-loss every byte, unaligned
block_count and lookahead, and odd read_sizes.
There were several configurations where some tests failed because of
limitations in the tests themselves, so many conditions were added
to make sure the configurations can run on as many tests as possible.
Added indentation so there is a clearer separation between the tag
description and tag data.
Also took the best parts of readmdir.py and added it to readtree.py.
Initially I was thinking it was best for these to have completely
independent data representations, since you could always call readtree
to get more info, but this becomes tedious when you need to look at
low-level tag info across multiple directories on the filesystem.
It's interesting how many ways block devices can show failed writes:
1. prog can error
2. erase can error
3. read can error after writing (ECC failure)
4. prog doesn't error but doesn't write the data correctly
5. erase doesn't error but doesn't erase correctly
Can read fail without an error? Yes, though this appears the same as
prog and erase failing.
These weren't all simulated by testbd since I unintentionally assumed
the block device could always error. Fixed by adding additional
bad-block behaviors to testbd.
Note: This also includes a small fix where we can miss bad writes if the
underlying block device contains a valid commit with the exact same
size at the exact same offset.
Fixes:
- Fixed reproducibility issue when we can't read a directory revision
- Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
- Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
- Fixed cleanup issue if we run out of space while extending a CTZ skip-list
- Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan
Also:
- Added cycle-detection to readtree.py
- Allowed pseudo-C expressions in test conditions (and it's
beautifully hacky, see line 187 of test.py)
- Better handling of ctrl-C during test runs
- Added build-only mode to test.py
- Limited stdout of test failures to 5 lines unless in verbose mode
Explanation of fixes below
1. Fixed reproducibility issue when we can't read a directory revision
An interesting subtlety of the block-device layer is that the
block-device is allowed to return LFS_ERR_CORRUPT on reads to
untouched blocks. This can easily happen if a user is using ECC or
some sort of CMAC on their blocks. Normally we never run into this,
except for the optimization around directory revisions where we use
uninitialized data to start our revision count.
We correctly handle this case by ignoring what's on disk if the read
fails, but end up using uninitialized RAM instead. This is not an issue
for normal use, though it can lead to a small information leak.
However it creates a big problem for reproducibility, which is very
helpful for debugging.
I ended up running into a case where the RAM values for the revision
count were different, causing two identical runs to wear-level at
different times, leading to one version running out of space before a
bug occurred because it expanded the superblock early.
2. Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size
This could be caused if the previous tag was a valid commit and we
lost power, leaving a partially written tag as the start of a new
commit.
Fortunately we already have a separate condition for exceeding the
block size, so we can force that case to always treat the mdir as
unerased.
3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to
outline a file in lfs_file_sync
Most operations involving metadata-pairs treat the mdir struct as
entirely temporary and throw it out if any error occurs. Except for
lfs_file_sync since the mdir is also a part of the file struct.
This is relevant because of a cleanup issue in lfs_dir_compact that
usually doesn't have side-effects. The issue is that lfs_fs_relocate
can fail. It needs to allocate new blocks to relocate to, and as the
disk reaches its end of life, it can fail with ENOSPC quite often.
If lfs_fs_relocate fails, the containing lfs_dir_compact would return
immediately without restoring the previous state of the mdir. If a new
commit comes in on the same mdir, the old state left there could
corrupt the filesystem.
It's interesting to note this is forced to happen in lfs_file_sync,
since it always tries to outline the file if it gets ENOSPC (ENOSPC
can mean both no blocks to allocate and that the mdir is full). I'm
not actually sure this bit of code is necessary anymore, we may be
able to remove it.
4. Fixed cleanup issue if we run out of space while extending a CTZ
skip-list
The actual CTZ skip-list logic itself hasn't been touched in more
than a year at this point, so I was surprised to find a bug here. But
it turns out the CTZ skip-list could be put in an invalid state if we
run out of space while trying to extend the skip-list.
This only becomes a problem if we keep the file open, clean up some
space elsewhere, and then continue to write to the open file without
modifying it. Fortunately an easy fix.
5. Fixed missing half-orphans when allocating blocks during
lfs_fs_deorphan
This was a really interesting bug. Normally, we don't have to worry
about allocations, since we force consistency before we are allowed
to allocate blocks. But what about the deorphan operation itself?
Don't we need to allocate blocks if we relocate while deorphaning?
It turns out the deorphan operation can lead to allocating blocks
while there's still orphans and half-orphans on the threaded
linked-list. Orphans aren't an issue, but half-orphans may contain
references to blocks in the outdated half, which doesn't get scanned
during the normal allocation pass.
Fortunately we already fetch directory entries to check CTZ lists, so
we can also check half-orphans here. However this causes
lfs_fs_traverse to duplicate all metadata-pairs; not sure what to do
about this yet.
- Removed old tests and test scripts
- Reorganized the block devices to live under one directory
- Plugged new test framework into Makefile
renamed:
- scripts/test_.py -> scripts/test.py
- tests_ -> tests
- {file,ram,test}bd/* -> bd/*
It took a surprising amount of effort to make the Makefile behave since
it turns out the "test_%" rule could override "tests/test_%.toml.test"
which is generated as part of test.py.
The root of the problem was the notorious Python quirk with mutable
default parameters. The default defines for the TestSuite class ended
up being mutated as the class determined the permutations to test,
corrupting other tests' defines.
However, the only define that was mutated this way was the CACHE_SIZE
config in test_entries.
The crazy thing was how this small innocuous change would cause
"./scripts/test.py -nr test_relocations" and "./scripts/test.py -nr"
to drift out of sync only after a commit spanning the different cache
sizes would be written out with a different number of prog calls. This
offset the power-cycle counter enough to cause one case to make it to
an erase, and the other to not.
Normally, the difference between a successful/unsuccessful erase
wouldn't change the result of a test, but in this case it offset the
revision count used for wear-leveling, causing one run to expand the
superblock and the other to not.
This change to the filesystem would then propagate through the rest of
the test, making it difficult to reproduce test failures.
Fortunately the fix was to just make a copy of the default define
dictionary. This should also prevent accidental mutation of dicts
belonging to our caller.
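The quirk is easy to reproduce in isolation; a standalone example of
the bug and the fix (not the actual TestSuite code):

    def bad(defines={}):
        # the default dict is created once and shared by every call
        defines.setdefault('CACHE_SIZE', 64)
        return defines

    def good(defines=None):
        # copy, so neither the default nor the caller's dict is mutated
        defines = dict(defines or {})
        defines.setdefault('CACHE_SIZE', 64)
        return defines

    bad()['CACHE_SIZE'] = 512
    assert bad()['CACHE_SIZE'] == 512   # whoops, mutation leaked
    good()['CACHE_SIZE'] = 512
    assert good()['CACHE_SIZE'] == 64   # copies keep calls independent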
Oh, also fixed a buffer overflow in test_files.
Normally I wouldn't consider optimizing this sort of script, but
explode_asserts.py proved to be terribly inefficient and dominated
the build time for running tests. It was slow enough to be distracting
when attempting to test patches while debugging. Just running
explode_asserts.py was ~10x slower than the rest of the compilation
process.
After implementing a proper tokenizer and switching to a handwritten
recursive descent parser, I was able to speed up explode_asserts.py
by ~5x and make test compilation much more tolerable.
I don't think this was a limitation of parsy, but rather switching
to a recursive descent parser made it much easier to find the hotspots
where parsing was wasting cycles (string slicing for one).
It's interesting to note that while the assert patterns can be parsed
with an LL(1) parser (by dumping seen tokens if a pattern fails),
I didn't bother as it's much easier to write the patterns with LL(k),
and parsing asserts is predicated by the "assert" string.
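To give a flavor of the approach (heavily simplified, nothing like the
real explode_asserts.py grammar): tokenize once up front, then descend
with a position index, so no string slicing happens in the hot path.

    import re

    TOKEN = re.compile(r'\s*(==|!=|<=|>=|[()<>]|[^\s()<>=!]+)')

    class Parser:
        def __init__(self, s):
            self.tokens, self.i = TOKEN.findall(s), 0

        def accept(self, *toks):
            # advance only on a match, otherwise stay put
            if self.i < len(self.tokens) and self.tokens[self.i] in toks:
                self.i += 1
                return self.tokens[self.i-1]
            return None

        def accept_any(self):
            if self.i < len(self.tokens):
                self.i += 1
                return self.tokens[self.i-1]
            return None

        def parse_assert(self):
            # assert '(' operand op operand ')', single-token operands
            if not (self.accept('assert') and self.accept('(')):
                return None
            lh = self.accept_any()
            op = self.accept('==', '!=', '<=', '>=', '<', '>')
            rh = self.accept_any()
            if op and self.accept(')'):
                return (op, lh, rh)
            return None

    assert Parser('assert(res == 0)').parse_assert() == ('==', 'res', '0')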
A few other tweaks:
- allowed combining different test modes in one run
- added a --no-internal option
- changed test_.py to start counting cases from 1
- added assert(memcmp(a, b) == 0) matching
- added better handling of string escapes in assert messages
time to run tests:
before: 1m31.122s
after: 0m41.447s
The power-cycled-relocation test with random renames has been the most
aggressive test applied to littlefs so far, with:
- Random nested directory creation
- Random nested directory removal
- Random nested directory renames (this could make the
threaded linked-list very interesting)
- Relocating blocks every write (maximum wear-leveling)
- Incrementally cycling power every write
Also added a couple other tests to test_orphans and test_relocations.
The good news is the added testing worked well: it found quite a number
of complex and subtle bugs that have been difficult to find.
1. It's actually possible for our parent to be relocated and go out of
sync in lfs_mkdir. This can happen if our predecessor's predecessor
is our parent as we are threading ourselves into the filesystem's
threaded list. (note this doesn't happen if our predecessor _is_ our
parent, as we then update our parent in a single commit).
This is annoying because it only happens if our parent is a long (>1
pair) directory, otherwise we wouldn't need to catch relocations.
Fortunately we can reuse the internal open file/dir linked-list to
catch relocations easily, as long as we're careful to unhook our
parent whenever lfs_mkdir returns.
2. Even more surprising, it's possible for the child in lfs_remove
to be relocated while we delete the entry from our parent. This
can happen if we are our own parent's predecessor, since we need
to be updated then if our parent relocates.
Fortunately we can also hook into the open linked-list here.
Note this same issue was present in lfs_rename.
Fortunately, this means all fetched dirs are now hooked into the
open linked-list if they are needed across a commit. This means
we shouldn't need assumptions about tree movement for correctness.
3. lfs_rename("deja/vu", "deja/vu") with the same source and destination
was broken and tried to delete the entry twice.
4. Managing gstate deltas when we lose power during relocations was
broken. And unfortunately complicated.
The issue happens when we lose power during a relocation while
removing a directory.
When we remove a directory, we need to move the contents of its
gstate delta to another directory or we'll corrupt littlefs gstate.
(gstate is an xor of all deltas on the filesystem). We used to just
xor the gstate into our parent's gstate; however, this isn't correct.
The gstate isn't built out of the directory tree, but rather out of
the threaded linked-list (which exists to make collecting this
gstate efficient).
Because we have to remove our dir in two operations, there's a point
where both the updated parent and child can exist in the threaded
linked-list and duplicate the child's gstate delta.
.--------.
->| parent |-.
| gstate | |
.-| a |-'
| '--------'
| X <- child is orphaned
| .--------.
'>| child |->
| gstate |
| a |
'--------'
What we need to do is save our child's gstate and only give it to our
predecessor, since this finalizes the removal of the child.
However we still need to make valid updates to the gstate to mark
that we've created an orphan when we start removing the child.
This led to a small rework of how the gstate is handled. Now we have
a separation of the gpending state that should be written out ASAP
and the gdelta state that is collected from orphans awaiting
deletion.
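Since gstate is an xor of deltas, duplication is not benign: a ^ a = 0,
so a delta that appears twice on the threaded linked-list cancels
itself out of the combined gstate. A tiny illustration (the numbers
are arbitrary):

    from functools import reduce
    from operator import xor

    deltas = [0x12, 0x34, 0x56]        # per-mdir gstate deltas
    gstate = reduce(xor, deltas, 0)    # filesystem gstate

    # if both parent and child carry the child's delta 0x34, the
    # duplicate cancels the original and the delta is lost
    dup = reduce(xor, deltas + [0x34], 0)
    assert dup == gstate ^ 0x34 and dup != gstate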
5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing
more than one orphan after a power-cycle. Having more than one orphan
is very rare, but of course very possible. Fortunately this was just
a mistake with using a break in the deorphan loop, perhaps left over
from v1 where multiple orphans weren't possible?
Note that we use a continue to force a refetch of the orphaned block.
This is needed in the case of a half-orphan, since the fetched
half-orphan may have an outdated tail pointer.
The root of the problem was some assumptions about what tags could be
sent to lfs_dir_commit.
- The first assumption is that there could be only one splice (create/delete)
tag at a time, which is trivially broken by the core commit in lfs_rename.
- The second assumption is that there is at most one create and one delete in
a single commit. This is less obvious but turns out to not be true in
the case that we rename a file such that it overwrites another file in
the same directory (1 delete for source file, 1 delete for destination).
- The third assumption was that there was an ordering to the
delete/creates passed to lfs_dir_commit. It may be possible to force all
deletes to follow creates by rearranging the tags in lfs_rename, but
this risks overflowing tag ids.
The way lfs_dir_commit first collected the "deletetag" and "createtag"
broke all three of these assumptions. And because we lose the ordering
information we can no longer apply the directory changes to open files
correctly. The file ids may be shifted in a way that doesn't reflect the
actual operations on disk.
These problems were made worse by lfs_dir_commit cleaning up moves
implicitly, which also creates deletes implicitly. While cleaning up moves
in lfs_dir_commit may save some code size, it makes the commit logic much more
difficult to implement correctly.
This bug turned into pulling out a dead tree stump, roots and all.
I ended up reworking how lfs_dir_commit updates open files so that it
has fewer assumptions; now it just traverses the commit tags multiple
times in order to update file ids in the correct order after a
successful commit.
This also got rid of the dir copy by carefully updating split dirs
after all files have an up-to-date copy of the original dir.
I also just removed the implicit move cleanup. It turns out the only
commits that can occur before we have cleaned up the move are in
lfs_fs_relocate, so it was simple enough to explicitly handle this case
when we update our parent and pred during a relocate.
Cases where we may need to fix moves:
- In lfs_rename when we move a file/dir
- In lfs_demove if we lose power
- In lfs_fs_relocate if we have to relocate our parent and we find it
had a pending move (or else the move will be outdated)
- In lfs_fs_relocate if we have to relocate our predecessor and we find it
had a pending move (or else the move will be outdated)
Note the two cases in lfs_fs_relocate may be recursive. But
lfs_fs_relocate can only trigger other lfs_fs_relocates, so it's not
possible for pending moves to spill out into other filesystem commits.
And of course, I added several tests to cover these situations.
Hopefully the rename-with-open-files logic is fairly locked down now.
Found with initial fix by eastmoutain.
Also fixed a bug in dir splitting when there's a large number of open
files, which was the main reason I was trying to make it easier to debug
disk images.
One part of the recent test changes was to move away from the
file-per-block emubd and instead simulate storage with a single
contiguous file. The file-per-block format was marginally useful
at the beginning, but as the remaining bugs get more subtle, it
becomes more useful to inspect littlefs through scripts that
make the underlying metadata more human-readable.
The key benefit of switching to a contiguous file is these same
scripts can be reused for real disk images and can even read through
/dev/sdb or similar.
- ./scripts/readblock.py disk block_size block
off data
00000000: 71 01 00 00 f0 0f ff f7 6c 69 74 74 6c 65 66 73 q.......littlefs
00000010: 2f e0 00 10 00 00 02 00 00 02 00 00 00 04 00 00 /...............
00000020: ff 00 00 00 ff ff ff 7f fe 03 00 00 20 00 04 19 ...............
00000030: 61 00 00 0c 00 62 20 30 0c 09 a0 01 00 00 64 00 a....b 0......d.
...
readblock.py prints a hex dump of a given block on disk. It's basically
just "dd if=disk bs=block_size count=1 skip=block | xxd -g1 -" but with
less typing.
- ./scripts/readmdir.py disk block_size block1 block2
off tag type id len data (truncated)
0000003b: 0020000a dir 0 10 63 6f 6c 64 63 6f 66 66 coldcoff
00000049: 20000008 dirstruct 0 8 02 02 00 00 03 02 00 00 ........
00000008: 00200409 dir 1 9 68 6f 74 63 6f 66 66 65 hotcoffe
00000015: 20000408 dirstruct 1 8 fe 01 00 00 ff 01 00 00 ........
readmdir.py prints info about the tags in a metadata pair on disk. It
can print the currently active tags as well as the raw log of the
metadata pair.
- ./scripts/readtree.py disk block_size
superblock "littlefs"
version v2.0
block_size 512
block_count 1024
name_max 255
file_max 2147483647
attr_max 1022
gstate 0x000000000000000000000000
dir "/"
mdir {0x0, 0x1} rev 3
v id 0 superblock "littlefs" inline size 24
mdir {0x77, 0x78} rev 1
id 0 dir "coffee" dir {0x1fc, 0x1fd}
dir "/coffee"
mdir {0x1fd, 0x1fc} rev 2
id 0 dir "coldcoffee" dir {0x202, 0x203}
id 1 dir "hotcoffee" dir {0x1fe, 0x1ff}
dir "/coffee/coldcoffee"
mdir {0x202, 0x203} rev 1
dir "/coffee/warmcoffee"
mdir {0x200, 0x201} rev 1
readtree.py parses the littlefs tree and prints info about the
semantics of what's on disk. This includes the superblock,
global-state, and directories/metadata-pairs. It doesn't print the
filesystem tree though; that could be a different tool.
Also finished migrating tests with test_relocations and test_exhaustion.
The issue I was running into when migrating these tests was a lack of
flexibility with what you could do with the block devices. It was
possible to hack in some hooks for things like bad blocks and power
loss, but it wasn't clean or easily extendable.
The solution here was to just put all of these test extensions into a
third block device, testbd, that uses the other two example block
devices internally.
testbd has several useful features for testing. Note this makes it a
pretty terrible block device _example_ since these hooks look more
complicated than a block device needs to be.
- testbd can simulate different erase values, supporting 1s, 0s, other
byte patterns, or no erases at all (which can cause surprising bugs).
This actually depends on the simulated erase values in rambd and
filebd. I did try to move this out of rambd/filebd, but it's not
possible to simulate erases in testbd without buffering entire blocks
and creating an excessive amount of extra write operations.
- testbd also helps simulate power-loss by containing a "power cycles"
counter that is decremented on every write operation until it calls
exit (see the sketch after this list). This is notably faster than the
previous gdb approach, which is valuable since the reentrant tests tend
to take a while to resolve.
- testbd also tracks wear, which can be manually set and read. This is
very useful for testing things like bad block handling, wear leveling,
or even changing the effective size of the block device at runtime.
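As a rough model of these hooks, here's a Python sketch of the
behavior (the real testbd is C, and the exit code here is just a
placeholder for whatever the test runner treats as power-loss):

    import sys

    class SimBd:
        def __init__(self, block_size, block_count,
                erase_value=-1, power_cycles=0):
            self.erase_value = erase_value
            self.power_cycles = power_cycles
            self.blocks = [bytearray(block_size)
                for _ in range(block_count)]

        def _count_power_cycle(self):
            # decrement on every write op, exit to simulate power-loss
            if self.power_cycles:
                self.power_cycles -= 1
                if self.power_cycles == 0:
                    sys.exit(33)  # placeholder power-loss exit code

        def prog(self, block, off, data):
            self._count_power_cycle()
            self.blocks[block][off:off+len(data)] = data

        def erase(self, block):
            self._count_power_cycle()
            if self.erase_value != -1:
                # simulate flash-style erase values; -1 = no erase
                for i in range(len(self.blocks[block])):
                    self.blocks[block][i] = self.erase_value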
Even with adding better reentrance testing, the bad-block tests are
still very useful at isolating the block eviction logic.
This also required rewriting a bit of the internal testing wiring
to allow custom block devices, which opens up quite a few more
strategies for testing.
Both test_move and test_orphan needed internal knowledge, which comes
with the addition of the "in" attribute. This was in the plan for the
test-revamp from the beginning, as it really opens up the ability to
write more unit-style tests using internal knowledge of how littlefs
works. More unit-style tests should help _fix_ bugs by limiting the
scope of the test and where the bug could be hiding.
The "in" attribute effectively runs tests _inside_ the .c file
specified, giving the test access to all static members without
needed to change their visibility.
This involved some minor tweaks for the various types of tests, added
predicates to the test framework (necessary for test_entries and
test_alloc), and cleaned up some of the testing semantics such as
reporting how many tests are filtered, showing permutation config on
the result screen, and properly inheriting suite config in cases.
The idea behind emubd (file per block) was neat, but doesn't add much
value over a block device that just operates on a single linear file
(other than adding a significant amount of overhead). Initially it
helped with debugging, but when the metadata format became more complex
in v2, most debugging ends up going through the debug.py script anyways.
Aside from being simpler, moving to filebd means it is also possible to
mount disk images directly.
Also introduced rambd, which keeps the disk contents in RAM. This is
very useful for testing where it increases the speed _significantly_.
- test_dirs w/ filebd - 0m7.170s
- test_dirs w/ rambd - 0m0.966s
These follow the emubd model of using the lfs_config for geometry. I'm
not convinced this is the best approach, but it gets the job done.
I've also added lfs_rambd_createcfg to add additional config similar to
lfs_file_opencfg. This is useful for specifying erase_value, which tells
the block device to simulate erases similar to flash devices. Note that
the default (-1) meets the minimum block device requirements and is the
most performant.
Aside from reworking the internals of test_.py to work well with
inherited TestCase classes, this also provides the two features that
were the main reason for revamping the test framework:
1. ./scripts/test_.py --reentrant
Runs reentrant tests (tests with reentrant=true in the .toml
configuration) under gdb such that the program is killed on every
call to lfs_emubd_prog or lfs_emubd_erase.
Currently this just increments a number of prog/erases to skip, which
means it doesn't necessarily check every possible branch of the test,
but this should still provide a good coverage of power-loss tests.
2. ./scripts/test_.py --gdb
Run the tests and if a failure is hit, drop into GDB. In theory this
will be very useful for reproducing and debugging test failures.
Note this can be combined with --reentrant to drop into GDB on the
exact cycle of power-loss where the tests fail.
- Reworked how permutations work
- Now with global defines as well (apply to all code)
- Also supports lists of different permutation sets
- Added better cleanup in tests and "make clean"
This is the start of reworking littlefs's testing framework based on
lessons learned from the initial testing framework.
1. The testing framework needs to be _flexible_. It was hacky, which by
itself isn't a downside, but it wasn't _flexible_. This limited what
could be done with the tests and there ended up being many
workarounds just to reproduce bugs.
The idea behind this revamped framework is to separate the
description of tests (tests/test_dirs.toml) and the running of tests
(scripts/test.py).
Now, with the logic moved entirely to Python, it's possible to run
the tests under varying environments. In addition to the "just don't
assert" run, I'm also looking to run the tests in Valgrind for memory
checking, and in an environment with simulated power-loss.
The test description can also contain abstract attributes that help
control how tests can be run, such as "leaky" to identify tests where
memory leaks are expected. This keeps test limitations at a minimum
without restricting how the tests can be run.
2. Multi-stage-process tests didn't really add value and limited the
testing environment.
Unmounting + mounting can be done in a single process to test the
same logic. It would be really difficult to make this fail only
when memory is zeroed, though that can still be caught by
power-resilient tests.
Requiring every test to be a single process adds several options
for test execution, such as using a RAM-backed block device for
speed, or even running the tests on a device.
3. Added fancy assert interception. This wasn't really a requirement,
but something I've been wanting to experiment with for a while.
During testing, scripts/explode_asserts.py is added to the build
process. This is a custom C-preprocessor that parses out assert
statements and replaces them with _very_ verbose asserts that
wouldn't normally be possible with just C macros.
It even goes as far as to report the arguments to strcmp, since the
lack of visibility here was very annoying.
tests_/test_dirs.toml:186:assert: assert failed with "..", expected eq "..."
assert(strcmp(info.name, "...") == 0);
One downside is that simply parsing C in Python is slower than the
entire rest of the compilation, but fortunately this can be
alleviated by parallelizing the test builds through make.
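A toy version of the transformation, using a single regex pattern (the
real script needs an actual parser precisely because C expressions
don't regex well; the generated C and the exit code are illustrative):

    import re

    ASSERT = re.compile(r'assert\((.+?)\s*==\s*(.+?)\)\s*;')

    def explode(line):
        # rewrite assert(a == b); so both sides are evaluated once
        # and printed on failure
        return ASSERT.sub(
            r'do { int _lh = (\1); int _rh = (\2); '
            r'if (!(_lh == _rh)) { printf("assert failed: '
            r'%d != %d\\n", _lh, _rh); exit(-2); } } while (0);',
            line)

    print(explode('assert(lfs_mount(&lfs, &cfg) == 0);'))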
Other neat bits:
- All generated files are a suffix of the test description; this helps
cleanup and means it's (theoretically) possible to parallelize the
tests.
- The generated test.c is shoved base64-encoded into an ad-hoc
Makefile; this means it doesn't force a rebuild of tests all the time.
- Test parameterizing is now easier.
- Hopefully this framework can be repurposed also for benchmarks in the
future.
This is a minor quality of life change to help debugging, specifically
when debugging test failures.
Before, the test framework used hex, while the log output used decimal.
This was slightly annoying to convert between.
Why not output lengths/offset in hex? I don't have a big reason. I find
it easier to reason about lengths in decimal and ids (such as addresses
or block numbers) in hex. But this may just be me.
- Now test errors have correct line reporting! #line directives
are passed to the compiler that reference the relevant line in
the test case shell script.
--- Multi-block directory ---
./tests/test_dirs.sh:109: assert failed with 0, expected 1
lfs_unmount(&lfs) => 1
- Cleaned up the number of implicit global variables provided to
tests. A lot of these were infrequently used and made it difficult
to remember what was provided. This isn't an MCU, so there's very
little cost to stack allocations when needed.
- Minimized the results.py script (previously stats.py) output to
match minimization of test output.