Byte-level writes are expensive and not recommended (caches >= 4 bytes
make much more sense), but there are many corner cases with
byte-level writes that are easy to miss (such as power-loss leaving
single bytes written to disk).
Unfortunately, combining byte-level writes with power-loss testing, the
Travis infrastructure, and Arm Thumb instruction set simulation
exceeds the 50-minute budget Travis allocates for jobs.
For now I'm disabling the byte-level tests under QEMU, in the hope that
performance improvements in littlefs will let us turn these tests back
on in the future.
- Added caching to Travis install dirs, because otherwise
  pip3 install fails randomly
- Increased size of littlefs-fuse disk because test script has
  a larger footprint now
- Skipped a couple of reentrant tests under byte-level writes because
  the tests just take too long and cause Travis to bail due to no
  output for 10m
- Fixed various Valgrind errors
  - Suppressed uninit checks for tests where LFS_BLOCK_ERASE_VALUE == -1.
    In this case rambd goes uninitialized, which is fine for rambd's
    purposes. Note I couldn't figure out how to limit this suppression
    to only the malloc in rambd; this doesn't seem possible with Valgrind.
  - Fixed memory leaks in exhaustion tests
  - Fixed off-by-1 string null-terminator issue in paths tests
  - Fixed lfs_file_sync issue revealed by fixing memory leaks
    in exhaustion tests. Getting ENOSPC during a file write puts the file
    in a bad state where littlefs doesn't know how to write it out safely.
    In this case, lfs_file_sync and lfs_file_close return 0 without
    writing out state so that device-side resources can still be cleaned
    up. To recover from ENOSPC, the file needs to be reopened and the
    writes recreated. Not sure if there is a better way to handle this.
- Added some quality-of-life improvements to Valgrind testing
  - Fit Valgrind messages into truncated output when not in verbose mode
  - Turned on origin tracking
The core of littlefs's CI testing is the full test suite, `make test`, run
under a number of configurations:
- Processor architecture:
  - x86 (native)
  - Arm Thumb
  - MIPS
  - PowerPC
- Storage geometry:
  - rs=16 ps=16 cs=64 bs=512 (default)
  - rs=1 ps=1 cs=64 bs=4KiB (NOR flash)
  - rs=512 ps=512 cs=512 bs=512 (eMMC)
  - rs=4KiB ps=4KiB cs=4KiB bs=32KiB (NAND flash)
- Other corner cases:
  - no intrinsics
  - no inline
  - byte-level read/writes
  - single block-cycles
  - odd block counts
  - odd block sizes
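To make the geometry notation concrete, here is a small C sketch. It assumes rs/ps/cs/bs stand for read, prog, cache, and block size in bytes, and it checks the divisibility relationships one would expect between them: a cache holds whole read and prog units, and a block holds whole caches. The geometry_t type and geometry_ok function are hypothetical illustrations, not littlefs API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical record mirroring the rs/ps/cs/bs notation (all in bytes).
typedef struct {
    const char *name;
    uint32_t read_size, prog_size, cache_size, block_size;
} geometry_t;

// The four storage geometries listed above.
static const geometry_t geometries[] = {
    {"default", 16,   16,   64,   512},
    {"nor",     1,    1,    64,   4096},
    {"emmc",    512,  512,  512,  512},
    {"nand",    4096, 4096, 4096, 32768},
};

// Assumed consistency rules: the cache holds whole read and prog units,
// and a block holds a whole number of caches.
static bool geometry_ok(const geometry_t *g) {
    return g->read_size > 0 && g->prog_size > 0 && g->cache_size > 0
        && g->cache_size % g->read_size == 0
        && g->cache_size % g->prog_size == 0
        && g->block_size % g->cache_size == 0;
}
```

Each of the listed geometries passes these checks, while something like cs=48 with bs=512 would not.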
The number of different configurations we need to test quickly exceeds the
50-minute time limit Travis has on jobs. Fortunately, we can split these
tests out into multiple jobs. This seems to be the intended course of
action for large CI "builds" in Travis, as this gives Travis finer-grained
control over limiting builds.
Unfortunately, this created a couple of issues:
1. The Travis configuration isn't actually that flexible. It allows a
   single "matrix expansion" which can be generated from top-level lists
   of different configurations. But it doesn't let you generate a matrix
   from two separate environment variable lists (for arch + geometry).
   Without multiple matrix expansions, we're stuck writing out each test
   permutation by hand.
   On the bright side, this was a good chance to really learn how YAML
   anchors work. I'm torn: on one hand, anchors add what feels like
   unnecessary complexity to a config language; on the other hand,
   they did help quite a bit in working around Travis's limitations.
2. Now that we have 47 jobs instead of 7, reporting a separate status
   for each job stops making sense.
   What I've opted for here is to use a special NAME variable to
   deduplicate jobs, and a few stateless rules to hopefully have
   the reported status make sense most of the time.
   - Overwrite "pending" statuses so that the last job to start owns the
     most recent "pending" status
   - Don't overwrite "failure" statuses unless the job number matches
     our own (in the case of CI restarts)
   - Don't write "success" statuses unless the job number matches our
     own; this should delay a green check-mark until the last-to-start
     job finishes
   - Always overwrite non-failures with "failure" statuses
   This does mean a temporary "success" may appear if the last job
   terminates before earlier jobs. But this is the simplest solution
   I can think of without storing some complex state somewhere.
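The status rules above can be condensed into a single stateless decision function. This is a hypothetical C sketch of the logic as described, not the actual CI script; status_t, reported_t, and should_write are illustrative names. The only inputs are the currently reported status, the job number that wrote it, and our own job number.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { STATUS_PENDING, STATUS_SUCCESS, STATUS_FAILURE } status_t;

// The status currently reported, tagged with the job number that wrote it.
typedef struct {
    status_t state;
    int job;
} reported_t;

// Decide whether job `our_job` may replace `current` with `next`.
static bool should_write(reported_t current, status_t next, int our_job) {
    bool ours = (current.job == our_job);
    if (current.state == STATUS_FAILURE && !ours) {
        return false;  // never hide another job's failure
    }
    if (next == STATUS_SUCCESS) {
        return ours;   // only the last job to start may turn the check green
    }
    return true;       // pending and failure overwrite everything else
}
```

Because a CI restart keeps the same job number, a restarted job is allowed to replace its own earlier "failure" with "success".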
Note we can only report the size this way because it's cheap to
calculate in every job.
RAM-backed testing is faster than file-backed testing. This is why
test.py uses rambd by default.
So why add support for tmpfs-backed disks if we can already run tests in
RAM? For reentrant testing.
Under reentrant testing we simulate power-loss by forcefully exiting the
test program at specific times. To make this power-loss meaningful, we need to
persist the disk across these power-losses. However, it's interesting to
note that this persistence doesn't actually need to be backed by the
filesystem.
It may be possible to rearchitect the tests to simulate power-loss a
different way, say, by using coroutines or setjmp/longjmp to leave
behind ongoing filesystem operations without terminating the program
completely. But at this point, I think it's best to work with what we
have.
And simply putting the test disks into a tmpfs mount-point seems to
work just fine.
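For comparison, the setjmp/longjmp alternative mentioned above might look roughly like the following sketch. Everything here is hypothetical (disk_erase, prog_byte, write_record, and run_with_power_loss are illustrative names, not test.py or littlefs code): a RAM-backed disk survives the simulated power-loss because the program never exits, and longjmp abandons the in-flight operation partway through.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>
#include <string.h>

#define DISK_SIZE 64

static uint8_t disk[DISK_SIZE];     // RAM "disk" that survives power-loss
static jmp_buf power_loss;          // where to resume after power-loss
static int writes_until_loss = -1;  // -1 means power is never lost

static void disk_erase(void) {
    memset(disk, 0xff, sizeof(disk));
}

// Programs one byte, unless the simulated power-loss strikes first.
static void prog_byte(int off, uint8_t value) {
    assert(off >= 0 && off < DISK_SIZE);
    if (writes_until_loss == 0) {
        longjmp(power_loss, 1);  // abandon the in-flight operation
    }
    if (writes_until_loss > 0) {
        writes_until_loss -= 1;
    }
    disk[off] = value;
}

// A stand-in "filesystem operation" that writes a 4-byte record.
static void write_record(int off, const uint8_t rec[4]) {
    for (int i = 0; i < 4; i++) {
        prog_byte(off + i, rec[i]);
    }
}

// Runs write_record, losing power after `budget` byte writes.
// Returns 1 if power was lost, 0 if the operation completed.
static int run_with_power_loss(int budget, int off, const uint8_t rec[4]) {
    writes_until_loss = budget;
    if (setjmp(power_loss)) {
        writes_until_loss = -1;
        return 1;
    }
    write_record(off, rec);
    writes_until_loss = -1;
    return 0;
}
```

After a simulated power-loss, the partially written bytes remain on the RAM disk, which is exactly the state a reentrant test would then try to mount and check.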
Note this does force serialization of the tests, which isn't required
otherwise. Currently they are only serialized due to limitations in
test.py. If a future change wants to parallelize the tests, it may need
to rework RAM-backed reentrant tests.
Moved .travis.yml over to use the new test framework. A part of this
involved testing all of the configurations that ran on the old framework
and deciding which to carry over. The new framework duplicates some of
the cases tested by the configurations so some configurations could be
dropped.
The .travis.yml includes some extreme ones, such as no inline files,
relocations every cycle, no intrinsics, power-loss every byte, unaligned
block_count and lookahead, and odd read_sizes.
There were several configurations where some tests failed because of
limitations in the tests themselves, so many conditions were added
to make sure the configurations can run on as many tests as possible.
Sometimes a small, single-line code change hides a complicated
story behind it. This is one of those times.
If you look at this diff, you may note that this is a case of
lfs_dir_fetchmatch not correctly handling a tag that invalidates a
callback used to search for some condition, in this case a search for a
parent, which is invalidated by a later dir tag overwriting the
previous dir pair.
But how can this happen? Dir-pair-tags are only overwritten during
relocations (when a block goes bad or exceeds the block_cycles config
option for dynamic wear-leveling). Other dir operations create new
directory entries. And the only lfs_dir_fetchmatch condition that relies
on overwrites (as opposed to proper deletes) is when we need to find a
directory's parent, an operation that only occurs during a _different_
relocation. And a false _positive_ can only happen if we don't have a
parent. Which is really unlikely when we search for directory parents!
This bug and a minimal test case were found by Matthew Renzelmann. In an
unfortunate series of events, first a file creation causes a directory
split to occur. This creates a new, orphaned metadata-pair containing
our new file. However, the revision count on this metadata-pair
indicates the pair is due for relocation as a part of wear-leveling.
Normally this is fine: even though this metadata-pair has no parent,
lfs_dir_find should return ENOENT and continue without error.
However, here we get hit by our fetchmatch bug. A previous, unrelated
relocation overwrites a pair which just happens to contain the block
allocated for a new metadata-pair. When we search for a parent,
lfs_dir_fetchmatch incorrectly finds this old, outdated metadata pair
and incorrectly tells our orphan it's found its parent.
As you can imagine, the orphan's disappointment must be immense.
So an unfortunately timed dir split triggers a relocation which
incorrectly finds a previously written parent that has been outdated
by another relocation.
As a solution we can outdate our found tag if it is overwritten by
an exact match during lfs_dir_fetchmatch.
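A heavily simplified sketch of that rule, under stated assumptions: the metadata log is modeled as a flat array of (tag, value) entries scanned in commit order, and an "exact match" is modeled as an identical tag. The real lfs_dir_fetchmatch operates on littlefs's on-disk tag encoding and is considerably more involved; entry_t, value_equals, and fetchmatch here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical simplified log entry: the tag says *what* is written
// (for example "the dir pair stored under id 3"), value is the payload.
typedef struct {
    uint32_t tag;
    uint32_t value;
} entry_t;

typedef bool (*match_fn)(uint32_t value, uint32_t arg);

// Example predicate: does this entry point at the block we're looking for?
static bool value_equals(uint32_t value, uint32_t arg) {
    return value == arg;
}

// Scan `entries` in commit order and return the index of the latest entry
// satisfying `match`, or -1. If a later entry with the exact same tag
// overwrites the entry we found, the earlier match is outdated and dropped.
static int fetchmatch(const entry_t *entries, int n, match_fn match,
        uint32_t arg) {
    assert(n >= 0);
    int found = -1;
    for (int i = 0; i < n; i++) {
        if (match(entries[i].value, arg)) {
            found = i;
        } else if (found >= 0 && entries[i].tag == entries[found].tag) {
            found = -1;  // an exact overwrite invalidates the earlier match
        }
    }
    return found;
}
```

In the scenario above, searching for a parent that points at block 5 should fail once a later relocation rewrites the same dir-pair tag to point at block 9; without the `else if` branch, the outdated entry would be reported as the parent.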
As a part of this I started adding a new set of tests, tests/test_relocations,
for aggressive relocation tests. This is already being appended to by
another PR. I suspect relocations are relatively under-tested and are
becoming more important due to recent improvements in wear-leveling.
This is a result of feedback that the current release notes made it too
difficult to see what changes happened on patch releases. In my own
experience, it also became difficult to chase down which release a
commit landed on.
The risk is that this creates additional noise, both for the release
page and for user notifications. I am open to feedback if this causes a
problem.
Other tweaks on the CI side; these came from iterating on the same
scheme for coru and equeue:
- Changed version branch updates to be atomic (vN and vN-prefix). This
makes it a bit easier to fix if one of the pushes fails due to a rogue
branch with the same name.
- Added GEKY_BOT_DRAFT as a CI macro that can optionally switch between
only creating drafts or immediately posting a release. The default is
what I will be trying with littlefs which is to draft minor/major
releases, but automatically create patch releases.
The real benefit of automatic releases is for tiny repos that
don't really have an active maintainer. Though this is definitely no
longer the case with littlefs, and I'm happy it has gained this much
attention.
This is primarily to get better test coverage over devices with very
large erase/prog/read sizes. The unfortunate state of the tests is
that most of them rely on a specific block device size, so that
ENOSPC and ECORRUPT errors occur in specific situations.
This should be improved in the future, but at least for now we can
open up some of the simpler tests to run on these different
configurations.
Also added testing over both 0x00 and 0xff erase values in emubd.
Also added a number of small file tests that expose issues prevalent
on NAND devices.
Both the littlefs-fuse and littlefs-migration test jobs depend on
the external littlefs-fuse repo. But unfortunately, the automatic
patching to update the external repo with the version under test
does not work with the prefix branches.
In this case we can just skip these tests; they've already been tested
multiple times to get to this point.
The script itself is a part of .travis.yml, using ./scripts/prefix.py
for applying prefixes to the source code.
The purpose of the automatic job is to provide a branch containing
version prefixes, to avoid name conflicts in binaries containing
different major versions of littlefs with only a git clone.
As a part of each release, two branches and a tag are created:
- vN - moving branch
- vN-prefix - moving branch
- vN.N.N - immutable tag
The major version branch (vM) is created on major releases, but updated
every patch release. The patch version tag (vM.M.P) is created every
patch release. Patch releases occur every time a commit is merged into
master, though multiple merges may be coalesced.
The major prefix branch (vM-prefix) is modified with the ./scripts/prefix.py
script. Note that this branch is updated as a synthetic merge commit
with the previous history of vM-prefix. The reason for this is to allow
users to easily update vM-prefix with a `git pull` as they would for
other branches.
    A---B---C---D---E master, v1, v1.7.3
         \       \   \
          F-------G---H v1-prefix
This is to help with the introduction of littlefs v2, which is disk
incompatible with littlefs v1. While v2 can't mount v1, what we can
do is provide an optional migration, which can convert v1 into v2
partially in-place.
At worst, we only need to carry over the readonly operations on v1,
which are much less complicated than the write operations, so the extra
code cost may be as low as 25% of the v1 code size. Also, because v2
contains only metadata changes, it's possible to avoid copying file
data during the update.
Enabling the migration requires two steps:
1. Define LFS_MIGRATE
2. Call lfs_migrate (only available with the above macro)
Each macro multiplies the number of configurations needed to be tested,
so I've been avoiding macro controlled features since there's still work
to be done around testing the single configuration that's already
available. However, here the cost would be too high if we included migration
code in the standard build. We also can't rely on link-time garbage
collection to drop lfs_migrate when it's unused, because of a dependency
between the allocator and the v1 data structures.
So how does lfs_migrate work? It turned out to be a bit complicated, but
the answer is a multistep process that relies on mounting v1 readonly and
building the metadata skeleton needed by v2.
1. For each directory, create a v2 directory
2. Copy over v1 entries into v2 directory, including the soft-tail entry
3. Move head block of v2 directory into the unused metadata block in v1
directory. This results in both a v1 and v2 directory sharing the
same metadata pair.
4. Finally, create a new superblock in the unused metadata block of the
v1 superblock.
Just like with normal metadata updates, the completion of the write to
the second metadata block marks a successful migration that can be
mounted with littlefs v2. And all of this can occur atomically, enabling
complete fallback if power is lost or an error occurs.
Note there are several limitations with this solution.
1. While migration doesn't duplicate file data, it does temporarily
duplicate all metadata. This can cause a device to run out of space if
storage is tight and the filesystem has many files. If the device was
created with >~2x the expected storage, it should be fine.
2. The current implementation is not able to recover if the metadata
pairs develop bad blocks. It may be possible to work around this, but
it creates the problem that directories may change location during
the migration. The other solutions I've looked at are complicated and
require superlinear runtime. Currently I don't think it's worth
fixing this limitation.
3. Enabling the migration requires additional code size. Currently this
looks like it's roughly 11%, at least on x86.
And, if any failure does occur, no harm is done to the original v1
filesystem on disk.
This was an oversight on my part when adding strict ordering to
directories. Unfortunately now we can't take advantage of the atomic
creation of tail+dir entries. Now we need to first create the tail, then
create the actual directory entry. If we lose power, the orphan is
cleaned up like orphans created during remove.
Note that we still take advantage of the atomic tail+dir entries if we
are an end block. This is actually because this corner case is
complicated to _not_ do atomically, needing to update the directory we
just committed to.
The fact that the lookahead buffer uses bits instead of bytes is an
internal detail. Poking this through to the user API has caused a decent
amount of confusion. Most buffers are provided as bytes and the
inconsistency here can be surprising.
The use of bytes instead of bits also makes us forward compatible in
the case that we want to change the lookahead internal representation
(hint: segment list).
Additionally, we change the configuration name to lookahead_size. This
matches other configurations, such as cache_size and read_size, while
also notifying the user that something important changed at compile time
(by breaking).
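A minimal sketch of the idea, assuming a plain bitmap where the user-facing lookahead_size is in bytes and each byte tracks 8 blocks internally. The lookahead_t type and its helpers are hypothetical; littlefs's own internal representation may differ, and as noted above it may change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical lookahead window over the disk.
typedef struct {
    uint8_t *buffer;
    uint32_t lookahead_size;  // in bytes, as configured by the user
    uint32_t start;           // first block covered by the window
} lookahead_t;

// Bits are an internal detail: each configured byte tracks 8 blocks.
static uint32_t lookahead_blocks(const lookahead_t *la) {
    return 8 * la->lookahead_size;
}

static void lookahead_mark_used(lookahead_t *la, uint32_t block) {
    uint32_t off = block - la->start;
    assert(off < lookahead_blocks(la));
    la->buffer[off / 8] |= (uint8_t)(1u << (off % 8));
}

static bool lookahead_is_used(const lookahead_t *la, uint32_t block) {
    uint32_t off = block - la->start;
    assert(off < lookahead_blocks(la));
    return la->buffer[off / 8] & (1u << (off % 8));
}
```

With this shape, a user who passes a 16-byte buffer simply sets lookahead_size to 16, the same way cache_size and read_size are given in bytes.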
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a device's lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices: on some, rather than
causing corruption during writes, wear reduces a device's "sticking
power", causing bits to flip over time. This means for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
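The two tweaks can be sketched in C as follows. The eviction check follows the rule as stated (evict when the revision count is a multiple of block_cycles); treating block_cycles == 0 as "never evict" is an assumption of this sketch. The rotate-and-xor mix used to fold CRCs into a seed is likewise an illustrative choice, not necessarily the hash littlefs uses, and should_evict, seed_from_crcs, and alloc_start are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Step 1: evict a metadata-pair when its revision count is a multiple
// of block_cycles (0 is treated as "wear-leveling disabled").
static bool should_evict(uint32_t revision, uint32_t block_cycles) {
    return block_cycles > 0 && revision % block_cycles == 0;
}

// Step 2: fold the CRCs seen while scanning metadata-pairs at mount
// into a seed using a simple rotate-and-xor mix.
static uint32_t seed_from_crcs(const uint32_t *crcs, size_t n) {
    uint32_t seed = 0;
    for (size_t i = 0; i < n; i++) {
        seed = ((seed << 5) | (seed >> 27)) ^ crcs[i];
    }
    return seed;
}

// Start the block allocator at a seed-dependent block instead of block 0.
static uint32_t alloc_start(uint32_t seed, uint32_t block_count) {
    assert(block_count > 0);
    return seed % block_count;
}
```

Since the CRCs change whenever metadata is committed, the allocator's starting point moves from boot to boot without storing any extra state.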
What we end up with is a uniform distribution of wear on storage. The
wear is not perfect: if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Intertwining this into the already complicated compact logic was
a bit annoying, however we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
This is a downside caused by relying on an external repo for testing,
but also storing the CI configuration inside this repo. Fortunately we
can use a temporary v2-alpha branch in the FUSE repo mirroring the
v2-alpha branch for testing.
The introduction of an explicit cache_size configuration allows
customization of the cache buffers independently from the hardware
read/write sizes.
This has been one of littlefs's main handicaps. Without a distinction
between cache units and hardware limitations, littlefs isn't able to
read or program _less_ than the cache size. This leads to the
counter-intuitive case where larger cache sizes can actually be harmful,
since larger read/prog sizes require sending more data over the bus if
we're only accessing a small set of data (for example the CTZ skip-list
traversal).
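To make that cost concrete, here is a hypothetical helper that computes how many bytes cross the bus when a small access must be widened to the surrounding read-unit boundaries. With a 16-byte unit, a 4-byte read at offset 100 transfers 16 bytes; with a 512-byte unit, the same read transfers 512 bytes.

```c
#include <assert.h>
#include <stdint.h>

// Bytes actually transferred when accessing [off, off+len) on a device
// that can only be read in whole `unit`-byte chunks: the access is widened
// down to the previous unit boundary and up to the next one.
static uint32_t bus_bytes(uint32_t off, uint32_t len, uint32_t unit) {
    assert(unit > 0 && len > 0);
    uint32_t first = off - off % unit;
    uint32_t end = off + len;
    uint32_t last = ((end + unit - 1) / unit) * unit;
    return last - first;
}
```

For a skip-list traversal that only needs a few 4-byte pointers per block, this widening is pure overhead, which is why tying the read unit to the cache size hurts.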
This is compounded with metadata logging, since a large program size
limits the number of commits we can write out in a single metadata
block. It really doesn't make sense to link program size + cache
size here.
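A rough estimate of how program size limits metadata logging: assuming each commit is padded up to a multiple of prog_size (and ignoring commit headers and CRCs), a 4096-byte block fits 85 commits of 40 bytes with a 16-byte prog size, but only 8 with a 512-byte prog size. commits_per_block is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

// Approximate number of commits of `commit_size` bytes that fit in one
// block, assuming every commit is padded up to a multiple of prog_size.
static uint32_t commits_per_block(uint32_t block_size, uint32_t commit_size,
        uint32_t prog_size) {
    assert(prog_size > 0 && commit_size > 0);
    uint32_t padded = ((commit_size + prog_size - 1) / prog_size) * prog_size;
    return block_size / padded;
}
```

Fewer commits per block means more frequent compactions, so a large cache should not force a large padding unit.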
With a separate cache_size configuration, we can be much smarter about
what we actually read/write from disk.
This also simplifies cache handling a bit. Before there were two
possible cache sizes, but these were rarely used. Note that the
cache_size is NOT written to the superblock and can be freely changed
without breaking backwards compatibility.
Before, release notes with a list of changes were created every
patch release. Unfortunately, it looks like this will create a lot of
noise on GitHub, with a notification every patch release, which may be
as often as every time a PR is merged.
Rather than creating all of this noise for relatively uninteresting
changes, the script will now stick to simple tags, and create the
release notes only on minor releases.
I think this is what several of you were originally suggesting,
sorry about the journey, at least I learned a lot.
Fetching all tags was triggering the pagination system inside the GitHub
API. This prevented version tags from being found.
Modified to use the version tag prefix in the ref lookup; however, this
may still cause an issue if there are enough patch releases to trigger
pagination.
A simple-ish solution is to grab the Link header to jump to the last page,
since pagination results appear to be in sorted order.
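For illustration, extracting the rel="last" URL from a Link header could be sketched in C as below. The header format follows the standard HTTP Link header that GitHub's API uses for pagination; link_last is a hypothetical helper, not the release script's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Copy the URL tagged rel="last" out of a Link header value such as
//   <https://api.github.com/...?page=2>; rel="next", <...?page=5>; rel="last"
// into `out` (NUL-terminated). Returns 0 on success, -1 if absent or too long.
static int link_last(const char *header, char *out, size_t out_size) {
    assert(header && out);
    const char *rel = strstr(header, "rel=\"last\"");
    if (!rel) {
        return -1;
    }
    // Walk backwards to this entry's closing '>' and then its opening '<'.
    const char *close = rel;
    while (close > header && *close != '>') {
        close--;
    }
    const char *open = close;
    while (open > header && *open != '<') {
        open--;
    }
    if (*close != '>' || *open != '<') {
        return -1;
    }
    size_t len = (size_t)(close - open - 1);
    if (len + 1 > out_size) {
        return -1;
    }
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

Fetching that URL returns the final page, which, if results are sorted as observed, holds the most recent tags.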
Previously, littlefs had mutable versions. That is, anytime a new commit
landed on master, the bot would update the most recent version to
contain the patch. The idea was that this would make sure users always
had the most recent bug fixes. Immutable snapshots could be accessed
through the git hashes.
However, at this point multiple developers have pointed out that this is
confusing, with mutable versions being non-standard and surprising.
This new release process adopts SemVer in its entirety, with
incrementing patch numbers and immutable versions.
When a new commit lands on master:
1. The major/minor version is taken from lfs.h
2. The most recent patch version is looked up on GitHub and incremented
3. A changelog is built out of the commits to the previous version
4. A new release is created on GitHub
Additionally, any commits that land while CI is still running are
coalesced together. Which means multiple PRs can land in a single
release.
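Steps 1 and 2 can be sketched in C. This assumes the packed layout lfs.h uses for LFS_VERSION (major in the upper 16 bits, minor in the lower 16 bits); version_major, version_minor, next_tag, and the latest_patch argument (-1 when no patch exists yet for this major.minor) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// Assumed layout, mirroring LFS_VERSION in lfs.h.
static unsigned version_major(uint32_t version) {
    return 0xffff & (version >> 16);
}

static unsigned version_minor(uint32_t version) {
    return 0xffff & (version >> 0);
}

// Format the next release tag, given the latest patch number found on
// GitHub for this major.minor (or -1 if there is none yet).
static void next_tag(uint32_t version, int latest_patch,
        char *out, size_t size) {
    int n = snprintf(out, size, "v%u.%u.%d",
            version_major(version), version_minor(version), latest_patch + 1);
    assert(n > 0 && (size_t)n < size);  // the tag must not be truncated
}
```

For example, with a version of 0x00020001 and v2.1.3 as the latest release, the next tag would be v2.1.4.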
This was causing code sizes to be reported with several of the logging
functions still built in. A useful number, but not the minimum
achievable code size.
Using GCC cross compilers and QEMU:
- make test CC="arm-linux-gnueabi-gcc --static -mthumb" EXEC="qemu-arm"
- make test CC="powerpc-linux-gnu-gcc --static" EXEC="qemu-ppc"
- make test CC="mips-linux-gnu-gcc --static" EXEC="qemu-mips"
Also separated out Travis jobs and added some size reporting
The most useful part of -Werror is preventing code that has warnings
from being merged. However, it is annoying for users who may have
different compilers with different warnings. Limiting -Werror to CI only
covers the main concern about warnings without limiting users.
This was a small hole in the logic that handles initializing the
lookahead buffer. To imitate exhaustion (so the block allocator
will trigger a scan), the lookahead buffer is rewound a full
lookahead and set up to look like it is exhausted. However,
unlike normal allocation, this rewind was not kept aligned to
a multiple of the scan size, which is limited by both the
lookahead buffer and the total storage size.
This bug went unnoticed for so long because it only causes
problems when the block device is both:
1. Not aligned to the lookahead buffer (not a power of 2)
2. Smaller than the lookahead buffer
While this seems like a strange corner case for a block device,
this turned out to be very common for internal flash, especially
when a handful of blocks are reserved for code.
littlefs allows buffers to be passed statically in the case
that a system does not have a heap. Unfortunately, this means we
can't round up in the case of an unaligned lookahead buffer.
Doubly unfortunately, rounding down after clamping to the block device
size could result in a lookahead of zero for block devices with fewer
than 32 blocks.
The assert in littlefs does catch this case, but rounding down prevents
support for < 32 block devices.
The solution is to simply require a 32-bit aligned buffer with an
assert. This avoids runtime problems while allowing a user to pass
in the correct buffer for < 32 block devices. Rounding up can be
handled at higher API levels.
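The arithmetic behind the problem, as a hypothetical sketch with the lookahead measured in blocks (one bit per block): clamping to a 20-block device and then flooring to a multiple of 32 leaves nothing, while requiring a multiple of 32 up front and only clamping the scanned region keeps all 20 blocks usable. Both function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

// Old behaviour: clamp to the device, then floor to whole 32-bit words.
static uint32_t lookahead_rounded_down(uint32_t lookahead_bits,
        uint32_t block_count) {
    uint32_t bits = lookahead_bits < block_count ? lookahead_bits : block_count;
    return bits - bits % 32;
}

// Alternative: require a 32-bit aligned lookahead up front (the assert),
// and only clamp the region that is actually scanned.
static uint32_t lookahead_scan_region(uint32_t lookahead_bits,
        uint32_t block_count) {
    assert(lookahead_bits % 32 == 0);
    return lookahead_bits < block_count ? lookahead_bits : block_count;
}
```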
Simply limiting the lookahead region to the size of
the block device fixes the problem.
Also added logic to limit the allocated region and
floor to nearest word, since the additional memory
couldn't really be used effectively.
When the lookahead buffer wraps around in an unaligned filesystem, it's
possible for blocks at the beginning of the disk to have a negative distance
from the lookahead, but still reside in the lookahead buffer.
Switching to signed modulo doesn't quite work due to how negative modulo
is implemented in C, so the simple solution is to shift the region to be
positive.
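A small demonstration of the problem and the shift. In C (since C99), integer division truncates toward zero, so the remainder of a negative dividend is negative and cannot be used as an index. Adding block_count before taking the modulo moves the distance into [0, block_count). lookahead_offset and signed_offset are hypothetical names, and block numbers are assumed to be unsigned 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

// Distance from the start of the lookahead window to `block`, on a disk of
// `block_count` blocks where the window may have wrapped around. Assumes
// block_count is well below 2^31, so any unsigned wraparound in
// (block - start) is undone by adding block_count.
static uint32_t lookahead_offset(uint32_t block, uint32_t start,
        uint32_t block_count) {
    assert(block < block_count && start < block_count);
    return ((block - start) + block_count) % block_count;
}

// The naive signed version, for comparison: the result takes the sign of
// the dividend, so blocks "behind" the window come out negative.
static int32_t signed_offset(int32_t block, int32_t start,
        int32_t block_count) {
    return (block - start) % block_count;
}
```

For a 10-block disk with the window starting at block 8, block 2 is 4 blocks ahead after wrapping around, while the signed version reports -6.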