Currently unused, the insertion of new file entries in arbitrary
locations in a metadata-pair is very easy to add into the existing
metadata logging.
The only tricky things:
1. Name tags must strictly precede any tags related to a file. We can
pull this off during a compact, but must make two passes. One for the
name tag, one for the file. Though a benefit of this is that now our
scans during moves can exit early upon finding the name tag.
1. We need to handle name tags appearing out of order. This makes name
tags symmetric to deletes, although it doesn't seem like we can
leverage this fact very well. Note this also means we need to make
the superblock tag a type of name tag.
The valid bit present in tags is a requirement to properly detect the
end of commits in metadata logs. The way it works is that the CRC entry is
allowed to specify what is needed from the next tag's valid bit. If it's
incorrect, we've reached the end of the commit. We then set the valid bit to
indicate when we tried to program a new commit. If we lose power, this
commit will still be thrown out by a bad checksum.
However, the valid bit is unused outside of the CRC entry. Here we turn on the
valid bit for all tags, which means we have a decent chance of exiting early
if we hit a half-written commit. We still need to guarantee detection of
the valid bit on commits following the CRC entry, so we allow the CRC
entry to flip the expected valid bit.
The only tricky part is what valid bit we expect by default, since this
is used on the first commit on a metadata log. Here we default to a 1,
which gives us the fastest exit on blocks that erase to 0. This is
because blocks that erase to 1s will implicitly flip the valid bit of
the next tag, allowing us to exit on the next tag.
If we defaulted to 0, we could exit faster on disks that erase to 1, but
would need to scan the entire block on disks that erase to 0 before we
realize a CRC commit is never coming.
While ECORRUPT is not a wrong error code, it doesn't match other
instances of hitting a corrupt block during write. During writes, if
blocks are detected as corrupt their data is evicted and moved to a new
clean block. This means that at the end of a disk's lifetime, exhaustion
errors will be reported as ENOSPC when littlefs can't find any new block
to store the data.
This has the benefit of matching behaviour when a new file is written
and no more blocks can be found, due to either a small disk or corrupted
blocks on disk. To littlefs it's like the disk shrinks in size over
time.
The initial implementation of inline files was thrown together fairly
quicky, however it has worked well so far and there hasn't been much
reason to change it.
One shortcut was to trick file writes into thinking they are writing to
imaginary blocks. This works well and reuses most of the file code
paths, as long as we don't flush the imaginary block out to disk.
Initially we did this by limiting inline_max to cache_max-1, ensuring
that the cache never fills up and gets flushed. This was a rather dirty
hack, the better solution, implemented here, is to handle the
representation of an "imaginary" block correctly all the way down into
the cache layer.
So now for files specifically, the value -1 represents a null pointer,
and the value -2 represents an "imaginary" block. This may become a
problem if the number of blocks approaches the max, however this -2
value is never written to disk and can be changed in the future without
breaking compatibility.
This was a pretty simple oversight on my part. Conceptually, there's no
difference between lfs_fs_getattr and lfs_getattr("/"). Any operations
on directories can be applied "globally" by referring to the root
directory.
Implementation wise, this actually fixes the "corner case" of storing
attributes on the root directory, which is broken since the root
directory doesn't have a related entry. Instead we need to use the root
superblock for this purpose.
Fewer functions means less code to document and maintain, so this is a
nice benefit. Now we just have a single lfs_getattr/setattr/removeattr set
of functions along with the ability to access attributes atomically in
lfs_file_opencfg.
This implements the second step of full dynamic wear-leveling, block
allocation randomization. This is the key part the uniformly distributes
wear across the filesystem, even through reboots.
The entropy actually comes from the filesystem itself, by xoring
together all of the CRCs in the metadata-pairs on the filesystem. While
this sounds like a ridiculous operation, it's easy to do when we already
scan the metadata-pairs at mount time.
This gives us a random number we can use for block allocation.
Unfortunately it's not a great general purpose random generator as the
output only changes every filesystem write. Fortunately that's exactly
when we need our allocator.
---
Additionally, the randomization created a mess for the testing
framework. Fortunately, this method of randomization is deterministic.
A very useful property for reproducing bugs.
Initially, littlefs relied entirely on bad-block detection for
wear-leveling. Conceptually, at the end of a devices lifespan, all
blocks would be worn evenly, even if they weren't worn out at the same
time. However, this doesn't work for all devices, rather than causing
corruption during writes, wear reduces a devices "sticking power",
causing bits to flip over time. This means for many devices, true
wear-leveling (dynamic or static) is required.
Fortunately, way back at the beginning, littlefs was designed to do full
dynamic wear-leveling, only dropping it when making the retrospectively
short-sighted realization that bad-block detection is theoretically
sufficient. We can enable dynamic wear-leveling with only a few tweaks
to littlefs. These can be implemented without breaking backwards
compatibility.
1. Evict metadata-pairs after a certain number of writes. Eviction in
this case is identical to a relocation to recover from a bad block.
We move our data and stick the old block back into our pool of
blocks.
For knowing when to evict, we already have a revision count for each
metadata-pair which gives us enough information. We add the
configuration option block_cycles and evict when our revision count
is a multiple of this value.
2. Now all blocks participate in COW behaviour. However we don't store
the state of our allocator, so every boot cycle we reuse the first
blocks on storage. This is very bad on a microcontroller, where we
may reboot often. We need a way to spread our usage across the disk.
To pull this off, we can simply randomize which block we start our
allocator at. But we need a random number generator that is different
on each boot. Fortunately we have a great source of entropy, our
filesystem. So we seed our block allocator with a simple hash of the
CRCs on our metadata-pairs. This can be done for free since we
already need to scan the metadata-pairs during mount.
What we end up with is a uniform distribution of wear on storage. The
wear is not perfect, if a block is used for metadata it gets more wear,
and the randomization may not be exact. But we can never actually get
perfect wear-leveling, since we're already resigned to dynamic
wear-leveling at the file level.
With the addition of metadata logging, we end up with a really
interesting two-stage wear-leveling algorithm. At the low-level,
metadata is statically wear-leveled. At the high-level, blocks are
dynamically wear-leveled.
---
This specific commit implements the first step, eviction of metadata
pairs. Entertwining this into the already complicated compact logic was
a bit annoying, however we can combine the logic for superblock
expansion with the logic for metadata-pair eviction.
This was mostly tweaking test cases to be accommodating for variable
sized superblock-lists. Though there were a few bugs that needed fixing:
- Changed compact to use source dir for move since the original dir
could have changed as a result of an expand.
- Created copy of current directory so we don't overwrite ourselves
during an internal commit update.
Also made sure all of the test suites provide reproducable results when
ran independently (the entry tests were behaving differently based on
which tests were ran before).
(Some where legitimate test failures)
Expanding superblocks has been on my wishlist for a while. The basic
idea is that instead of maintaining a fixed offset blocks {0, 1} to the
the root directory (1 pointer), we maintain a dynamically sized
linked-list of superblocks that point to the actual root. If the number
of writes to the root exceeds some value, we increase the size of the
superblock linked-list.
This can leverage existing metadata-pair operations. The revision count for
metadata-pairs provides some knowledge on how much wear we've put on the
superblock, and the threaded linked-list can also be reused for this
purpose. This means superblock expansion is both optional and cheap to
implement.
Expanding superblocks helps both extremely small and extremely large filesystem
(extreme being relative of course). On the small end, we can actually
collapse the superblock into the root directory and drop the hard requirement
of 4-blocks for the superblock. On the large end, our superblock will
now last longer than the rest of the filesystem. Each time we expand,
the number of cycles until the superblock dies is increased by a power.
Before we were stuck with this layout:
level cycles limit layout
1 E^2 390 MiB s0 -> root
Now we expand every time a fixed offset is exceeded:
level cycles limit layout
0 E 4 KiB s0+root
1 E^2 390 MiB s0 -> root
2 E^3 37 TiB s0 -> s1 -> root
3 E^4 3.6 EiB s0 -> s1 -> s2 -> root
...
Where the cycles are the number of cycles before death, and the limit is
the worst-case size a filesystem where early superblock death becomes a
concern (all writes to root using this formula: E^|s| = E*B, E = erase
cycles = 100000, B = block count, assuming 4096 byte blocks).
Note we can also store copies of the superblock entry on the expanded
superblocks. This may help filesystem recover tools in the future.
Because of limitations in how littlefs manages attributes on disk,
littlefs views zero-length attributes and missing attributes as the same
thing. The simpliest implementation of attributes mirrors this behaviour
transparently for the user.
State stealing is a tricky part of managing the xored-globals. When
removing a metadata-pair from the metadata chain, whichever
metadata-pair does the removing is also responsible for stealing the
removed metadata-pair's global delta and incorporating it into it's own
global delta. Otherwise the global state would become corrupted.
The introduction of an explicit cache_size configuration allows
customization of the cache buffers independently from the hardware
read/write sizes.
This has been one of littlefs's main handicaps. Without a distinction
between cache units and hardware limitations, littlefs isn't able to
read or program _less_ than the cache size. This leads to the
counter-intuitive case where larger cache sizes can actually be harmful,
since larger read/prog sizes require sending more data over the bus if
we're only accessing a small set of data (for example the CTZ skip-list
traversal).
This is compounded with metadata logging, since a large program size
limits the number of commits we can write out in a single metadata
block. It really doesn't make sense to link program size + cache
size here.
With a separate cache_size configuration, we can be much smarter about
what we actually read/write from disk.
This also simplifies cache handling a bit. Before there were two
possible cache sizes, but these were rarely used. Note that the
cache_size is NOT written to the superblock and can be freely changed
without breaking backwards compatibility.
Result of testing on zero-granularity blocks, where the prog size and
read size equals the block size. This represents SD cards and other
traditional forms of block storage where we don't really get a benefit
from the metadata logging.
Unfortunately, since updates in both are tested by the same script,
we can't really use simple bash commands. Added a more complex
script to simulate corruption. Fortunately this should be more robust
than the previous solutions.
The main fixes were around corner cases where the commit logic fell
apart when it didn't have room to complete commits, but these were
fixable in the current design.
The main thing to consider was how lfs_dir_fetchwith reacts to
corruption it finds and to make sure falling back to old values works
correctly.
Some of the tricky bits involved making sure we could fall back to both old
commits and old metadata blocks while still handling things like
synthetic moves correctly.
The main issue here was that the old orphan test relied on deleting the
block that contained the most recent update. In the new design this
doesn't really work since updates get appended to metadata-pairs
incrementally.
This is fixed by instead using the truncate command on the appropriate
block. We're now passing orphan tests.
Now that littlefs has been rebuilt almost from the ground up with the
intention to support custom attributes, adding in custom attribute
support is relatively easy.
The highest bit in the 9-bit type structure indicates that an attribute
is a user-specified custom attribute. The user then has a full 8-bits to
specify the attribute type. Other than that, custom attributes are
treated the same as system-level attributes.
Also made some tweaks to custom attributes:
- Adopted the opencfg for file-level attributes provided by dpgeorge
- Changed setattrs/getattrs to the simpler setattr/getattr functions
users will probably be more familiar with. Note that multiple
attributes can still be committed atomically with files, though not
with directories.
- Changed LFS_ATTRS_MAX -> LFS_ATTR_MAX since there's no longer a global
limit on the sum of attribute sizes, which was rather confusing.
Though they are still limited by what can fit in a metadata-pair.
I've been trying to keep tag types organized with an encoding that hints
if a tag uses its id field for file ids. However this seem to have been
a mistake. Using a null id of 0x3ff greatly simplified quite a bit of
the logic around managing file related tags.
The downside is one less id we can use, but if we look at the encoding
cost, donating one full bit costs us 2^9 id permutations vs 1 id
permutation. So even if we had a perfect encoding it's in our favor to
use a null id. The cost of null ids is code size, but with the
complexity around figuring out if a type used it's id or not it just
works out better to use a null id.
The issue lies in the reuse of the id field for globals. Before globals,
the only tags with a non-null (0x3ff) id field were names, structs, and
other file-specific metadata. But globals are also using this field for
the indirect delete, since otherwise the globals structure would be very
unaligned (74-bits long).
To make matters worse, the id field for globals contains the delta used
to reconstruct the globals at mount time. Which means the id field could
take on very absurd values and break the dir fetch logic if we're not
careful.
Solution is to use the scope portion of the type field where necessary,
although unforunately this does add some code cost.
Unfortunately, the behaviour needed of lfs_dir_fetchwith is as subtle as
it is important. When fetching from a block corrupted by power-loss,
lfs_dir_fetch must be able to rewind any state it picks up to before the
corruption. This is not limited to the directory state, but includes
find results and other side-effects.
This gets a bit complicated when trying to generalize littlefs's
fetchwith mechanics. Being able to scan a directory block during a fetch
greatly impacts the runtime of littlefs operations, but if the state is
generic how do we know what to rollback to?
The fix here is to leave the management of rolling back state to the
fetchwith match functions, and transparently pass a CRC tag to indicate
the temporary state can be saved.
Originally I tried to reuse the indirect delete to accomplish truely
atomic directory removes, however this fell apart when it came to
implementing directory removes as a side-effect of renames.
A single indirect-delete simply can't handle renames with removes as
a side effects. When copying an entry to its destination, we need to
atomically delete both the old entry, and the source of our copy. We
can't delete both with only a single indirect-delete. It is possible to
accomplish this with two indirect-deletes, but this is such an uncommon
case that it's really not worth supporting efficiently due to how
expensive globals are.
I also dropped indirect-deletes for normal directory removes. I may add
it back later, but at the moment it's extra code cost for that's not
traveled very often.
As a result, restructured the indirect delete handling to be a bit more
generic, now with a multipurpose lfs_globals_t struct instead of the
delete specific lfs_entry_t struct.
Also worked on integrating xored-globals, now with several primitive
global operations to manage fetching/updating globals on disk.
lfs_dir_fetchwith did not recover from failed dir fetches correctly,
added a temporary dir variable to hold dir contents while being
populated, allowing us to fall back to a known good dir state if a
commit is corrupted.
There is a RAM cost, but the upside is that our lfs_dir_fetchwith
actually works.
Also added better handling of move ids during some get functions.
Integration with the new journaling metadata has now progressed to the
point where all of the basic functionality tests are passing. This
includes:
- test_format
- test_dirs
- test_files
- test_seek
- test_truncate
- test_interspersed
- test_paths
Some of the fixes:
- Modified move to correctly change entry ids
- Called lfs_commit_move directly from compact, avoiding commit parsing
logic during a compact
- Opened up commit filters to be passed down from compact for moves
- Added correct drop logic to lfs_dir_delete
- Updated lfs_dir_seek to use ids instead of offsets
- Caught id updates manually where possible (this needs to be fixed)
Now with some tweaks to commit/compact, and a committers for entrylists and
moves specifically. No longer relying on a commitwith callback, the
types of commits are now infered from their tags.
This means we can now commit things atomically with special commits,
such as moves. Now lfs_rename can move entries to new names correctly.
Passing more tests now with the journalling change, but still have more
work to do.
The most humorous bug was a bug where during the three step move
process, the entry move logic would dumbly copy over any tags associated
with the moving entry, including the tag used to temporarily mark the
entry as "moving".
Also combined dir and commit traversal using a "stop_at_commit" flag in
directory struct as a short-term hack to combine the code paths.
Making the superblock look like "just another entry" allows us to treat
the superblock like "just another entry" and reuse a decent amount of
logic that would otherwise only be used a format and mount time. In this
case we can use append to write out the superblock like it was creating
a new entry on the filesystem.
Now when a file overflows the max inline file size, it will be correctly
written out to a proper block. Additionally, tweaked corner cases around
inline file, however this still needs significant testing.
A real neat part that surprised me is that littlefs _already_ contains
the logic for writing out inline files: in lfs_file_relocate! With a bit
of tweaking, littlefs can pull off both the overflow from inline to
normal files _and_ the relocating of bad blocks in files with the same
piece of logic.
Proof-of-concept implementation of inline files that stores the file's
content directly in its parent's directory pair.
Inline files are indicated by a different type stored in an entry's
struct field, and take advantage of resizable entries. Where a normal
file's entry would normally hold the reference to the CTZ skip-list, an
inline file's entry contains the contents of the actual file.
Unfortunately, storing the inline file on disk is the easy part. We also
need to manage inline files in the internals of littlefs and provide the
same operations that we do on normal files, all while reusing as much
code as possible to avoid a significant increase in code cost.
There is a relatively simple, though maybe a bit hacky, solution here. If a
file fits entirely in a cache line, the file logic never actually has to go to
disk. This means we can just give the file a "pretend" block (hopefully
one that would assert if ever written to), and carry out file operations
as normal, as long as we catch the file before it exceeds the cache line
and write out the file to an actual disk.
- Fixed shadowed variable warnings in lfs_dir_find.
- Fixed unused parameter warnings when LFS_NO_MALLOC is enabled.
- Added extra warning flags to CFLAGS.
- Updated tests so they don't shadow the "size" variable for -Wshadow
Paths such as the following were causing issues:
/tea/hottea/.
/tea/hottea/..
Unfortunately the existing structure for path lookup didn't make it very
easy to introduce proper handling in this case without duplicating the
entire skip logic for paths. So the lfs_dir_find function had to be
restructured a bit.
One odd side-effect of this is that now lfs_dir_find includes the
initial fetch operation. This kinda breaks the fetch -> op pattern of
the dir functions, but does come with a nice code size reduction.
One of the big simplifications in littlefs's implementation is the
complete lack of tracking free blocks, allowing operations to simply
drop blocks that are no longer in use.
However, this means the lookahead buffer can easily contain outdated
blocks that were previously deleted. This is usually fine, as littlefs
will rescan the storage if it can't find a free block in the lookahead
buffer, but after changes that caused littlefs to more conservatively
respect the alloc acks (e611cf5), any scanned blocks after an ack would
be incorrectly trusted.
The fix is to eagerly scan ahead in the lookahead when we allocate so
that alloc acks are better able to discredit old lookahead blocks. Since
usually alloc acks are tightly coupled to allocations of one or two blocks,
this allows littlefs to properly rescan every set of allocations.
This may still be a concern if there is a long series of worn out
blocks, but in the worst case littlefs will conservatively avoid using
blocks it's not sure about.
Found by davidefer
Like most of the lfs_dir_t functions, lfs_dir_append is responsible for
updating the lfs_dir_t struct if the underlying directory block is
moved. This property makes handling worn out blocks much easier by
removing the amount of state that needs to be considered during a
directory update.
However, extending the dir chain is a bit of a corner case. It's not
changing the old block, but callers of lfs_dir_append do assume the
"entry" will reside in "dir" after lfs_dir_append completes.
This issue only occurs when creating files, since mkdir does not use
the entry after lfs_dir_append. Unfortunately, the tests against
extending the directory chain were all made using mkdir.
Found by schouleu
Before this patch, when calling lfs_mkdir or lfs_file_open with root
as the target, littlefs wouldn't find the path properly and happily
run into undefined behaviour.
The fix is to populate a directory entry for root in the lfs_dir_find
function. As an added plus, this allowed several special cases around
root to be completely dropped.
Using gcc cross compilers and qemu:
- make test CC="arm-linux-gnueabi-gcc --static -mthumb" EXEC="qemu-arm"
- make test CC="powerpc-linux-gnu-gcc --static" EXEC="qemu-ppc"
- make test CC="mips-linux-gnu-gcc --static" EXEC="qemu-mips"
Also separated out Travis jobs and added some size reporting
Rather than tracking all in-flight blocks blocks during a lookahead,
littlefs uses an ack scheme to mark the first allocated block that
hasn't reached the disk yet. littlefs assumes all blocks since the
last ack are bad or in-flight, and uses this to know when it's out
of storage.
However, these unacked allocations were still being populated in the
lookahead buffer. If the whole block device fits in the lookahead
buffer, _and_ littlefs managed to scan around the whole storage while
an unacked block was still in-flight, it would assume the block was
free and misallocate it.
The fix is to only fill the lookahead buffer up to the last ack.
The internal free structure was restructured to simplify the runtime
calculation of lookahead size.
- Write on read-only file to return LFS_ERR_BADF
- Renaming directory onto file to return LFS_ERR_NOTEMPTY
- Changed LFS_ERR_INVAL in lfs_file_seek to assert
- no previous prototype for ‘test_assert’
- no previous prototype for ‘test_count’
- unused parameter ‘b’ in test_count
- function declaration isn’t a prototype for main
The most useful part of -Werror is preventing code from being
merged that has warnings. However it is annoying for users who may have
different compilers with different warnings. Limiting -Werror to CI only
covers the main concern about warnings without limiting users.
Some of the tests were creating a variable `res`, however the test
system itself relies on it's own `res` variable. This worked out by
luck, but could lead to problems if the res variables were different
types.
Changed the generated variable in the test system to the less common
name `test`, which also works out to share the same prefix as other test
functions.
Found by user iamscottmoyers, this was an interesting bug with the test
system. If the new test.c file is generated fast enough, it may not have
a new timestamp and not get recompiled.
To fix, we can remove the specific files that need to be rebuilt (lfs and
test.o).
As a copy-on-write filesystem, the truncate function is a very nice
function to have, as it can take advantage of reusing the data already
written out to disk.