third_party_littlefs

mirror of https://gitee.com/openharmony/third_party_littlefs synced 2024-11-23 14:59:50 +00:00

Author	SHA1	Message	Date
Noah Gorny	6303558aee	Use LFS_O_RDWR instead of magic number in lfs_file_* asserts	2020-11-19 01:51:39 +02:00
Noah Gorny	4bd653dd00	Assert that file/dir struct is not reused in lfs_file_opencfg/lfs_dir_open	2020-11-19 01:51:39 +02:00
Christopher Haster	4c9146ea53	Merge pull request #405 from rojer/mfe Fix -Wmissing-field-initializers	2020-04-09 05:42:46 -05:00
Deomid "rojer" Ryabkov	5a9f38df01	Remove -Wno-missing-field-initializers	2020-04-06 19:51:19 +01:00
Deomid "rojer" Ryabkov	1b033e9ab6	Fix -Wmissing-field-initializers	2020-04-03 02:18:14 +01:00
Christopher Haster	a049f1318e	Merge pull request #372 from ARMmbed/test-revamp Rework test framework, fix a number of related bugs	2020-03-31 18:25:13 -05:00
Christopher Haster	7257681f5d	Merge branch 'master' into test-revamp	2020-03-31 18:24:54 -05:00
Christopher Haster	2da340af69	Merge pull request #373 from henrygab/patch-1 Indicate C99 standard as target for LittleFS code	2020-03-31 18:22:48 -05:00
Christopher Haster	02881e591b	Merge pull request #360 from jpdoyle/master Fix incorrect comment on `lfs_npw2`	2020-03-31 18:22:41 -05:00
Christopher Haster	38024d5a17	Merge pull request #356 from zqb-all/patch-1 Update SPEC.md	2020-03-31 18:22:34 -05:00
Christopher Haster	4a9bac4418	Merge pull request #322 from hemmick/master Allow debug prints without __VA_ARGS__ in non-MSVC	2020-03-31 18:22:27 -05:00
Christopher Haster	6121495444	Merge pull request #266 from FreddieChopin/revert-bypass-cache Revert "Don't bypass cache in `lfs_cache_prog()` and `lfs_cache_read()`"	2020-03-31 18:22:19 -05:00
John Hemmick	6372f515fe	Allow debug prints without __VA_ARGS__ __VA_ARGS__ are frustrating in C. Even for their main purpose (printf), they fall short in that they don't have a _portable_ way to have zero arguments after the format string in a printf call. Even if we detect compilers and use ##__VA_ARGS__ where available, GCC emits a warning with -pedantic that is _impossible_ to explicitly disable. This commit contains the best solution we can think of. A bit of indirection that adds a hidden "%s" % "" to the end of the format string. This solution does not work everywhere as it has a runtime cost, but it is hopefully ok for debug statements.	2020-03-29 21:58:49 -05:00
Christopher Haster	6622f3deee	Bumped minor version to v2.2	2020-03-29 21:43:58 -05:00
Christopher Haster	5137e4b0ba	Last minute tweaks to debug scripts - Standardized littlefs debug statements to use hex prefixes and brackets for printing pairs. - Removed the entry behavior for readtree and made -t the default. This is because 1. the CTZ skip-list parsing was broken, which is not surprising, and 2. the entry parsing was more complicated than useful. This functionality may be better implemented as a proper filesystem read script, complete with directory tree dumping. - Changed test.py's --gdb argument to take [init, main, assert], this matches the names of the stages in C's startup. - Added printing of tail to all mdir dumps in readtree/readmdir. - Added a print for if any mdirs are corrupted in readtree. - Added debug script side-effects to .gitignore.	2020-03-29 21:19:33 -05:00
Christopher Haster	ff84902970	Moved out block device tracing into separate define Block device tracing has a lot of potential uses, of course debugging, but it can also be used for profiling and externally tracking littlefs's usage of the block device. However, block device tracing emits a massive amount of output. So keeping block device tracing on by default limits the usefulness of the filesystem tracing. So, instead, I've moved the block device tracing into a separate LFS_TESTBD_YES_TRACE define which switches on the LFS_TESTBD_TRACE macro. Note that this means in order to get block device tracing, you need to define both LFS_YES_TRACE and LFS_TESTBD_YES_TRACE. This is needed as the LFS_TRACE definition is gated by LFS_YES_TRACE in lfs_util.h.	2020-03-29 18:45:51 -05:00
Christopher Haster	01e42abd10	Merge pull request #401 from thrasher8390/bugfix/thrasher8390/issue-394-lookahead-buffer-corruption Lookahead corruption fix given an IO Error during traversal	2020-03-29 17:59:00 -05:00
Christopher Haster	f9dbec3d92	Added test case catching issues with errors during a lookahead scan Original issue found by thrasher8390	2020-03-29 14:12:58 -05:00
Derek Thrasher	f17d3d7eba	Minor cleanup - Removed the declaration of lfs_alloc_ack - Consistent brackets	2020-03-29 14:12:30 -05:00
Derek Thrasher	5e5b5d8572	(chore) updates from PR, we decided not to move forward with changing v1 code since it can be risky. Let's improve the future! Also renamed and moved around the lookahead free / reset function	2020-03-29 14:12:30 -05:00
Derek Thrasher	d498b9fb31	(bugfix) adding line function to clear out all the global 'free' information so that we can reset it after a failed traversal	2020-03-29 14:12:30 -05:00
Christopher Haster	4677421aba	Added "evil" tests and detection/recovery from bad pointers and infinite loops These two features have been much requested by users, and have even had several PRs proposed to fix these in several cases. Before this, these error conditions usually were caught by internal asserts, however asserts prevented users from implementing their own workarounds. It's taken me a while to provide/accept a useful recovery mechanism (returning LFS_ERR_CORRUPT instead of asserting) because my original thinking was that these error conditions only occur due to bugs in the filesystem, and these bugs should be fixed properly. While I still think this is mostly true, the point has been made clear that being able to recover from these conditions is definitely worth the code cost. Hopefully this new behaviour helps the longevity of devices even if the storage code fails. Another, less important, reason I didn't want to accept fixes for these situations was the lack of tests that prove the code's value. This has been fixed with the new testing framework thanks to the addition of "internal tests" which can call C static functions and really take advantage of the internal information of the filesystem.	2020-03-20 09:26:07 -05:00
Chris Desjardins	cb26157880	Change assert to runtime check. I had a system that was constantly hitting this assert, after making this change it recovered immediately.	2020-02-23 22:18:08 -06:00
Christopher Haster	a7dfae4526	Minor tweaks to debugging scripts, fixed explode_asserts.py off-by-1 - Changed readmdir.py to print the metadata pair and revision count, which is useful when debugging commit issues. - Added truncated data view to readtree.py by default. This does mean readtree.py must read all files on the filesystem to show the truncated data, hopefully this does not end up being a problem. - Made overall representation hopefully more readable, including moving superblock under the root dir, userattrs under files, fixing a gstate rendering issue. - Added rendering of soft-tails as dotted-arrows, hopefully this isn't too noisy. - Fixed explode_asserts.py off-by-1 in #line mapping caused by a strip call in the assert generation eating newlines. The script matches line numbers between the original+modified files by emitting assert statements that use the same number of lines. An off-by-1 here causes the entire file to map lines incorrectly, which can be very annoying.	2020-02-22 23:50:03 -06:00
Christopher Haster	50fe8ae258	Renamed test_format -> test_superblocks, tweaked superblock tests With the superblock expansion stuff, the test_format tests have grown to test more advanced superblock-related features. This is fine but deserves a rename so it's more clear. Also fixed a typo that meant tests never ran with block cycles.	2020-02-22 23:35:28 -06:00
Christopher Haster	0990296619	Limited byte-level tests to native testing due to time Byte-level writes are expensive and not suggested (caches >= 4 bytes make much more sense), however there are many corner cases with byte-level writes that can be easy to miss (power-loss leaving single bytes written to disk). Unfortunately, byte-level writes mixed with power-loss testing, the Travis infrastructure, and Arm Thumb instruction set simulation exceeds the 50-minute budget Travis allocates for jobs. For now I'm disabling the byte-level tests under Qemu, with the hope that performance improvements in littlefs will let us turn these tests back on in the future.	2020-02-18 18:05:08 -06:00
Christopher Haster	d04b077506	Fixed minor things to get CI passing again - Added caching to Travis install dirs, because otherwise pip3 install fails randomly - Increased size of littlefs-fuse disk because test script has a larger footprint now - Skip a couple of reentrant tests under byte-level writes because the tests just take too long and cause Travis to bail due to no output for 10m - Fixed various Valgrind errors - Suppressed uninit checks for tests where LFS_BLOCK_ERASE_VALUE == -1. In this case rambd goes uninitialized, which is fine for rambd's purposes. Note I couldn't figure out how to limit this suppression to only the malloc in rambd, this doesn't seem possible with Valgrind. - Fixed memory leaks in exhaustion tests - Fixed off-by-1 string null-terminator issue in paths tests - Fixed lfs_file_sync issue revealed by fixing memory leaks in exhaustion tests. Getting ENOSPC during a file write puts the file in a bad state where littlefs doesn't know how to write it out safely. In this case, lfs_file_sync and lfs_file_close return 0 without writing out state so that device-side resources can still be cleaned up. To recover from ENOSPC, the file needs to be reopened and the writes recreated. Not sure if there is a better way to handle this. - Added some quality-of-life improvements to Valgrind testing - Fit Valgrind messages into truncated output when not in verbose mode - Turned on origin tracking	2020-02-18 18:05:03 -06:00
Christopher Haster	c7987a3162	Restructured .travis.yml to span more jobs The core of littlefs's CI testing is the full test suite, `make test`, run under a number of configurations: - Processor architecture: - x86 (native) - Arm Thumb - MIPS - PowerPC - Storage geometry: - rs=16 ps=16 cs=64 bs=512 (default) - rs=1 ps=1 cs=64 bs=4KiB (NOR flash) - rs=512 ps=512 cs=512 bs=512 (eMMC) - rs=4KiB ps=4KiB cs=4KiB bs=32KiB (NAND flash) - Other corner cases: - no intrinsics - no inline - byte-level read/writes - single block-cycles - odd block counts - odd block sizes The number of different configurations we need to test quickly exceeds the 50 minute time limit Travis has on jobs. Fortunately, we can split these tests out into multiple jobs. This seems to be the intended course of action for large CI "builds" in Travis, as this gives Travis a finer grain of control over limiting builds. Unfortunately, this created a couple issues: 1. The Travis configuration isn't actually that flexible. It allows a single "matrix expansion" which can be generated from top-level lists of different configurations. But it doesn't let you generate a matrix from two separate environment variable lists (for arch + geometry). Without multiple matrix expansions, we're stuck writing out each test permutation by hand. On the bright-side, this was a good chance to really learn how YAML anchors work. I'm torn because on one hand anchors add what feels like unnecessary complexity to a config language, on the other hand, they did help quite a bit in working around Travis's limitations. 2. Now that we have 47 jobs instead of 7, reporting a separate status for each job stops making sense. What I've opted for here is to use a special NAME variable to deduplicate jobs, and used a few state-less rules to hopefully have the reported status make sense most of the time.
- Overwrite "pending" statuses so that the last job to start owns the most recent "pending" status - Don't overwrite "failure" statuses unless the job number matches our own (in the case of CI restarts) - Don't write "success" statuses unless the job number matches our own, this should delay a green check-mark until the last-to-start job finishes - Always overwrite non-failures with "failure" statuses This does mean a temporary "success" may appear if the last job terminates before earlier jobs. But this is the simplest solution I can think of without storing some complex state somewhere. Note we can only report the size this way because it's cheap to calculate in every job.	2020-02-18 17:34:23 -06:00
Christopher Haster	dcae185a00	Fixed typo in LFS_MKTAG_IF_ELSE	2020-02-12 11:31:34 -06:00
Christopher Haster	f4b17b379c	Added test.py support for tmpfs-backed disks RAM-backed testing is faster than file-backed testing. This is why test.py uses rambd by default. So why add support for tmpfs-backed disks if we can already run tests in RAM? For reentrant testing. Under reentrant testing we simulate power-loss by forcefully exiting the test program at specific times. To make this power-loss meaningful, we need to persist the disk across these power-losses. However, it's interesting to note this persistence doesn't need to be actually backed by the filesystem. It may be possible to rearchitect the tests to simulate power-loss a different way, by say, using coroutines or setjmp/longjmp to leave behind ongoing filesystem operations without terminating the program completely. But at this point, I think it's best to work with what we have. And simply putting the test disks into a tmpfs mount-point seems to work just fine. Note this does force serialization of the tests, which isn't required otherwise. Currently they are only serialized due to limitations in test.py. If a future change wants to parallelize the tests, it may need to rework RAM-backed reentrant tests.	2020-02-12 10:48:54 -06:00
Christopher Haster	9f546f154f	Updated .travis.yml and added additional geometry constraints Moved .travis.yml over to use the new test framework. A part of this involved testing all of the configurations run on the old framework and deciding which to carry over. The new framework duplicates some of the cases tested by the configurations so some configurations could be dropped. The .travis.yml includes some extreme ones, such as no inline files, relocations every cycle, no intrinsics, power-loss every byte, unaligned block_count and lookahead, and odd read_sizes. There were several configurations where some tests failed because of limitations in the tests themselves, so many conditions were added to make sure the configurations can run on as many tests as possible.	2020-02-11 16:01:57 -06:00
Christopher Haster	b69cf890e6	Fixed CRC check when prog_size causes multiple CRCs per commit This is a bit of a strange case that can be caused by storage with very large prog sizes, such as NAND flash. We only have 10 bits to store the size of our padding, so when the prog_size gets larger than 1024 bytes, we have to use multiple padding tags to commit to the next prog_size boundary. This causes some complication for the new logic that checks CRCs in case our block becomes "readonly" and contains existing commits that just happen to match our new commit size. Here we just check the CRC of the first commit. This isn't perfect but does protect against pure "readonly" blocks.	2020-02-09 22:43:20 -06:00
Christopher Haster	02c84ac5f4	Cleaned up dependent fixes on branch These should probably have been cleaned up in each commit to allow cherry-picking, but due to time I haven't been able to. - Went with creating an mdir copy in lfs_dir_commit. This handles a number of related cleanup issues in lfs_dir_compact and it does so more robustly. As a plus we can use the copy to update dependencies in the mlist. - Eliminated code left by the ENOSPC file outlining - Cleaned up TODOs and lingering comments - Changed the reentrant many directory create/rename/remove test to use a smaller set of directories because of space issues when READ/PROG_SIZE=512	2020-02-09 12:37:39 -06:00
Christopher Haster	6530cb3a61	Fixed lfs_fs_size doubling metadata-pairs This was caused by the previous fix for allocations during lfs_fs_deorphan in this branch. To catch half-orphans during block allocations we needed to duplicate all metadata-pairs reported to lfs_fs_traverse. Unfortunately this causes lfs_fs_size to report 2x the number of metadata-pairs, which would undoubtedly confuse users. The fix here is inelegantly simple, just do a different traversal for allocations and size measurements. It reuses the same code but touches slightly different sets of blocks. Unfortunately, this causes the public lfs_fs_traverse and lfs_fs_size functions to split in how they report blocks. This is technically allowed, since lfs_fs_traverse may report blocks multiple times due to CoW behavior, however it's undesirable and I'm sure there will be some confusion. But I don't have a better solution, so from this point lfs_fs_traverse will be reporting 2x metadata-blocks and shouldn't be used for finding the number of available blocks on the filesystem.	2020-02-09 12:00:23 -06:00
Christopher Haster	fe957de892	Fixed broken wear-leveling when block_cycles = 2n-1 This was an interesting issue found during a GitHub discussion with rmollway and thrasher8390. Blocks in the metadata-pair are relocated every "block_cycles", or, more mathy, when rev % block_cycles == 0 as long as rev += 1 every block write. But there's a problem, rev isn't += 1 every block write. There are two blocks in a metadata-pair, so looking at it from each block's perspective, rev += 2 every block write. This leads to a sort of aliasing issue, where, if block_cycles is divisible by 2, one block in the metadata-pair is always relocated, and the other block is _never_ relocated. Causing a complete failure of block-level wear-leveling. Fortunately, because of a previous workaround to avoid block_cycles = 1 (since this will cause the relocation algorithm to never terminate), the actual math is rev % (block_cycles+1) == 0. This means the bug only shows its head in the much less likely case where block_cycles is a multiple of 2 plus 1, or, in more mathy terms, block_cycles = 2n+1 for some n. To work around this we can bitwise or our block_cycles with 1 to force it to never be a multiple of 2. (Maybe we should do this during initialization? But then block_cycles would need to be mutable.) --- There's a few unrelated changes mixed into this commit that shouldn't be there since I added this as part of a branch of bug fixes I'm putting together rather hastily, so unfortunately this is not easily cherry-pickable.	2020-02-09 12:00:23 -06:00
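The aliasing described in this commit message can be reproduced with a toy model. The sketch below is not littlefs code: it assumes a single revision counter whose writes alternate between the two blocks of a pair, ignores the effect of the relocation itself, and uses the hypothetical helper name `blocks_hit`. It only shows why an even modulus starves one block while an odd modulus reaches both.

```python
def blocks_hit(modulus, writes=1000):
    """Return which halves of a metadata pair ever satisfy rev % modulus == 0.

    Toy model: rev increases by 1 per write and writes alternate between
    block 0 and block 1, so each block only ever sees revs of one parity.
    """
    hit = set()
    for rev in range(1, writes + 1):
        block = rev % 2  # even revs land on block 0, odd revs on block 1
        if rev % modulus == 0:
            hit.add(block)
    return hit


block_cycles = 5                     # odd, i.e. of the form 2n+1
even_modulus = block_cycles + 1      # 6: only even revs can be multiples
odd_modulus = (block_cycles + 1) | 1 # 7: forced odd (an assumed form of the fix)

print(blocks_hit(even_modulus))  # only one block is ever selected
print(blocks_hit(odd_modulus))   # both blocks are selected over time
```

With the even modulus every multiple is itself even, so only the block that receives even revisions is ever chosen; forcing the modulus odd makes its multiples alternate in parity.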
Christopher Haster	6a550844f4	Modified readmdir/readtree to make reading non-truncated data easier Added indentation so there was a more clear separation between the tag description and tag data. Also took the best parts of readmdir.py and added it to readtree.py. Initially I was thinking it was best for these to have completely independent data representations, since you could always call readtree to get more info, but this becomes tedious when you need to look at low-level tag info across multiple directories on the filesystem.	2020-02-09 12:00:23 -06:00
Christopher Haster	f9c2fd93f2	Removed file outlining on ENOSPC in lfs_file_sync This was initially added as protection against the case where a file grew to no longer fit in a metadata-pair. While in most cases this should be caught by the math in lfs_file_write, it doesn't handle a problem that can happen if the file's metadata is large enough that even small inline files can't fit. This can happen if you combine a small block size with large file names and many custom attributes. But trying to outline on ENOSPC creates a lot of problems. If we are actually low on space, this is one of the worst things we can do. Inline files take up less space than CTZ skip-lists, but inline files are rendered useless if we outline inline files as soon as we run low on space. On top of this, the outlining logic tries multiple mdir commits if it gets ENOSPC, which can hide errors if ENOSPC is returned for other reasons. In a perfect world, we would be using a different error code for no-room-in-metadata-pair, and no-blocks-on-disk. For now I've removed the outlining logic and we will need to figure out how to handle this situation more robustly.	2020-02-09 12:00:23 -06:00
Christopher Haster	44d7112794	Fixed tests/.toml. in .gitignore Running test.py creates a lot of garbage here	2020-02-09 12:00:22 -06:00
Christopher Haster	77e3078b9f	Added/fixed tests for noop writes (where bd error can't be trusted) It's interesting how many ways block devices can show failed writes: 1. prog can error 2. erase can error 3. read can error after writing (ECC failure) 4. prog doesn't error but doesn't write the data correctly 5. erase doesn't error but doesn't erase correctly Can read fail without an error? Yes, though this appears the same as prog and erase failing. These weren't all simulated by testbd since I unintentionally assumed the block device could always error. Fixed by adding additional bad-block behaviors to testbd. Note: This also includes a small fix where we can miss bad writes if the underlying block device contains a valid commit with the exact same size in the exact same offset.	2020-02-09 12:00:22 -06:00
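Failure mode 4 in this commit message (prog reports success but writes nothing) is the reason a returned error code alone cannot be trusted. The following is a minimal sketch of that idea using a hypothetical in-memory `ToyBlock` class and a hypothetical `prog_verified` helper; it does not reproduce testbd or any littlefs API.

```python
class ToyBlock:
    """Hypothetical in-memory block; prog_noop simulates a silent bad write."""

    def __init__(self, size, prog_noop=False):
        self.data = bytearray(b"\xff" * size)  # erased state
        self.prog_noop = prog_noop

    def prog(self, off, buf):
        if not self.prog_noop:
            self.data[off:off + len(buf)] = buf
        return 0  # always reports success, even when nothing was written

    def read(self, off, size):
        return bytes(self.data[off:off + size])


def prog_verified(block, off, buf):
    """Program, then read back and compare instead of trusting the return code."""
    if block.prog(off, buf) != 0:
        return False
    return block.read(off, len(buf)) == buf


print(prog_verified(ToyBlock(16), 0, b"abcd"))                  # healthy block
print(prog_verified(ToyBlock(16, prog_noop=True), 0, b"abcd"))  # silent failure caught
```

The read-back comparison is what distinguishes the silent failure from a good write; the return code is identical in both cases.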
Christopher Haster	517d3414c5	Fixed more bugs, mostly related to ENOSPC on different geometries Fixes: - Fixed reproducibility issue when we can't read a directory revision - Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size - Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync - Fixed cleanup issue if we run out of space while extending a CTZ skip-list - Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan Also: - Added cycle-detection to readtree.py - Allowed pseudo-C expressions in test conditions (and it's beautifully hacky, see line 187 of test.py) - Better handling of ctrl-C during test runs - Added build-only mode to test.py - Limited stdout of test failures to 5 lines unless in verbose mode Explanation of fixes below 1. Fixed reproducibility issue when we can't read a directory revision An interesting subtlety of the block-device layer is that the block-device is allowed to return LFS_ERR_CORRUPT on reads to untouched blocks. This can easily happen if a user is using ECC or some sort of CMAC on their blocks. Normally we never run into this, except for the optimization around directory revisions where we use uninitialized data to start our revision count. We correctly handle this case by ignoring what's on disk if the read fails, but end up using uninitialized RAM instead. This is not an issue for normal use, though it can lead to a small information leak. However it creates a big problem for reproducibility, which is very helpful for debugging. I ended up running into a case where the RAM values for the revision count were different, causing two identical runs to wear-level at different times, leading to one version running out of space before a bug occurred because it expanded the superblock early. 2.
Fixed incorrect erase assumption if lfs_dir_fetch exceeds block size This could be caused if the previous tag was a valid commit and we lost power causing a partially written tag as the start of a new commit. Fortunately we already have a separate condition for exceeding the block size, so we can force that case to always treat the mdir as unerased. 3. Fixed cleanup issue caused by lfs_fs_relocate failing when trying to outline a file in lfs_file_sync Most operations involving metadata-pairs treat the mdir struct as entirely temporary and throw it out if any error occurs. Except for lfs_file_sync since the mdir is also a part of the file struct. This is relevant because of a cleanup issue in lfs_dir_compact that usually doesn't have side-effects. The issue is that lfs_fs_relocate can fail. It needs to allocate new blocks to relocate to, and as the disk reaches its end of life, it can fail with ENOSPC quite often. If lfs_fs_relocate fails, the containing lfs_dir_compact would return immediately without restoring the previous state of the mdir. If a new commit comes in on the same mdir, the old state left there could corrupt the filesystem. It's interesting to note this is forced to happen in lfs_file_sync, since it always tries to outline the file if it gets ENOSPC (ENOSPC can mean both no blocks to allocate and that the mdir is full). I'm not actually sure this bit of code is necessary anymore, we may be able to remove it. 4. Fixed cleanup issue if we run out of space while extending a CTZ skip-list The actual CTZ skip-list logic itself hasn't been touched in more than a year at this point, so I was surprised to find a bug here. But it turns out the CTZ skip-list could be put in an invalid state if we run out of space while trying to extend the skip-list. This only becomes a problem if we keep the file open, clean up some space elsewhere, and then continue to write to the open file without modifying it. Fortunately an easy fix. 5.
Fixed missing half-orphans when allocating blocks during lfs_fs_deorphan This was a really interesting bug. Normally, we don't have to worry about allocations, since we force consistency before we are allowed to allocate blocks. But what about the deorphan operation itself? Don't we need to allocate blocks if we relocate while deorphaning? It turns out the deorphan operation can lead to allocating blocks while there are still orphans and half-orphans on the threaded linked-list. Orphans aren't an issue, but half-orphans may contain references to blocks in the outdated half, which doesn't get scanned during the normal allocation pass. Fortunately we already fetch directory entries to check CTZ lists, so we can also check half-orphans here. However this causes lfs_fs_traverse to duplicate all metadata-pairs, not sure what to do about this yet.	2020-02-09 11:54:22 -06:00
zhuangqiubin	4fb188369d	Update SPEC.md 1.fix size in Layout of the CRC tag 2.update (size) to (size * 8)	2020-02-02 17:42:42 +08:00
Henry Gabryjelski	c8e9a64a21	Indicate C99 standard as target for LittleFS code Resolve #358	2020-01-27 21:51:12 -08:00
Christopher Haster	aab6aa0ed9	Cleaned up test script and directory naming - Removed old tests and test scripts - Reorganize the block devices to live under one directory - Plugged new test framework into Makefile renamed: - scripts/test_.py -> scripts/test.py - tests_ -> tests - {file,ram,test}bd/* -> bd/* It took a surprising amount of effort to make the Makefile behave since it turns out the "test_%" rule could override "tests/test_%.toml.test" which is generated as part of test.py.	2020-01-27 10:16:29 -06:00
Christopher Haster	52ef0c1c9e	Fixed a crazy consistency issue in test.py The root of the problem was the notorious Python quirk with mutable default parameters. The default defines for the TestSuite class ended up being mutated as the class determined the permutations to test, corrupting other test's defines. However, the only define that was mutated this way was the CACHE_SIZE config in test_entries. The crazy thing was how this small innocuous change would cause "./scripts/test.py -nr test_relocations" and "./scripts/test.py -nr" to drift out of sync only after a commit spanning the different cache sizes would be written out with a different number of prog calls. This offset the power-cycle counter enough to cause one case to make it to an erase, and the other to not. Normally, the difference between a successful/unsuccessful erase wouldn't change the result of a test, but in this case it offset the revision count used for wear-leveling, causing one run to expand the superblock and the other to not. This change to the filesystem would then propagate through the rest of the test, making it difficult to reproduce test failures. Fortunately the fix was to just make a copy of the default define dictionary. This should also prevent accidentally mutating dicts belonging to our caller. Oh, also fixed a buffer overflow in test_files.	2020-01-26 23:53:53 -06:00
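The Python quirk named in this commit message, and the copy-based fix, can be shown in a few lines. The classes below are hypothetical stand-ins, not the actual TestSuite from test.py; the CACHE_SIZE key is borrowed from the message purely for illustration.

```python
class SuiteShared:
    """Buggy pattern: the default dict is created once and shared by all instances."""

    def __init__(self, defines={}):
        self.defines = defines  # aliases the single shared default object


class SuiteCopied:
    """Fixed pattern: copy the dict so later mutation stays local to the instance."""

    def __init__(self, defines={}):
        self.defines = dict(defines)  # shallow copy; also protects the caller's dict


a = SuiteShared()
a.defines["CACHE_SIZE"] = 512
b = SuiteShared()
print(b.defines)  # b sees a's mutation through the shared default

c = SuiteCopied()
c.defines["CACHE_SIZE"] = 512
d = SuiteCopied()
print(d.defines)  # d starts clean
```

Copying on entry also covers the second benefit the message mentions: a dict passed in by a caller is never mutated behind the caller's back.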
Christopher Haster	b9d0695e0a	Rewrote explode_asserts.py to be more efficient Normally I wouldn't consider optimizing this sort of script, but explode_asserts.py proved to be terribly inefficient and dominated the build time for running tests. It was slow enough to be distracting when attempting to test patches while debugging. Just running explode_asserts.py was ~10x slower than the rest of the compilation process. After implementing a proper tokenizer and switching to a handwritten recursive descent parser, I was able to speed up explode_asserts.py by ~5x and make test compilation much more tolerable. I don't think this was a limitation of parsy, but rather switching to a recursive descent parser made it much easier to find the hotspots where parsing was wasting cycles (string slicing for one). It's interesting to note that while the assert patterns can be parsed with an LL(1) parser (by dumping seen tokens if a pattern fails), I didn't bother as it's much easier to write the patterns with LL(k) and parsing asserts is predicated by the "assert" string. A few other tweaks: - allowed combining different test modes in one run - added a --no-internal option - changed test_.py to start counting cases from 1 - added assert(memcmp(a, b) == 0) matching - added better handling of string escapes in assert messages time to run tests: before: 1m31.122s after: 0m41.447s	2020-01-26 23:53:53 -06:00
Christopher Haster	a5d614fbfb	Added tests for power-cycled-relocations and fixed the bugs that fell out The power-cycled-relocation test with random renames has been the most aggressive test applied to littlefs so far, with: - Random nested directory creation - Random nested directory removal - Random nested directory renames (this could make the threaded linked-list very interesting) - Relocating blocks every write (maximum wear-leveling) - Incrementally cycling power every write Also added a couple other tests to test_orphans and test_relocations. The good news is the added testing worked well, it found quite a number of complex and subtle bugs that have been difficult to find. 1. It's actually possible for our parent to be relocated and go out of sync in lfs_mkdir. This can happen if our predecessor's predecessor is our parent as we are threading ourselves into the filesystem's threaded list. (note this doesn't happen if our predecessor _is_ our parent, as we then update our parent in a single commit). This is annoying because it only happens if our parent is a long (>1 pair) directory, otherwise we wouldn't need to catch relocations. Fortunately we can reuse the internal open file/dir linked-list to catch relocations easily, as long as we're careful to unhook our parent whenever lfs_mkdir returns. 2. Even more surprising, it's possible for the child in lfs_remove to be relocated while we delete the entry from our parent. This can happen if we are our own parent's predecessor, since we need to be updated then if our parent relocates. Fortunately we can also hook into the open linked-list here. Note this same issue was present in lfs_rename. Fortunately, this means now all fetched dirs are hooked into the open linked-list if they are needed across a commit. This means we shouldn't need assumptions about tree movement for correctness. 3. lfs_rename("deja/vu", "deja/vu") with the same source and destination was broken and tried to delete the entry twice. 4. 
Managing gstate deltas when we lose power during relocations was broken. And unfortunately complicated. The issue happens when we lose power during a relocation while removing a directory. When we remove a directory, we need to move the contents of its gstate delta to another directory or we'll corrupt littlefs gstate. (gstate is an xor of all deltas on the filesystem). We used to just xor the gstate into our parent's gstate, however this isn't correct. The gstate isn't built out of the directory tree, but rather out of the threaded linked-list (which exists to make collecting this gstate efficient). Because we have to remove our dir in two operations, there's a point where both the updated parent and child can exist in threaded linked-list and duplicate the child's gstate delta.
    .--------.
  ->| parent |-.
    | gstate | |
  .-|   a    |-'
  | '--------'
  |     X <- child is orphaned
  | .--------.
  '>| child  |->
    | gstate |
    |   a    |
    '--------'
What we need to do is save our child's gstate and only give it to our predecessor, since this finalizes the removal of the child. However we still need to make valid updates to the gstate to mark that we've created an orphan when we start removing the child. This led to a small rework of how the gstate is handled. Now we have a separation of the gpending state that should be written out ASAP and the gdelta state that is collected from orphans awaiting deletion. 5. lfs_deorphan wasn't actually able to handle deorphaning/desyncing more than one orphan after a power-cycle. Having more than one orphan is very rare, but of course very possible. Fortunately this was just a mistake with using a break in the deorphan, perhaps left from v1 where multiple orphans weren't possible? Note that we use a continue to force a refetch of the orphaned block. This is needed in the case of a half-orphan, since the fetched half-orphan may have an outdated tail pointer.	2020-01-26 23:45:54 -06:00
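The duplicated-delta problem in item 4 follows directly from the xor construction. The sketch below is a toy model, not littlefs code: deltas are plain integers with made-up values, and `collect_gstate` is a hypothetical helper that xors every delta found on the threaded list.

```python
from functools import reduce
from operator import xor


def collect_gstate(deltas):
    """XOR together every gstate delta found on the threaded linked-list."""
    return reduce(xor, deltas, 0)


# Hypothetical delta values for three directories on the list.
parent, child, other = 0b0011, 0b0101, 0b1000
expected = collect_gstate([parent, child, other])

# Old approach, mid-removal: the child's delta has been folded into the
# parent, but the orphaned child is still on the threaded list.
folded_parent = parent ^ child
mid_removal = collect_gstate([folded_parent, child, other])

# Once the child is finally unlinked, the result is correct again.
after_unlink = collect_gstate([folded_parent, other])

print(bin(expected), bin(mid_removal), bin(after_unlink))
```

Because x ^ x == 0, counting the child's delta twice cancels it out of the collected gstate, which is why a power loss in the window between the two operations leaves an incorrect gstate.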
Christopher Haster	f4b6a6b328	Fixed issues with neighbor updates during moves
The root of the problem was some assumptions about what tags could be sent to lfs_dir_commit.
- The first assumption is that there could be only one splice (create/delete) tag at a time, which is trivially broken by the core commit in lfs_rename.
- The second assumption is that there is at most one create and one delete in a single commit. This is less obvious but turns out to not be true in the case that we rename a file such that it overwrites another file in the same directory (1 delete for source file, 1 delete for destination).
- The third assumption was that there was an ordering to the delete/creates passed to lfs_dir_commit. It may be possible to force all deletes to follow creates by rearranging the tags in lfs_rename, but this risks overflowing tag ids.
The way lfs_dir_commit first collected the "deletetag" and "createtag" broke all three of these assumptions. And because we lose the ordering information we can no longer apply the directory changes to open files correctly. The file ids may be shifted in a way that doesn't reflect the actual operations on disk.
These problems were made worse by lfs_dir_commit cleaning up moves implicitly, which also creates deletes implicitly. While cleaning up moves in lfs_dir_commit may save some code size, it makes the commit logic much more difficult to implement correctly.
This bug turned into pulling out a dead tree stump, roots and all. I ended up reworking how lfs_dir_commit updates open files so that it has fewer assumptions, now it just traverses the commit tags multiple times in order to update file ids after a successful commit in the correct order. This also got rid of the dir copy by carefully updating split dirs after all files have an up-to-date copy of the original dir.
I also just removed the implicit move cleanup. It turns out the only commits that can occur before we have cleaned up the move is in lfs_fs_relocate, so it was simple enough to explicitly handle this case when we update our parent and pred during a relocate.
Cases where we may need to fix moves:
- In lfs_rename when we move a file/dir
- In lfs_demove if we lose power
- In lfs_fs_relocate if we have to relocate our parent and we find it had a pending move (or else the move will be outdated)
- In lfs_fs_relocate if we have to relocate our predecessor and we find it had a pending move (or else the move will be outdated)
Note the two cases in lfs_fs_relocate may be recursive. But lfs_fs_relocate can only trigger other lfs_fs_relocates so it's not possible for pending moves to spill out into other filesystem commits.
And of course, I added several tests to cover these situations. Hopefully the rename-with-open-files logic should be fairly locked down now.
found with initial fix by eastmoutain	2020-01-20 19:27:27 -06:00
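Why the order of create/delete tags matters for open files can be seen in a small model. This is a sketch under stated assumptions, not the lfs_dir_commit implementation: `apply_splices` and the tuple encoding of tags are hypothetical, and the shift rules are a simplification (a delete shifts higher ids down, a create shifts ids at or above it up).

```python
def apply_splices(open_ids, splices):
    """Update the ids held by open files by replaying splice tags in order.

    open_ids: ids held by open files (None marks a file whose entry is gone)
    splices:  ("create" | "delete", id) tuples, in commit order
    """
    ids = list(open_ids)
    for kind, tag_id in splices:
        for i, fid in enumerate(ids):
            if fid is None:
                continue
            if kind == "delete":
                if fid == tag_id:
                    ids[i] = None      # the open file's entry was deleted
                elif fid > tag_id:
                    ids[i] = fid - 1   # entries above the hole shift down
            elif kind == "create" and fid >= tag_id:
                ids[i] = fid + 1       # entries at/above the new id shift up
    return ids

# Two open files with ids 1 and 2, and one delete plus one create at id 1.
in_order = apply_splices([1, 2], [("delete", 1), ("create", 1)])
reordered = apply_splices([1, 2], [("create", 1), ("delete", 1)])
```

Replaying the tags in commit order leaves `[None, 2]`, while swapping them leaves `[1, 2]`. Collapsing the commit into a single "deletetag" and "createtag" loses exactly this distinction.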
Christopher Haster	9453ebd15d	Added/improved disk-reading debug scripts
Also fixed a bug in dir splitting when there's a large number of open files, which was the main reason I was trying to make it easier to debug disk images.
One part of the recent test changes was to move away from the file-per-block emubd and instead simulate storage with a single contiguous file. The file-per-block format was marginally useful at the beginning, but as the remaining bugs get more subtle, it becomes more useful to inspect littlefs through scripts that make the underlying metadata more human-readable.
The key benefit of switching to a contiguous file is these same scripts can be reused for real disk images and can even read through /dev/sdb or similar.
- ./scripts/readblock.py disk block_size block
  off       data
  00000000: 71 01 00 00 f0 0f ff f7 6c 69 74 74 6c 65 66 73  q.......littlefs
  00000010: 2f e0 00 10 00 00 02 00 00 02 00 00 00 04 00 00  /...............
  00000020: ff 00 00 00 ff ff ff 7f fe 03 00 00 20 00 04 19  ...............
  00000030: 61 00 00 0c 00 62 20 30 0c 09 a0 01 00 00 64 00  a....b 0......d.
  ...
  readblock.py prints a hex dump of a given block on disk. It's basically just "dd if=disk bs=block_size count=1 skip=block | xxd -g1 -" but with less typing.
- ./scripts/readmdir.py disk block_size block1 block2
  off       tag       type       id  len  data (truncated)
  0000003b: 0020000a  dir         0   10  63 6f 6c 64 63 6f 66 66  coldcoff
  00000049: 20000008  dirstruct   0    8  02 02 00 00 03 02 00 00  ........
  00000008: 00200409  dir         1    9  68 6f 74 63 6f 66 66 65  hotcoffe
  00000015: 20000408  dirstruct   1    8  fe 01 00 00 ff 01 00 00  ........
  readmdir.py prints info about the tags in a metadata pair on disk. It can print the currently active tags as well as the raw log of the metadata pair.
- ./scripts/readtree.py disk block_size
  superblock "littlefs"
    version v2.0
    block_size 512
    block_count 1024
    name_max 255
    file_max 2147483647
    attr_max 1022
  gstate 0x000000000000000000000000
  dir "/"
  mdir {0x0, 0x1} rev 3
  v id 0 superblock "littlefs" inline size 24
  mdir {0x77, 0x78} rev 1
    id 0 dir "coffee" dir {0x1fc, 0x1fd}
  dir "/coffee"
  mdir {0x1fd, 0x1fc} rev 2
    id 0 dir "coldcoffee" dir {0x202, 0x203}
    id 1 dir "hotcoffee" dir {0x1fe, 0x1ff}
  dir "/coffee/coldcoffee"
  mdir {0x202, 0x203} rev 1
  dir "/coffee/warmcoffee"
  mdir {0x200, 0x201} rev 1
  readtree.py parses the littlefs tree and prints info about the semantics of what's on disk. This includes the superblock, global-state, and directories/metadata-pairs. It doesn't print the filesystem tree though, that could be a different tool.	2020-01-20 19:27:27 -06:00
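The idea behind readblock.py, as the commit above describes it, can be approximated in a few lines. This is a hedged sketch, not the script from the repository: `hexdump_block` is a hypothetical helper, and its output layout only imitates the dump in the commit message.

```python
import io

def hexdump_block(disk, block_size, block):
    """Return hex-dump lines for one block of a contiguous disk image.

    disk is any seekable binary file object, such as an image file opened
    with open(path, "rb").
    """
    disk.seek(block * block_size)
    data = disk.read(block_size)
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        hexpart = " ".join("%02x" % b for b in chunk)
        # Printable ASCII is shown as-is, everything else as '.'
        text = "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in chunk)
        lines.append("%08x: %-47s  %s" % (off, hexpart, text))
    return lines

# Usage with an in-memory two-block image standing in for a disk file.
image = io.BytesIO(b"\x00" * 16 + b"\x71\x01\x00\x00\xf0\x0f\xff\xf7littlefs")
dump = hexdump_block(image, 16, 1)
print("\n".join(dump))
```

Because the helper only needs `seek` and `read`, the same function works on a contiguous image file or a raw device node, which is the benefit the commit attributes to the single-file disk format.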
Christopher Haster	fb65057a3c	Restructured block devices again for better test exploitation
Also finished migrating tests with test_relocations and test_exhaustion.
The issue I was running into when migrating these tests was a lack of flexibility with what you could do with the block devices. It was possible to hack in some hooks for things like bad blocks and power loss, but it wasn't clean or easily extendable.
The solution here was to just put all of these test extensions into a third block device, testbd, that uses the other two example block devices internally.
testbd has several useful features for testing. Note this makes it a pretty terrible block device _example_ since these hooks look more complicated than a block device needs to be.
- testbd can simulate different erase values, supporting 1s, 0s, other byte patterns, or no erases at all (which can cause surprising bugs). This actually depends on the simulated erase values in rambd and filebd. I did try to move this out of rambd/filebd, but it's not possible to simulate erases in testbd without buffering entire blocks and creating an excessive amount of extra write operations.
- testbd also helps simulate power-loss by containing a "power cycles" counter that is decremented every write operation until it calls exit. This is notably faster than the previous gdb approach, which is valuable since the reentrant tests tend to take a while to resolve.
- testbd also tracks wear, which can be manually set and read. This is very useful for testing things like bad block handling, wear leveling, or even changing the effective size of the block device at runtime.	2020-01-20 19:27:24 -06:00
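The power-loss and wear hooks described in the commit above can be modeled as a countdown wrapped around a block device's write path. This is a sketch under assumptions rather than testbd itself: `PowerLossBD` and `PowerLoss` are hypothetical names, storage is kept in memory, erases are assumed to count as write operations, and an exception is raised where the commit describes calling exit, so the model can be exercised inside one process.

```python
class PowerLoss(Exception):
    """Raised in place of exit() when the simulated power budget runs out."""

class PowerLossBD:
    def __init__(self, block_size, block_count, power_cycles=0):
        self.block_size = block_size
        self.blocks = [bytearray(b"\xff" * block_size)
                       for _ in range(block_count)]
        self.power_cycles = power_cycles  # 0 disables the simulation
        self.wear = [0] * block_count     # erase count per block

    def _tick(self):
        # Count down once per write operation and "lose power" at zero.
        if self.power_cycles > 0:
            self.power_cycles -= 1
            if self.power_cycles == 0:
                raise PowerLoss()

    def prog(self, block, off, data):
        self._tick()
        self.blocks[block][off:off + len(data)] = data

    def erase(self, block):
        self._tick()
        self.wear[block] += 1
        self.blocks[block][:] = b"\xff" * self.block_size

bd = PowerLossBD(block_size=16, block_count=4, power_cycles=3)
bd.erase(0)
bd.prog(0, 0, b"abcd")
try:
    bd.prog(0, 4, b"efgh")  # third write: power is lost before it lands
    lost = False
except PowerLoss:
    lost = True
```

A test harness built this way can rerun the same workload with an increasing `power_cycles` value and remount after each simulated loss, which matches the "incrementally cycling power every write" strategy mentioned elsewhere in this log.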
Christopher Haster	ecc2857c0e	Migrated bad-block tests Even with adding better reentrance testing, the bad-block tests are still very useful for isolating the block eviction logic. This also required rewriting a bit of the internal testing framework to allow custom block devices, which opens up quite a few more strategies for testing.	2020-01-14 12:04:20 -06:00


478 Commits