Add F2FS compression support for sload
* Support file extension filter, either default-accept or default-deny
policy
* Support choice of compression algorithm, LZO (version 2) or LZ4
(default)
* Support custom log of cluster size
* Support minimum number of compressed blocks per cluster (default 1).
A cluster will not be compressed if the number can not be met.
* suuport -r (read-only) option
This releases compressed blocks to secure free space in advance. Note that,
all compressed files will have the immutable bit.
* Added manpage update
* Remove unecessary qbuf allocation (Jaegeuk, suggested by Satya)
Signed-off-by: Robin Hsu <robinhsu@google.com>
[Jaegeuk Kim: fix some bugs and refactor names]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When error were found, we won't need to do any initialization but
just quit.
Signed-off-by: Robin Hsu <robinhsu@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
'ret' should not have been used here: otherwise, it would be wrongly used
as the error code and then be returned from main().
Signed-off-by: Robin Hsu <robinhsu@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Eric reported:
Commit 7a22451bc2 ("fsck.f2fs: fix to check validation of i_xattr_nid")
This commit caused a regression where 'dump.f2fs -i <inode> <device>'
now segfaults if the inode has any extended attributes.
It's because read_all_xattrs() now calls fsck_sanity_check_nid(), which
eventually dereferences f2fs_fsck::main_area_bitmap, which is NULL.
I'm not sure what was intended here.
Here's the output from gdb:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f750fa in f2fs_test_bit (nr=1024, p=0x0) at libf2fs.c:304
304 return (mask & *addr) != 0;
(gdb) bt
ntype=TYPE_XATTR, ni=0x7fffffffdd20) at fsck.c:449
ntype=TYPE_XATTR, ni=0x7fffffffdd20) at fsck.c:495
fsck_sanity_check_nid() should only called from fsck.f2fs context, rather
than dump.f2fs, otherwise it may cause dereferencing structure fields of
fsck incorrectly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Inodes aren't allowed to have the casefold flag set when they aren't
directories, or if the filesystem superblock doesn't have the casefold
feature enabled. Clear any such unexpected casefold flags.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Need to remove "/" of mount point name from the file path name
when mount point is "/". Otherwise, we will transfer file path
name whose first two characters are like "//" to fs_config function.
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
NVM Express Zoned Namespace (ZNS) devices can have zone-capacity(zc) less
than the zone-size. ZNS defines a per zone capacity which can be equal
or less than the zone-size. Zone-capacity is the number of usable blocks
in the zone. If zone-capacity is less than zone-size, then the segments
which start at/after zone-capacity are considered unusable. Only those
segments which start before the zone-capacity are considered as usable
and added to the free_segment_count and free_segment_bitmap of the kernel.
In such cases, the filesystem should not write/read beyond the
zone-capacity.
Update the super block with the usable number of blocks and free segment
count in the ZNS device zones, if zone-capacity is less than zone-size.
Set reserved segment count and overprovision ratio based on the usable
segments in the zone.
Allow fsck to find the free_segment_count based on the zone-capacity and
compare with checkpoint values.
Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
[Jaegeuk Kim: add UNUSED to is_usable_seg()]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Norbert Lange reported:
"
$ fsck.f2fs -a /dev/mmcblk0p5; echo $?
Info: Fix the reported corruption.
Info: Mounted device!
Info: Check FS only on RO mounted device
Error: Failed to open the device!
255
"
Michael Laß reminds:
"
I think the return value is exactly the problem here. See fsck(8) (
https://linux.die.net/man/8/fsck) which specifies the return values.
Systemd looks at these and decides how to proceed:
a859abf062/src/fsck/fsck.c (L407)
That means, if fsck.f2fs returns 255, then
the FSCK_SYSTEM_SHOULD_REBOOT bit is set and systemd will reboot.
"
So the problem here is fsck.f2fs didn't return correct value to userspace
apps, result in later unexpected behavior of rebooting, let's fix this.
Reported-by: Norbert Lange <nolange79@gmail.com>
Reported-by: Michael Laß <bevan@bi-co.net>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix FSCK_ERROR_CORRECTED]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch removes random bytes in sum_blk when loading a new directory.
Signed-off-by: Theotime Combes <tcombes@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Speed up fsck in auto-fix mode by splitting
build_segment_manager() into two disjoint parts:
early_build_segment_manager(), and
late_build_segment_manager(),
where in some cases (when !need_fsync_data_record(), or cannot
find_fsync_inode()), late_build_segment_manager() won't be needed in
auto-fix mode. This speeds up a little bit in those cases.
Signed-off-by: Robin Hsu <robinhsu@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Split f2fs_init_nid_bitmap() into two disjoint parts:
f2fs_early_init_nid_bitmap(), and
f2fs_late_init_nid_bitmap(),
where f2fs_late_init_nid_bitmap() won't be called in auto-fix mode, when
no errors were found.
f2fs_late_init_nid_bitmap() contains the loop to create NID bitmap from
NAT. which is the main reason for slow fsck.
Signed-off-by: Robin Hsu <robinhsu@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
A maliciously corrupted file systems can trigger buffer overruns in
the quota code used by fsck.
To fix it, quota file sizes are checked against real allocated
block index tables (inode, direct nodes, indirect nodes, double
indirect nodes). If the size mismatches, the quota file is considered
corrupted and will be regenerated.
Signed-off-by: Robin Hsu <robinhsu@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
fsck.f2fs reports corruption if the filesystem contains any encrypted +
casefolded directories with any substantial number of dentries:
[ASSERT] (f2fs_check_dirent_position:1374) --> Wrong position of dirent pino:8, name:۟�[I�^*�(�5~�}�D��#]7�8�ˎ�, level:1, dir_level:0, pgofs:4, correct range:[2, 3]
The problem is that f2fs_check_dirent_position() computes the wrong hash
for encrypted+casefolded dentries. It's not actually possible for it to
compute the correct hash, because it would need the encryption key.
However, the on-disk dentry already contains the hash code, and its
correctness was already verified by f2fs_check_hash_code() if possible.
So, make f2fs_check_dirent_position() use the hash code from disk rather
than recompute it.
Also fix it to print the filename in human-readable form.
This bug was causing 'kvm-xfstests -c f2fs/encrypt -g casefold'
to fail with the test_dummy_encryption_v2 and encryption+casefolding
kernel patches applied.
Fixes: 7f3767ee8d ("f2fs-tools: Casefolded Encryption support")
Cc: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
While dumping files during fsck, print_inode_info() didn't check
sanity of inode, so insane i_extra_isize could cause overflow
when printing i_addr, to avoid that, let's add a check condition.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Otherwise, if block address is invalid, we may access invalid
memory address in is_sit_bitmap_set().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are totally ADDRS_PER_INODE() blkaddrs in .i_addr, fix to
print all of them.
In addition, use get_extra_isize() rather than __get_extra_isize()
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add support for new CP flag CP_RESIZEFS_FLAG set during online
resize FS. If SPO happens after SB is updated but CP isn't, then
allow fsck to fix it.
The fsck errors without this fix -
Info: CKPT version = 6ed7bccb
Wrong user_block_count(2233856)
[f2fs_do_mount:3365] Checkpoint is polluted
The subsequent mount failure without this fix -
[ 11.294650] F2FS-fs (sda8): Wrong user_block_count: 2233856
[ 11.300272] F2FS-fs (sda8): Failed to get valid F2FS checkpoint
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We should not account COMPRESS_ADDR as reserved block once we
released compress block on compress inode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
clock_t time is per-process time, which is not wall time. For example,
if the CPU is shared by other processes, clock_t time may advance slower
than wall clock. On the other hand, if the current process is
multithreaded and more than one execution core is available, clock_t
time may advance faster than wall clock.
this CL changes it to use CLOCK_BOOTTIME (Linux-specific)
Signed-off-by: Wei Wang <wvw@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This is no longer a transitive include of android_filesystem_config.h
Signed-off-by: Tom Cherry <tomcherry@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This adds support for casefolded and encrypted directories.
Fsck cannot check the hashes of such directories because it would
require access to the encryption key to generate the siphash
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
resize.f2fs has already supported large_nat_bitmap feature, but has no
option to turn on it.
This change add a new '-i' option to control turning on it.
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds to print all valid blkaddrs in inode's i_addr field,
besides, it also adds to print meaning of specific blkaddr(flag).
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds to support compression, introducing '-O compression'
option to enable this feature in image.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Added command line options -c <num_cache_entry> and -m <max_hash_collision>
to activate cache for fsck. It may significantly speed up fsck.
Signed-off-by: Robin Hsu <robinhsu@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Given this option, fsck.f2fs does not run fsck forcefully, even if kernel
is updated. Android devices will do --kernel-check by default, while others
will not.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch tries to fix memory leak problem reported in Android.
Fixed the following problems in fsck.f2fs, make_f2fs and sload_f2fs:
* reuse of same pointer without clean-up
* exit on error without clean-up
Signed-off-by: Robin Hsu <robinhsu@google.com>
[Jaegeuk Kim: add missing definition to avoid build error]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
ckpt->entries is initialized by fsck_init(), but we tried to access it during
f2fs_do_mount().
The call sequence is:
- f2fs_do_mount
- record_fsync_data
- traverse_dnodes
- do_record_fsync_data
- ADDRS_PER_PAGE
- get_node_info
- node_info_from_raw_nat(fsck->entries[nid])
- do_fsck
- fsck_init
- build_nat_area_bitmap
- fsck->entries = calloc(fsck->nr_nat_entries);
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
To catch bugs in write pointer handling code for zoned block devices,
have fsck check consistency of write pointers of non-open zones, that
current segments do not point to. Check two items comparing write pointer
positions with valid block maps in SIT.
The first item is check for zones with no valid blocks. When there is no
valid blocks in a zone, the write pointer should be at the start of the
zone. If not, next write operation to the zone will cause unaligned write
error. If write pointer is not at the zone start, reset the zone to move
the write pointer to the zone start.
The second item is check between write pointer position and the last
valid block in the zone. It is unexpected that the last valid block
position is beyond the write pointer. In such a case, report as the bug.
Fix is not required for such zone, because the zone is not selected for
next write operation until the zone get discarded.
In the same manner as the consistency check for current segments, do the
check and fix twice: at the beginning of do_fsck() to avoid unaligned
write error during fsck, and at fsck_verify() to reflect meta data
updates by fsck.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
On sudden f2fs shutdown, write pointers of zoned block devices can go
further but f2fs meta data keeps current segments at positions before the
write operations. After remounting the f2fs, this inconsistency causes
write operations not at write pointers and "Unaligned write command"
error is reported.
To avoid the error, have f2fs.fsck check consistency of write pointers
of open zones that current segments point to. Compare each current
segment's position and the write pointer position of the open zone. If
inconsistency is found and 'fix_on' flag is set, assign a new zone to the
current segment and check the newly assigned zone has write pointer at
the zone start. Leave the original zone as is to keep data recorded in
it.
To care about fsync data, refer each seg_entry's ckpt_valid_map to get
the last valid block in the zone. If the last valid block is beyond the
current segments position, fsync data exits in the zone. In case fsync
data exists, do not assign a new zone to the current segment not to lose
the fsync data. It is expected that the kernel replay the fsync data and
fix the write pointer inconsistency at mount time.
Also check consistency between write pointer of the zone the current
segment points to with valid block maps of the zone. If the last valid
block is beyond the write pointer position, report to indicate a bug. If
'fix_on' flag is set, assign a new zone to the current segment.
When inconsistencies are found, turn on 'bug_on' flag in fsck_verify() to
ask users to fix them or not. When inconsistencies get fixed, turn on
'force' flag in fsck_verify() to enforce fixes in following checks.
This check and fix is done twice. The first is done at the beginning of
do_fsck() function so that other fixes can reflect the current segment
modification. The second is done in fsck_verify() to reflect updated meta
data by other fixes.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fsck checks fsync data when UMOUNT flag is not set. When the f2fs was not
cleanly unmouted, UMOUNT flag is not recorded in meta data and fsync data
can be left in the f2fs. The first fsck run checks fsync data to reflect
it on quota status recovery. After that, fsck writes UMOUNT flag in the
f2fs meta data so that second fsck run can skip fsync data check.
However, fsck for zoned block devices need to care fsync data for all
fsck runs. The first fsck run checks fsync data, then fsck can check
write pointer consistency with fsync data. However, since second fsck run
does not check fsync data, fsck detects write pointer at fsync data end
is not consistent with f2fs meta data. This results in meta data update
by fsck and fsync data gets lost.
To have fsck check fsync data always for zoned block devices, introduce
need_fsync_data_record() helper function which returns boolean to tell
if fsck needs fsync data check or not. For zoned block devices, always
return true. Otherwise, return true if UMOUNT flag is not set in CP.
Replace UMOUNT flag check codes for fsync data with the function call.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When fsck updates one of the current segments, update_curseg_info() is
called specifying a single current segment as its argument. However,
update_curseg_info() calls move_curseg_info() function which updates all
six current segments. Then update_curseg_info() for a single current
segment moves all current segments.
This excessive current segment move causes an issue when a new zone is
assigned to a current segment because of write pointer inconsistency.
Even when a current segment has write pointer inconsistency, all other
current segments should not be moved because they may have fsync data
at their positions.
To avoid the excessive current segment move, introduce
move_one_curseg_info() function which does same work as
move_curseg_info() only for a single current segment. Call
move_one_curseg_info() in place of move_curseg_info() from
update_curseg_info().
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When fsck needs to assign a new area to a curreng segment, it calls
find_next_free_block() function to find a new block to assign. For zoned
block devices, fsck checks write pointer consistency with current
segments' positions. In case a curseg is inconsistent with the
write pointer of the zone it points to, fsck should assign not a new free
block but a new free zone/section with write pointer at the zone start,
so that next write to the current segment succeeds without error.
To extend find_next_free_block() function's capability to find not only
a block but also a zone/section, add new_sec flag to
find_next_free_block() function. When new_sec flag is true, skip check
for each block's availability so that the check is done with unit of
section. Note that it is ensured that one zone has one section for f2fs
on zoned block devices. Then the logic to find a new free section is good
to find a new free zone.
When fsck target devices have ZONED_HM model, set new_sec flag true to
call find_next_free_block() from move_curseg_info(). Set curseg's
alloc_type not SSR but LFS for the devices with ZONED_HM model, because
SSR block allocation is not allowed for zoned block devices. Also skip
relocate_curseg_offset() for the devices with ZONED_HM model for the
same reason.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For multi-device F2FS, we should check if the sum of total_segments from
all devices matches segment_count.
Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we don't allow block allocation on unclean umounted image,
result in failing to repair quota system file.
In this patch, we port most recovery codes from kernel to userspace
tools, so that on unclean image, during fsck initialization, we will
record all data/node block address we may recover in kernel, and
then during allocation of quota file repair, we can skip those blocks
to avoid block use conflict.
Eventually, if free space is enough, we can repair the quota system
file on an unclean umounted image.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: remove unnecessary parameter]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Change logic as below:
- fix to account block/node/inode stats correctly in reserve_new_block()
- check overflow in reserve_new_block()
- move stat update from f2fs_alloc_nid() to reserve_new_block()
- adjust write_checkpoint() to update stat for sload/fsck
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
if we new node block in fsck flow, we need to update
the valid_node_cnt at the same time.
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
inode.i_blocks includes inode, xnode and data block count, so, only
fix in below condition:
- i_blocks := 3 (inode + xnode + data_block)
- i_blocks := 2 (inode + data_block)
In addition, it recovers symlink's i_size to 4k rather than i_blocks *
4k.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>