Just before reading a leaf, btrfs scans the node for blocks that are
close by and reads them too. It tries to build up a large window
of IO looking for blocks that are within a max distance from the top
and bottom of the IO window.
This patch changes things to just look for blocks within 64k of the
target block. It will trigger less IO and make for lower latencies on
the read size.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
[XFS] Long btree pointers are still 64 bit on disk
On 32 bit machines with CONFIG_LBD=n, XFS reduces the
in memory size of xfs_fsblock_t to 32 bits so that it
will fit within 32 bit addressing. However, the disk format
for long btree pointers are still 64 bits in size.
The recent btree rewrite failed to take this into account
when initialising new btree blocks, setting sibling pointers
to NULL and checking if they are NULL. Hence checking whether
a 64 bit NULL was the same as a 32 bit NULL was failingi
resulting in NULL sibling pointers failing to be detected
correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns
in xfs_btree_delrec.
Fix this by making all the comparisons and setting of long
pointer btree NULL blocks to the disk format, not the
in memory format. i.e. use NULLDFSBNO.
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Reported-by: Jacek Luczak <difrost.kernel@gmail.com>
Reported-by: Danny ter Haar <dth@dth.net>
Tested-by: Jacek Luczak <difrost.kernel@gmail.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
dlm_posix_get fills out the relevant fields in the file_lock before
returning when there is a lock conflict, but doesn't clean out any of
the other fields in the file_lock.
When nfsd does a NFSv4 lockt call, it sets the fl_lmops to
nfsd_posix_mng_ops before calling the lower fs. When the lock comes back
after testing a lock on GFS2, it still has that field set. This confuses
nfsd into thinking that the file_lock is a nfsd4 lock.
Fix this by making DLM reinitialize the file_lock before copying the
fields from the conflicting lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
We should use the original copy of the file_lock, fl, instead
of the copy, flc in the lockd notify callback. The range in flc has
been modified by posix_lock_file(), so it will not match a copy of the
lock in lockd.
Signed-off-by: David Teigland <teigland@redhat.com>
Now that bmap support is gone, this is the only way to get extent
mappings for userland. These are still not valid for IO, but they
can tell us if a file has holes or how much fragmentation there is.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Swapfiles use bmap to build a list of extents belonging to the file,
and they assume these extents won't change over the life of the file.
They also use resulting list to do IO directly to the block device.
This causes problems for btrfs in a few ways:
btrfs returns logical block numbers through bmap, and these are not suitable
for IO. They might translate to different devices, raid etc.
COW means that file block mappings are going to change frequently.
Using swapfiles on btrfs will lead to corruption, so we're avoiding the
problem for now by dropping bmap support entirely. A later commit
will add fiemap support for people that really want to know how
a file is laid out.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
To improve performance, btrfs_sync_log merges tree log sync
requests. But it wrongly merges sync requests for different
tree logs. If multiple tree logs are synced at the same time,
only one of them actually gets synced.
This patch has following changes to fix the bug:
Move most tree log related fields in btrfs_fs_info to
btrfs_root. This allows merging sync requests separately
for each tree log.
Don't insert root item into the log root tree immediately
after log tree is allocated. Root item for log tree is
inserted when log tree get synced for the first time. This
allows syncing the log root tree without first syncing all
log trees.
At tree-log sync, btrfs_sync_log first sync the log tree;
then updates corresponding root item in the log root tree;
sync the log root tree; then update the super block.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
replace_one_extent searches tree leaves for references to a given extent. It
stops searching if it goes beyond the last possible position.
The last possible position is computed by adding the starting offset of a found
file extent to the full size of the extent. The code uses physical size of the
extent as the full size. This is incorrect when compression is used.
The fix is get the full size from ram_bytes field of file extent item.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Change one typedef to a regular enum, and remove an unused one.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_extent_post_op calls finish_current_insert and del_pending_extents. They
both may enter infinite loops.
finish_current_insert enters infinite loop if it only finds some backrefs to
update. The fix is to check for pending backref updates before restarting the
loop.
The infinite loop in del_pending_extents is due to a the skipped variable
not being properly reset before looping around.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
Merge list_for_each* and list_entry to list_for_each_entry*
Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>