The codes that fixup the right leaf and the codes that dirty the
extnet buffer use the variable 'right_nritems' , both of them expect
'right_nritems' is the number of items in right leaf after the push.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There was a slight problem with ACL's returning EINVAL when you tried to set an
ACL. This isn't correct, we should be returning EOPNOTSUPP, so I did a very
ugly thing and just commented everybody out and made them return EOPNOTSUPP.
This is only temporary, I'm going back to implement ACL's, but Chris wants to
push out a release so this will suffice for now.
Also Yan suggested setting reada to -1 in the delete case to enable backwards
readahead, and in the listxattr case I moved path->reada = 2; to after the if
(!path) check so we can avoid a possible null dereference. Thank you,
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch adds a new parameter 'full_scan' to 'find_search_start',
thereby 'find_search_start' can know whether 'find_free_extent' is in
full scan phrase. I feel that 'find_search_start' should skip calling
'btrfs_find_block_group' when 'find_free_extent' is in full scan
phrase. In my test on a 2GB volume, Oops occurs when space usage is
about 76%. After apply the patch, Oops occurs when space usage is
near 100%.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch adds a helper function 'update_pinned_extents' to
extent-tree.c. The usage of the helper function is similar to
'update_block_group', the last parameter of the function indicates
pin vs unpin.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When 'item_end' is equal to 'inode->i_size', 'found_type' is updated
and current item is skipped. This behavior is correct for extent item,
but incorrect for csum item. For example, there is a csum item with
'offset == 0'. When deleting the inode, 'inode->i_size' is set to 0,
so the csum item isn't deleted.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Don't set hint_byte to EXTENT_MAP_INLINE when 'end == extent_end' or
'start == key.offset' . The inline extent will be truncated in these
cases.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When calculating the size of inline extent, inode->i_size should also
be take into consideration, otherwise sys_write may drop some data
silently. You can test this bug by:
#dd if=/dev/zero bs=4k count=1 of=test_file
#dd if=/dev/zero bs=2k count=1 of=test_file conv=notrunc
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When pin_down_bytes decides not to pin a block because it was from the
current transaction, make sure the in memory cache of free extents is updated
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There is a 'finish_wait', but no 'prepare_to_wait' . So I think that
the 'prepare_to_wait' is missing. The second change is according to
the name of variable.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The fixes do a number of things:
1) Most btrfs_drop_extent callers will try to leave the inline extents in
place. It can truncate bytes off the beginning of the inline extent if
required.
2) writepage can now update the inline extent, allowing mmap writes to
go directly into the inline extent.
3) btrfs_truncate_in_transaction truncates inline extents
4) extent_map.c fixed to not merge inline extent mappings and hole
mappings together
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Execution should goto label 'insert' when 'btrfs_next_leaf' return a
non-zero value, otherwise the parameter 'slot' for
'btrfs_item_key_to_cpu' may be out of bounds. The original codes jump
to label 'insert' only when 'btrfs_next_leaf' return a negative
value.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
1. Reorder kmap and the test for 'page != NULL'
2. Zero-fill rest area of a block when inline extent isn't big enough.
3. Do not insert extent_map into the map tree when page == NULL.
(If insert the extent_map into the map tree, subsequent read requests
will find it in the map tree directly and the corresponding inline
extent data aren't copied into page by the the get_extent function.
extent_read_full_page can't handle that case)
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The ENOTEMPTY check in btrfs_rmdir isn't reliable. It's possible that
the backward search finds . or .. at first, then some other directory
entry. In that case, btrfs_rmdir delete . or .. improperly. The
patch also fixes a fs_mutex unlock issue in btrfs_rmdir.
--
Signed-off-by: Chris Mason <chris.mason@oracle.com>
1) Forced defrag wasn't working properly (btrfsctl -d) because some
cache only checks were incorrect.
2) Defrag only the leaves unless in forced defrag mode.
3) Don't use complex logic to figure out if a leaf is needs defrag
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This modifies inline extent size calculation, so that
insert_inline_extent can handle the case that parameter 'offset' is
not zero; it also a few codes to zero uninitialized area in inline
extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When making room for a new item, it is ok to create an empty leaf, but
when making room to extend an item, split_leaf needs to make sure it
keeps the item we're extending in the path and make sure we don't end up
with an empty leaf.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This reduces the number of calls to btrfs_extend_item and greatly lowers
the cpu usage while writing large files.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Just use kobject_set_name(), that works in all kernels (I think...).
Kernels newer than 2.6.23 currently fail with:
/home/axboe/git/btrfs/btrfs-unstable/sysfs.c:188: error: unknown field
'name' specified in initializer
Signed-off-by: Chris Mason <chris.mason@oracle.com>
endio handling is typically called with interrupts disabled, but can
also be called with it enabled. So save interrupts before using KM_IRQ0
to be completely safe.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
It now returns void and it is never called for partial completions, so
the bio->bi_size check must go.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This allows us to defrag huge directories, but skip the expensive defrag
case in more common usage, where it does not help as much.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
I think check whether extent is a hole before update 'inode->i_blocks'
is unconditional required. (original codes check it only when
del_item isn't equal to 0)
Signed-off-by: Chris Mason <chris.mason@oracle.com>
found_type has already been decreased by codes above the change, I
think decrease it by one again doesn't make sense.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
find_free_extent would fail to wrap around to the start of the drive because
it was doing the enospc case checking twice in some cases, causing it
to return -ENOSPC early.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_btree_balance_dirty is changed to pass the number of pages dirtied
for more accurate dirty throttling. This lets the VM make better decisions
about when to force some writeback.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Cache block group was overly complex and missed free blocks at the very start
of the group. This patch simplifies things significantly.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Add a helper per ioctl function to make the code more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
No reason to grab the BKL before calling into the btrfs ioctl code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Dead roots are trees left over after a crash, and they were either in the
process of being removed or were waiting to be removed when the box crashed.
Before, a search of the entire tree of root pointers was done on mount
looking for dead roots. Now, the search is done the first time we load
a root.
This makes mount faster when there are a large number of snapshots, and it
enables the block accounting code to properly update the block counts on
the latest root as old versions of the root are reaped after a crash.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
XFS updates the ondisk inode size only after the data I/O has finished,
so it needs a hook when the writepage end_bio handler has finished.
Might not be worth applying as-is as the per-page callback is very
ineffcient. What XFS really wants is a callback when writeout of a
whole extent has completed. This delayed i_size updates scheme might
be worthwile for btrfs aswell, btw.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The writepage_io is not mandatory, e.g. my port of xfs to the extent_map
code does not have one for now. So handle a NULL pointer gracefully
here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
generic_bmap is completely trivial, while the extent to bh mapping in
btrfs is rather complex. So provide a extent_bmap instead that takes
a get_extent callback and can be used by filesystem using the extent_map
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The bio completion handlers can be run in any context, e.g. when using
the old ide driver they run in hardirq context with irqs disabled so
lockdep rightfully warns about using write_lock_irq useage in these
handlers.
This patch switches clear_extent_bit and set_extent_bit to
write_lock_irqsave to fix this problem.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>