This allows detection of blocks that have already been written in the
running transaction so they can be recowed instead of modified again.
It is step one in trusting the transid field of the block pointers.
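
A minimal sketch of the check described above, assuming a per-block
"written" flag and a generation field in the block header. The names
block_header, BLOCK_FLAG_WRITTEN and needs_cow are hypothetical and are
not taken from the patch itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_FLAG_WRITTEN (1u << 0)	/* hypothetical flag bit */

/* Hypothetical, simplified block header. */
struct block_header {
	uint64_t generation;	/* transid that last COWed this block */
	uint32_t flags;
};

/* Return true when the block must be copied again before it is modified. */
static bool needs_cow(const struct block_header *h, uint64_t running_transid)
{
	assert(h != NULL);
	if (h->generation != running_transid)
		return true;	/* block belongs to an older transaction */
	/* Same transaction: recow only if the block already went to disk. */
	return (h->flags & BLOCK_FLAG_WRITTEN) != 0;
}
```

A block that was COWed in the running transaction and has not been
written yet can be modified in place. Any other block gets a fresh copy.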
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Here's a patch against the unstable tree that gets the code to build
against Linus's current tree (2.6.24-git12). This is needed as the
kobject/kset api has changed there.
I tried to make the smallest changes needed, and it builds and loads
successfully, but I don't have a btrfs volume anywhere (yet) to try to
see if things still work properly :)
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When we checksum file data during writepage, the checksumming is done one
page at a time, making it difficult to do bulk metadata modifications
to insert checksums for large ranges of the file at once.
This patch changes btrfs to checksum on a per-bio basis instead. The
bios are checksummed before they are handed off to the block layer, so
each bio is contiguous and only has pages from the same inode.
Checksumming on a bio basis allows us to insert and modify the file
checksum items in large groups. It also allows the checksumming to
be done more easily by async worker threads.
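
The idea can be sketched roughly as follows. This is an illustration
only: bio_sketch and csum_bio_sketch are hypothetical stand-ins for the
real structures, the 4096-byte page size is assumed, and CRC-32C is used
as the checksum for the purposes of the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_sketch(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical stand-in for a bio: one contiguous run of pages of one file. */
struct bio_sketch {
	uint64_t file_offset;		/* byte offset of the first page */
	const unsigned char *data;	/* npages * SKETCH_PAGE_SIZE bytes */
	size_t npages;
};

/*
 * Checksum every page of the bio in one pass. The caller gets one array
 * covering the whole contiguous range and can insert it as a single
 * group of checksum items instead of one item per writepage call.
 */
static void csum_bio_sketch(const struct bio_sketch *bio, uint32_t *sums)
{
	assert(bio != NULL && sums != NULL);
	for (size_t i = 0; i < bio->npages; i++)
		sums[i] = crc32c_sketch(bio->data + i * SKETCH_PAGE_SIZE,
					SKETCH_PAGE_SIZE);
}
```

Because the function only needs the bio and an output array, it is also
the kind of self-contained unit of work that could be handed to a worker
thread.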
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan Zheng noticed that we don't clear the extent state tree dirty and delalloc
bits when we clear the dirty bits on the page during file write.
This leads to csum errors later on.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reduce CPU time searching for free blocks by optimizing find_first_extent_bit.
Fix find_free_extent to make better use of the last_alloc hint. Previously it
often found blocks just before the hint.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Btrfs set/get macros lose type information needed to avoid
unaligned accesses on sparc64.
Here is a patch for the kernel bits which fixes most of the
unaligned accesses on sparc64.
btrfs_name_hash is modified to return the hash value instead
of getting a return location via a (potentially unaligned)
pointer.
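
The change in calling convention can be sketched as below. FNV-1a stands
in for the real hash function, and name_hash_sketch and store_hash are
hypothetical names. The point is only that the hash comes back by value
and is copied into the possibly unaligned destination with memcpy:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the hash by value; no pointer to a result location is needed. */
static uint64_t name_hash_sketch(const char *name, size_t len)
{
	uint64_t h = 14695981039346656037ULL;	/* FNV-1a offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= 1099511628211ULL;		/* FNV-1a prime */
	}
	return h;
}

/*
 * Store the hash into a location that may not be 8-byte aligned, for
 * example a field inside a packed on-disk structure. memcpy avoids the
 * unaligned 64-bit store that traps on strict-alignment machines.
 */
static void store_hash(unsigned char *dst, uint64_t hash)
{
	assert(dst != NULL);
	memcpy(dst, &hash, sizeof(hash));
}
```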
Signed-off-by: Chris Mason <chris.mason@oracle.com>
A few code paths were not properly updated for the extent map changes. This
may be the cause of the "no csum found for inode" issue.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Now that delayed allocation accounting works, i_blocks accounting is changed
to only modify i_blocks when extents are inserted or removed.
The fillattr call is changed to include the delayed allocation byte count
in the i_blocks result.
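
As a rough sketch of the reporting side, assuming i_blocks counts
512-byte sectors and the delayed allocation count is kept in bytes
(inode_sketch and reported_blocks are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-inode counters. */
struct inode_sketch {
	uint64_t i_blocks;		/* 512-byte sectors backed by inserted extents */
	uint64_t delalloc_bytes;	/* dirty bytes with no extent allocated yet */
};

/* Block count reported to stat(): allocated extents plus pending delalloc. */
static uint64_t reported_blocks(const struct inode_sketch *inode)
{
	assert(inode != NULL);
	return inode->i_blocks + (inode->delalloc_bytes >> 9);
}
```

With this split, i_blocks itself only moves when an extent is inserted
or removed, while stat still reflects data that has been written but not
yet allocated.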
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When freeing the root block of a tree, btrfs_free_extent's
'ref_generation' parameter comes from the root block itself. When freeing
a non-root block, 'ref_generation' comes from its parent. So when
converting a non-root block to a root block, we must guarantee that its
generation equals its parent's generation.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This makes searches for backrefs and backref insertion much more efficient
when there are many backrefs for a single extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
When truncating an inline extent, btrfs_drop_extents doesn't properly
handle the case "key.offset > inline_limit". This bug can only happen
when the max inline size is larger than 8K.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The end_bio routines are changed to take a pointer to the extent state
struct, and the state tree is walked in order to set/clear appropriate
bits as IO completes. This greatly reduces the number of rbtree searches
done by the end_bio handlers, and reduces lock contention.
The extent_io releasepage function is changed to avoid expensive searches
for locked state.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There is now extent_map for mapping offsets in the file to disk and
extent_io for state tracking, IO submission and extent_buffers.
The new extent_map code shifts from [start,end] pairs to [start,len], and
pushes the locking out into the caller. This allows a few performance
optimizations and is easier to use.
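
The difference between the two representations can be sketched as
follows. extent_map_sketch and the helpers are hypothetical and much
simpler than the real structure; locking is left to the caller, as in
the description above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mapping stored as [start, len]. */
struct extent_map_sketch {
	uint64_t start;	/* first byte of the file range */
	uint64_t len;	/* number of bytes mapped */
};

/* Exclusive end of the mapping, saturating instead of wrapping. */
static uint64_t em_end(const struct extent_map_sketch *em)
{
	assert(em != NULL);
	if (em->start + em->len < em->start)
		return UINT64_MAX;
	return em->start + em->len;
}

/* True when the byte at 'offset' falls inside the mapping. */
static bool em_contains(const struct extent_map_sketch *em, uint64_t offset)
{
	assert(em != NULL);
	return offset >= em->start && offset - em->start < em->len;
}

/* Convert an old-style inclusive [start, end] pair to [start, len]. */
static struct extent_map_sketch em_from_range(uint64_t start, uint64_t end)
{
	/* The full 64-bit range cannot be expressed as a 64-bit length. */
	assert(start <= end && end - start != UINT64_MAX);
	struct extent_map_sketch em = { .start = start, .len = end - start + 1 };
	return em;
}
```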
A number of extent_map usage bugs were fixed, mostly with failing
to remove extent_map entries when changing the file.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
There were a few places that could cause duplicate extent insertion.
This adjusts the code that creates holes to avoid it.
lookup_extent_map is changed to correctly return all of the extents in a
range, even when there are none matching at the start of the range.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
test_range_bit doesn't properly handle the case where there's a hole at
the end of the range and no other extent_state after the range.
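
The failing case can be illustrated with a simplified model that keeps
the states in a sorted array rather than an rbtree. state_sketch and
range_all_set are hypothetical, and the function only answers the
question "is the bit set on every byte of the range":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified state record covering inclusive [start, end]. */
struct state_sketch {
	uint64_t start;
	uint64_t end;
	unsigned bits;
};

/*
 * 's' holds 'n' non-overlapping states sorted by start. Return true only
 * if every byte in [start, end] is covered by a state with 'bit' set.
 */
static bool range_all_set(const struct state_sketch *s, size_t n,
			  uint64_t start, uint64_t end, unsigned bit)
{
	uint64_t next = start;	/* first byte not yet proven to have the bit */

	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		if (s[i].end < next)
			continue;		/* entirely before the range */
		if (s[i].start > next)
			return false;		/* hole before this state */
		if (!(s[i].bits & bit))
			return false;
		if (s[i].end >= end)
			return true;		/* covered through the end */
		next = s[i].end + 1;
	}
	/* Ran out of states: the tail of the range is a hole. */
	return false;
}
```

The final return is the case from the description: the walk ends because
no state follows the range, so the uncovered tail must count as a miss.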
Signed-off-by: Chris Mason <chris.mason@oracle.com>
btrfs_find_free_objectid may return a used objectid due to arithmetic
underflow. This bug may happen when the 'root' parameter is the tree root,
so it may cause serious problems when creating a snapshot or subvolume.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Using ilookup5 during data=ordered writeback could deadlock on I_LOCK. This
saves a pointer to the inode instead.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This patch adds readonly inode flag support. A file with this flag
can't be modified, but can be deleted.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
While shrinking the FS, the allocation functions need to make sure
they don't try to allocate bytes past the end of the FS.
nodatacow needed an extra check to force cows when the existing extents are
past the end of the FS.
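
Both checks reduce to the same bounds test. A minimal sketch, with
hypothetical helper names and the FS size passed in as a plain byte
count:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when [start, start + num_bytes) lies entirely inside the FS. */
static bool range_inside_fs(uint64_t start, uint64_t num_bytes,
			    uint64_t total_bytes)
{
	assert(num_bytes > 0);
	/* Written as a subtraction so start + num_bytes cannot overflow. */
	return start < total_bytes && num_bytes <= total_bytes - start;
}

/*
 * nodatacow normally overwrites in place, but an existing extent that
 * ends past the new end of the FS has to be COWed to a new location.
 */
static bool must_force_cow(uint64_t extent_start, uint64_t extent_len,
			   uint64_t new_total_bytes)
{
	return !range_inside_fs(extent_start, extent_len, new_total_bytes);
}
```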
Signed-off-by: Chris Mason <chris.mason@oracle.com>
It is very difficult to create a consistent snapshot of the btree when
other writers may update the btree before the commit is done.
This changes the snapshot creation to happen during the commit, while
no other updates are possible.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This forces file data extents down to disk along with the metadata that
references them. The current implementation is fairly simple, and just
writes out all of the dirty pages in an inode before the commit.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The shrinking code used btrfs_next_leaf to find the next item, but
this does not cow the blocks it touches. This fix calls search_slot after
finding the next item to do appropriate cow and balancing.
Signed-off-by: Chris Mason <chris.mason@oracle.com>