linux/fs
Miao Xie 361048f586 Btrfs: fix full backref problem when inserting shared block reference
If we create several snapshots at the same time, the following BUG_ON() will be
triggered.

	kernel BUG at fs/btrfs/extent-tree.c:6047!

Steps to reproduce:
 # mkfs.btrfs <partition>
 # mount <partition> <mnt>
 # cd <mnt>
 # for ((i=0;i<2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
 # for ((i=0; i<4; i++))
 > do
 > mkdir $i
 > for ((j=0; j<200; j++))
 > do
 > btrfs sub snap . $i/$j
 > done &
 > done

The reason is:
Before transaction commit, some operations changed the fs tree and new tree
blocks were allocated because of COW. We used the implicit non-shared back
reference for those newly allocated tree blocks because they were not shared by
two or more trees.

Then we created the first snapshot of the fs tree. According to the back
reference rules, we also used implicit back refs for the child tree blocks of
the root node of the fs tree. Those child nodes/leaves were now shared by two
trees.

We did not process the delayed references at that point, and continued to
change the fs tree (we created the second snapshot and inserted the dir item of
the new snapshot into the fs tree). According to the back reference rules, we
added full back refs for those tree blocks whose parents had been shared by two
trees. Now some newly allocated tree blocks had both types of references.

As we know, the delayed reference system handles these delayed references from
back to front, and the full delayed references are inserted after the implicit
ones. So when we processed the back references of those newly allocated tree
blocks, the full references were processed first. If the first reference is a
shared back reference and the tree block that it points to is newly allocated,
the block is treated as one that was already shared by two or more trees when
it was allocated. Such a block should have a full back reference, not an
implicit one, and the flags of its reference should be set to FULL_BACKREF.
But in fact, it was a non-shared tree block with an implicit reference at the
beginning, so there was no requirement to set the flags to FULL_BACKREF. As a
result, the BUG_ON() was triggered.
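
The ordering problem described above can be sketched with a toy model. This is
not the btrfs code: the type and function names below (ref_kind, delayed_ref,
tree_block, run_delayed_refs_model) are hypothetical stand-ins, and the check
only mirrors the condition stated in this message, namely that a newly
allocated block whose first processed reference is a full one is expected to
carry FULL_BACKREF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the btrfs definitions. */
enum ref_kind { REF_IMPLICIT, REF_FULL };

struct delayed_ref {
	enum ref_kind kind;
};

struct tree_block {
	bool newly_allocated;   /* extent item has not been inserted yet */
	bool full_backref_flag; /* models FULL_BACKREF on the block */
};

/*
 * Process the queued refs from back to front, as this message describes.
 * Returns false where the model would hit the BUG_ON(): the first ref
 * handled for a newly allocated block is a full one, but the block was
 * allocated without the FULL_BACKREF flag.
 */
static bool run_delayed_refs_model(struct tree_block *blk,
				   const struct delayed_ref *refs, size_t n)
{
	assert(blk != NULL);
	assert(n == 0 || refs != NULL);

	for (size_t i = n; i-- > 0;) {
		if (blk->newly_allocated) {
			if (refs[i].kind == REF_FULL && !blk->full_backref_flag)
				return false;
			blk->newly_allocated = false;
		}
	}
	return true;
}

/* Implicit ref queued at COW time, full ref queued after the snapshot. */
static bool mixed_queue_is_safe(void)
{
	struct tree_block blk = { .newly_allocated = true,
				  .full_backref_flag = false };
	const struct delayed_ref refs[] = { { REF_IMPLICIT }, { REF_FULL } };

	return run_delayed_refs_model(&blk, refs, 2);
}

/* Only the implicit ref is queued; nothing has been shared yet. */
static bool implicit_only_is_safe(void)
{
	struct tree_block blk = { .newly_allocated = true,
				  .full_backref_flag = false };
	const struct delayed_ref refs[] = { { REF_IMPLICIT } };

	return run_delayed_refs_model(&blk, refs, 1);
}
```

In this model mixed_queue_is_safe() reports the failure because the full
reference sits at the back of the queue and is therefore seen first, while
implicit_only_is_safe() passes.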

There are several ways to fix this bug:
1. Process the delayed references after the snapshot is created and before we
   change the source tree of the snapshot. This is the easiest and safest way.
2. Modify the sort order of the delayed reference tree so that the full delayed
   references are inserted before the implicit ones. This is also very easy,
   but I don't know whether it would introduce other problems.
3. Modify select_delayed_ref() so that it selects the implicit delayed
   references first. This is not so good because it may waste CPU time if we
   have lots of delayed references.
4. Set the flags to FULL_BACKREF. This is a little more complex than the 1st
   way.

I chose the 1st way to fix it.
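
Why the 1st way works can be illustrated with a small simulation. This is only
a sketch under simplifying assumptions, not the actual patch: ref_queue,
queue_ref(), flush_refs() and snapshot_twice() are hypothetical names, a single
block stands in for all newly allocated tree blocks, and that block is assumed
to have been allocated without FULL_BACKREF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the btrfs definitions. */
enum ref_kind { REF_IMPLICIT, REF_FULL };

#define MAX_REFS 8

struct ref_queue {
	enum ref_kind refs[MAX_REFS];
	size_t count;
	bool block_is_new; /* the block's extent item is not inserted yet */
};

static void queue_ref(struct ref_queue *q, enum ref_kind kind)
{
	assert(q->count < MAX_REFS);
	q->refs[q->count++] = kind;
}

/* Drain back to front; false means the model's BUG_ON() condition was hit. */
static bool flush_refs(struct ref_queue *q)
{
	bool ok = true;

	while (q->count > 0) {
		enum ref_kind kind = q->refs[--q->count];

		if (q->block_is_new) {
			if (kind == REF_FULL)
				ok = false;
			q->block_is_new = false;
		}
	}
	return ok;
}

/* Replay the sequence from this message, with or without the early flush. */
static bool snapshot_twice(bool flush_after_snapshot)
{
	struct ref_queue q = { .count = 0, .block_is_new = true };
	bool ok = true;

	queue_ref(&q, REF_IMPLICIT);       /* COW before the first snapshot */
	/* the first snapshot is created at this point */
	if (flush_after_snapshot)
		ok = flush_refs(&q) && ok; /* the 1st way: process refs now */
	queue_ref(&q, REF_FULL);           /* later change, parent is shared */
	ok = flush_refs(&q) && ok;         /* transaction commit */
	return ok;
}
```

With the early flush the implicit reference is processed while it is the only
one queued, so the later full reference no longer targets a newly allocated
block. Without the flush, the full reference is processed first and the model
reports the failure.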

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
2012-10-01 15:19:10 -04:00
9p 9p: Push file_update_time() into v9fs_vm_page_mkwrite() 2012-07-31 01:02:46 +04:00
adfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
affs affs: use memweight() 2012-07-30 17:25:16 -07:00
afs VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
autofs4 autofs4 - fix expire check 2012-08-17 06:56:39 -07:00
befs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
bfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
btrfs Btrfs: fix full backref problem when inserting shared block reference 2012-10-01 15:19:10 -04:00
cachefiles fs: cachefiles: add support for large files in filesystem caching 2012-07-30 17:25:21 -07:00
ceph ceph: avoid divide by zero in __validate_layout() 2012-08-21 15:55:28 -07:00
cifs cifs: fix return value in cifsConvertToUTF16 2012-09-18 15:35:25 -05:00
coda don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
configfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
cramfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
debugfs debugfs: fix u32_array race in format_array_alloc 2012-09-21 11:48:05 -07:00
devpts VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
dlm dlm: fix missing dir remove 2012-07-16 14:24:43 -05:00
ecryptfs eCryptfs: Copy up attributes of the lower target inode after rename 2012-09-14 09:36:03 -07:00
efs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
exofs Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2012-08-03 13:24:07 -07:00
exportfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ext2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
ext3 ext3: Fix fdatasync() for files with only i_size changes 2012-09-04 00:04:43 +02:00
ext4 The following are all bug fixes and regressions. The most notable are 2012-08-17 08:04:47 -07:00
fat Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
freevxfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
fscache
fuse fuse: fix retrieve length 2012-09-04 18:45:54 +02:00
gfs2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes 2012-09-14 18:05:14 -07:00
hfs hfs: nuke write_super from comments 2012-08-04 12:15:38 +04:00
hfsplus hfsplus: use -ENOMEM when kzalloc() fails 2012-07-30 17:25:19 -07:00
hostfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
hpfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
hppfs switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
hugetlbfs hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages 2012-07-31 18:42:40 -07:00
isofs Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2012-07-24 17:40:44 -07:00
jbd jbd: don't write superblock when unmounting an ro filesystem 2012-08-15 13:53:30 +02:00
jbd2 The following are all bug fixes and regressions. The most notable are 2012-08-17 08:04:47 -07:00
jffs2 don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
jfs don't expose I_NEW inodes via dentry->d_inode 2012-07-23 00:00:58 +04:00
lockd close the race in nlmsvc_free_block() 2012-09-22 20:48:20 -04:00
logfs Pull request from git://github.com/prasad-joshi/logfs_upstream.git 2012-08-26 10:14:11 -07:00
minix minixfs: fix block limit check 2012-07-30 17:25:19 -07:00
ncpfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
nfs NFS: fsync() must exit with an error if page writeback failed 2012-09-11 15:38:32 -04:00
nfs_common
nfsd nfsd4: fix security flavor of NFSv4.0 callback 2012-08-20 18:38:36 -04:00
nilfs2 nilfs2: nuke write_super from comments 2012-08-04 12:15:38 +04:00
nls nls: fix (and rename) mac NLS table files and config options 2012-06-01 19:51:22 -07:00
notify switch dentry_open() to struct path, make it grab references itself 2012-07-23 00:01:29 +04:00
ntfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
ocfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
omfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
openpromfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
proc fs/proc: fix potential unregister_sysctl_table hang 2012-09-17 10:32:03 -07:00
pstore pstore/ram: Make tracing log versioned 2012-07-17 16:48:09 -07:00
qnx4 qnx4fs: use memweight() 2012-07-30 17:25:16 -07:00
qnx6 stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
quota quota: Move down dqptr_sem read after initializing default warn[] type at __dquot_alloc_space(). 2012-08-15 00:22:57 +02:00
ramfs don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
reiserfs reiserfs: fix deadlocks with quotas 2012-08-15 00:22:57 +02:00
romfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
squashfs stop passing nameidata to ->lookup() 2012-07-14 16:34:32 +04:00
sysfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
sysv fs/sysv: stop using write_super and s_dirt 2012-07-22 23:58:12 +04:00
ubifs UBIFS: fix error messages spelling 2012-08-22 17:41:09 +03:00
udf udf: Fix data corruption for files in ICB 2012-09-05 16:06:03 +02:00
ufs fs/ufs: get rid of write_super 2012-07-22 23:58:16 +04:00
xfs xfs: stop the sync worker before xfs_unmountfs 2012-09-18 16:51:26 -05:00
aio.c aio: now fput() is OK from interrupt context; get rid of manual delayed __fput() 2012-07-22 23:57:59 +04:00
anon_inodes.c
attr.c notify_change(): check that i_mutex is held 2012-07-14 16:35:42 +04:00
bad_inode.c don't pass nameidata to ->create() 2012-07-14 16:34:47 +04:00
binfmt_aout.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
binfmt_elf_fdpic.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2012-05-23 17:42:39 -07:00
binfmt_elf.c binfmt_elf: switch elf_map() to vm_mmap/vm_munmap 2012-05-30 21:04:55 -04:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: use vm_munmap, we are missing ->mmap_sem there 2012-05-30 21:04:56 -04:00
binfmt_misc.c vfs: Rename end_writeback() to clear_inode() 2012-05-06 13:43:41 +08:00
binfmt_script.c
binfmt_som.c VM: add "vm_mmap()" helper function 2012-04-20 17:29:13 -07:00
bio-integrity.c
bio.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2012-08-25 11:36:43 -07:00
block_dev.c fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices 2012-08-02 09:50:39 +02:00
buffer.c block: replace __getblk_slow misfix by grow_dev_page fix 2012-08-23 12:17:36 +02:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c The following text was taken from the original review request: 2012-03-24 10:24:31 -07:00
compat.c vfs: missed source of ->f_pos races 2012-08-20 10:11:47 -07:00
dcache.c vfs: dcache: fix deadlock in tree traversal 2012-09-29 17:41:40 -07:00
dcookies.c
direct-io.c block: move down direct IO plugging 2012-08-09 15:23:09 +02:00
drop_caches.c
eventfd.c eventfd: change int to __u64 in eventfd_signal() 2012-05-31 17:49:32 -07:00
eventpoll.c eventpoll: use-after-possible-free in epoll_create1() 2012-08-22 10:26:55 -04:00
exec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
fcntl.c c/r: fcntl: add F_GETOWNER_UIDS option 2012-07-30 17:25:21 -07:00
fhandle.c
fifo.c fifo: Do not restart open() if it already found a partner 2012-07-16 08:33:14 -07:00
file_table.c fs: Add freezing handling to mnt_want_write() / mnt_drop_write() 2012-07-31 09:40:38 +04:00
file.c Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-03-29 18:12:23 -07:00
filesystems.c
fs_struct.c get rid of ->mnt_longterm 2012-07-14 16:32:47 +04:00
fs-writeback.c mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads 2012-07-31 18:42:40 -07:00
generic_acl.c
inode.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
internal.h fs: Add freezing handling to mnt_want_write() / mnt_drop_write() 2012-07-31 09:40:38 +04:00
ioctl.c
ioprio.c Merge branch 'for-3.5/core' of git://git.kernel.dk/linux-block 2012-05-30 08:52:42 -07:00
Kconfig
Kconfig.binfmt C6X: add support to build with BINFMT_ELF_FDPIC 2012-05-15 09:17:34 -04:00
libfs.c VFS: Pass mount flags to sget() 2012-07-14 16:38:34 +04:00
locks.c locks: remove unused lm_release_private 2012-08-01 09:01:46 -07:00
Makefile
mbcache.c
mount.h get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
mpage.c
namei.c fs: fix fs/namei.c kernel-doc warnings 2012-08-22 10:30:10 -04:00
namespace.c do_add_mount()/umount -l races 2012-09-22 20:48:18 -04:00
no-block.c
open.c vfs: canonicalize create mode in build_open_flags() 2012-08-15 13:01:24 +02:00
pipe.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-08-01 10:26:23 -07:00
pnode.c VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors 2012-07-14 16:37:27 +04:00
pnode.h
posix_acl.c
proc_namespace.c get rid of magic in proc_namespace.c 2012-07-14 16:32:48 +04:00
read_write.c vfs: allow custom EOF in generic_file_llseek code 2012-07-23 00:00:15 +04:00
read_write.h
readdir.c switch readdir/getdents to fget_light/fput_light 2012-05-29 23:28:29 -04:00
select.c posix_types.h: Cleanup stale __NFDBITS and related definitions 2012-07-26 13:36:43 -07:00
seq_file.c seq_file: Add seq_vprintf function and export it 2012-06-11 13:16:35 +01:00
signalfd.c switch signalfd4() to fget_light/fput_light 2012-05-29 23:28:30 -04:00
splice.c fs: Protect write paths by sb_start_write - sb_end_write 2012-07-31 09:45:47 +04:00
stack.c
stat.c vfs: make O_PATH file descriptors usable for 'fstat()' 2012-09-14 14:48:21 -07:00
statfs.c switch statfs to fget_light/fput_light 2012-05-29 23:28:31 -04:00
super.c vfs: kill write_super and sync_supers 2012-08-04 01:24:44 +04:00
sync.c vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes 2012-07-22 23:59:01 +04:00
timerfd.c
utimes.c switch utimes() to fget_light/fput_light 2012-05-29 23:28:32 -04:00
xattr_acl.c
xattr.c fs/xattr.c:getxattr(): improve handling of allocation failures 2012-07-30 17:25:11 -07:00