linux/fs/btrfs
Filipe Manana 28a235931b Btrfs: fix lockdep warning on deadlock against an inode's log mutex
Commit 44f714dae5 ("Btrfs: improve performance on fsync against new
inode after rename/unlink"), which landed in 4.8-rc2, introduced a
possibility for a deadlock due to double locking of an inode's log mutex
by the same task, which lockdep reports with:

[23045.433975] =============================================
[23045.434748] [ INFO: possible recursive locking detected ]
[23045.435426] 4.7.0-rc6-btrfs-next-34+ #1 Not tainted
[23045.436044] ---------------------------------------------
[23045.436044] xfs_io/3688 is trying to acquire lock:
[23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               but task is already holding lock:
[23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               other info that might help us debug this:
[23045.436044]  Possible unsafe locking scenario:

[23045.436044]        CPU0
[23045.436044]        ----
[23045.436044]   lock(&ei->log_mutex);
[23045.436044]   lock(&ei->log_mutex);
[23045.436044]
                *** DEADLOCK ***

[23045.436044]  May be due to missing lock nesting notation

[23045.436044] 3 locks held by xfs_io/3688:
[23045.436044]  #0:  (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffffa035f2ae>] btrfs_sync_file+0x14e/0x425 [btrfs]
[23045.436044]  #1:  (sb_internal#2){.+.+.+}, at: [<ffffffff8118446b>] __sb_start_write+0x5f/0xb0
[23045.436044]  #2:  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               stack backtrace:
[23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ #1
[23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[23045.436044]  0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70
[23045.436044]  ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68
[23045.436044]  0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68
[23045.436044] Call Trace:
[23045.436044]  [<ffffffff8127074d>] dump_stack+0x67/0x90
[23045.436044]  [<ffffffff81092897>] __lock_acquire+0xcbb/0xe4e
[23045.436044]  [<ffffffff8109155f>] ? mark_lock+0x24/0x201
[23045.436044]  [<ffffffff8109179a>] ? mark_held_locks+0x5e/0x74
[23045.436044]  [<ffffffff81092de0>] lock_acquire+0x12f/0x1c3
[23045.436044]  [<ffffffff81092de0>] ? lock_acquire+0x12f/0x1c3
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffff814a51a4>] mutex_lock_nested+0x77/0x3a7
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa039705e>] ? btrfs_release_delayed_node+0xb/0xd [btrfs]
[23045.436044]  [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffff810a0ed1>] ? vprintk_emit+0x453/0x465
[23045.436044]  [<ffffffffa0385a61>] btrfs_log_inode+0x66e/0xc95 [btrfs]
[23045.436044]  [<ffffffffa03c084d>] log_new_dir_dentries+0x26c/0x359 [btrfs]
[23045.436044]  [<ffffffffa03865aa>] btrfs_log_inode_parent+0x4a6/0x628 [btrfs]
[23045.436044]  [<ffffffffa0387552>] btrfs_log_dentry_safe+0x5a/0x75 [btrfs]
[23045.436044]  [<ffffffffa035f464>] btrfs_sync_file+0x304/0x425 [btrfs]
[23045.436044]  [<ffffffff811acaf4>] vfs_fsync_range+0x8c/0x9e
[23045.436044]  [<ffffffff811acb22>] vfs_fsync+0x1c/0x1e
[23045.436044]  [<ffffffff811acc79>] do_fsync+0x31/0x4a
[23045.436044]  [<ffffffff811ace99>] SyS_fsync+0x10/0x14
[23045.436044]  [<ffffffff814a88e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[23045.436044]  [<ffffffff8108f039>] ? trace_hardirqs_off_caller+0x3f/0xaa

An example reproducer for this is:

   $ mkfs.btrfs -f /dev/sdb
   $ mount /dev/sdb /mnt
   $ mkdir /mnt/dir
   $ touch /mnt/dir/foo
   $ sync
   $ mv /mnt/dir/foo /mnt/dir/bar
   $ touch /mnt/dir/foo
   $ xfs_io -c "fsync" /mnt/dir/bar

This is because while logging the inode of file bar we end up logging its
parent directory (since its inode has an unlink_trans field matching the
current transaction id due to the rename operation), which in turn logs
the inodes for all its new dentries, so that the new inode for the new
file named foo gets logged which in turn triggered another logging attempt
for the inode we are fsync'ing, since that inode had an old name that
corresponds to the name of the new inode.

So fix this by ensuring that when logging the inode for a new dentry that
has a name matching an old name of some other inode, we don't log again
the original inode that we are fsync'ing.

Fixes: 44f714dae5 ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:32 -07:00
..
tests btrfs: tests, require fs_info for root 2016-07-26 13:53:18 +02:00
acl.c btrfs: Replace -ENOENT by -ERANGE in btrfs_get_acl() 2016-07-26 13:52:25 +02:00
async-thread.c btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
async-thread.h btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
backref.c btrfs: backref: Fix soft lockup in __merge_refs function 2016-08-25 03:58:16 -07:00
backref.h
btrfs_inode.h Merge branch 'cleanups-4.7' into for-chris-4.7-20160525 2016-05-25 22:51:03 +02:00
check-integrity.c btrfs: Use correct format specifier 2016-06-17 18:32:40 +02:00
check-integrity.h
compression.c Btrfs: fix BUG_ON in btrfs_submit_compressed_write 2016-07-26 13:52:25 +02:00
compression.h btrfs: move btrfs_compression_type to compression.h 2016-03-11 17:12:46 +01:00
ctree.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
ctree.h btrfs: fix fsfreeze hang caused by delayed iputs deal 2016-08-25 03:58:26 -07:00
dedupe.h btrfs: expand cow_file_range() to support in-band dedup and subpage-blocksize 2016-07-26 13:52:25 +02:00
delayed-inode.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
delayed-inode.h Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes 2016-06-25 06:20:10 -07:00
delayed-ref.c btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() 2016-08-25 03:58:21 -07:00
delayed-ref.h Btrfs: remove unused function btrfs_add_delayed_qgroup_reserve() 2016-08-03 11:02:51 +01:00
dev-replace.c btrfs: btrfs_test_opt and friends should take a btrfs_fs_info 2016-07-26 13:53:16 +02:00
dev-replace.h btrfs: refactor btrfs_dev_replace_start for reuse 2016-04-28 10:59:13 +02:00
dir-item.c
disk-io.c Btrfs: detect corruption when non-root leaf has zero item 2016-08-25 03:58:31 -07:00
disk-io.h btrfs: don't create or leak aliased root while cleaning up orphans 2016-08-25 03:58:29 -07:00
export.c BTRFS: support NFSv2 export 2015-10-06 06:55:23 -07:00
export.h
extent_io.c Btrfs: fix eb memory leak due to readpage failure 2016-07-26 13:52:25 +02:00
extent_io.h btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
extent_map.c btrfs: Fix slab accounting flags 2016-07-26 13:52:25 +02:00
extent_map.h btrfs: cleanup, stop casting for extent_map->lookup everywhere 2016-01-15 19:22:28 +01:00
extent-tree.c Btrfs: fix em leak in find_first_block_group 2016-08-25 03:58:29 -07:00
file-item.c Btrfs: fix __MAX_CSUM_ITEMS 2016-08-03 14:08:37 -07:00
file.c Btrfs: fix lockdep warning on deadlock against an inode's log mutex 2016-08-25 03:58:32 -07:00
free-space-cache.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
free-space-cache.h btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
free-space-tree.c btrfs: btrfs_abort_transaction, drop root parameter 2016-07-26 13:54:26 +02:00
free-space-tree.h Btrfs: implement the free space B-tree 2015-12-17 12:16:47 -08:00
hash.c btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
hash.h btrfs: advertise which crc32c implementation is being used at module load 2016-06-06 14:08:28 +02:00
inode-item.c btrfs: rename btrfs_std_error to btrfs_handle_fs_error 2016-04-28 10:36:54 +02:00
inode-map.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
inode-map.h Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-01-15 19:25:02 +01:00
inode.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
ioctl.c btrfs: waiting on qgroup rescan should not always be interruptible 2016-08-25 03:58:20 -07:00
Kconfig
locking.c btrfs: cleanup, remove stray return statements 2016-01-07 14:30:52 +01:00
locking.h
lzo.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00
Makefile Btrfs: add free space tree sanity tests 2015-12-17 12:16:47 -08:00
math.h
ordered-data.c btrfs: Fix slab accounting flags 2016-07-26 13:52:25 +02:00
ordered-data.h Btrfs: fix race setting block group readonly during device replace 2016-05-30 12:58:21 +01:00
orphan.c
print-tree.c btrfs: teach print_leaf about temporary item subtypes 2016-02-11 16:15:43 +01:00
print-tree.h
props.c btrfs: simpilify btrfs_subvol_inherit_props 2016-07-26 13:54:22 +02:00
props.h
qgroup.c btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() 2016-08-25 03:58:21 -07:00
qgroup.h btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() 2016-08-25 03:58:21 -07:00
raid56.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
raid56.h
rcu-string.h
reada.c Btrfs: fix race between readahead and device replace/removal 2016-05-30 12:58:18 +01:00
relocation.c btrfs: update btrfs_space_info's bytes_may_use timely 2016-08-25 03:58:26 -07:00
root-tree.c btrfs: don't create or leak aliased root while cleaning up orphans 2016-08-25 03:58:29 -07:00
scrub.c btrfs: plumb fs_info into btrfs_work 2016-07-26 13:53:15 +02:00
send.c Btrfs: send, don't bug on inconsistent snapshots 2016-08-01 07:31:41 +01:00
send.h Btrfs: use linux/sizes.h to represent constants 2016-01-07 14:38:02 +01:00
struct-funcs.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
super.c btrfs: fix fsfreeze hang caused by delayed iputs deal 2016-08-25 03:58:26 -07:00
sysfs.c btrfs: add missing bytes_readonly attribute file in sysfs 2016-07-26 13:52:25 +02:00
sysfs.h btrfs: sysfs: introduce helper for syncing bits with sysfs files 2016-01-21 18:50:40 +01:00
transaction.c btrfs: fix fsfreeze hang caused by delayed iputs deal 2016-08-25 03:58:26 -07:00
transaction.h btrfs: add btrfs_trans_handle->fs_info pointer 2016-07-26 13:54:26 +02:00
tree-defrag.c Btrfs: fix locking bugs when defragging leaves 2015-12-18 02:51:32 +00:00
tree-log.c Btrfs: fix lockdep warning on deadlock against an inode's log mutex 2016-08-25 03:58:32 -07:00
tree-log.h Btrfs: fix lockdep warning on deadlock against an inode's log mutex 2016-08-25 03:58:32 -07:00
ulist.c btrfs: fix string and comment grammatical issues and typos 2016-05-25 22:35:14 +02:00
ulist.h
uuid-tree.c
volumes.c btrfs: do not background blkdev_put() 2016-08-25 03:58:28 -07:00
volumes.h Merge branch 'foreign/jeffm/uapi' into for-chris-4.7-20160516 2016-05-16 15:46:29 +02:00
xattr.c switch xattr_handler->set() to passing dentry and inode separately 2016-05-27 15:39:43 -04:00
xattr.h btrfs: Switch to generic xattr handlers 2016-05-17 19:17:09 -04:00
zlib.c mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros 2016-04-04 10:41:08 -07:00