linux/fs
Jerry Hoemann 6424babfd6 fsnotify: next_i is freed during fsnotify_unmount_inodes.
During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:

                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }

Because this section of code runs with the global inode_sb_list_lock held,
the rest of the system eventually hangs trying to acquire that lock.

Multiple crash dumps showed:

In each dump, inode->i_state == 0x60, i_count == 0, and i_sb_list pointed
back at itself.  Because a self-pointing entry never equals the value of
list upon entry to the function, the kernel never exits the loop.

To help narrow down the problem, the call to list_del_init in
inode_sb_list_del was changed to list_del.  This poisons the pointers in
the i_sb_list and causes the kernel to panic if it traverses a freed
inode.
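
The difference between the two deletion styles can be illustrated with a minimal userspace sketch. It is not kernel code: the sketch_* names and the poison values are assumptions made up for this illustration, and the walker is bounded only so that the example terminates. It shows that a stale cursor left on an entry removed with the list_del_init style keeps following a self-pointing link and never reaches the head, while the list_del style leaves a recognizable poison pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a circular doubly linked list node. */
struct sketch_list_node {
	struct sketch_list_node *next, *prev;
};

/* Illustrative poison values chosen for this sketch only. */
#define SKETCH_POISON1 ((struct sketch_list_node *)0x100)
#define SKETCH_POISON2 ((struct sketch_list_node *)0x200)

static void sketch_list_init(struct sketch_list_node *n)
{
	n->next = n;
	n->prev = n;
}

static void sketch_list_add_tail(struct sketch_list_node *n,
				 struct sketch_list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void sketch_unlink(struct sketch_list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* list_del_init style: the removed node ends up pointing at itself. */
static void sketch_list_del_init(struct sketch_list_node *n)
{
	sketch_unlink(n);
	sketch_list_init(n);
}

/* list_del style: the removed node's pointers are poisoned. */
static void sketch_list_del(struct sketch_list_node *n)
{
	sketch_unlink(n);
	n->next = SKETCH_POISON1;
	n->prev = SKETCH_POISON2;
}

/*
 * Follow ->next from a (possibly stale) cursor for at most max_steps.
 * Returns true if the head is reached.  A poisoned pointer is reported
 * as a failure without dereferencing it; the kernel would fault there.
 */
static bool sketch_walk_reaches_head(const struct sketch_list_node *cursor,
				     const struct sketch_list_node *head,
				     int max_steps)
{
	for (int i = 0; i < max_steps; i++) {
		if (cursor == head)
			return true;
		if (cursor == SKETCH_POISON1)
			return false;
		cursor = cursor->next;
	}
	return false; /* still spinning: the self-pointing case */
}
```

With the self-pointing node the walker exhausts any step budget without reaching the head, which mirrors the silent hang; with the poisoned node the walk stops at the first step past the removed entry, which is why the changed kernel panicked instead of hanging.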

Subsequent stress testing panicked in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop, showing that next_i had
been freed.

We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.

The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget.  However, the code doesn't do the
__iget call on next_i in two cases:

	if i_count == 0 or
	if i_state & (I_FREEING | I_WILL_FREE)

The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list.  This makes the handling of next_i more closely match the
handling of the variable "inode."
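
The advancing step described above can be sketched in userspace as follows. This is a simplified model, not the patch itself: struct sketch_inode, the flag values, and the NULL-terminated singly linked list are assumptions made for illustration, and the locking that the kernel needs around these checks (i_lock, inode_sb_list_lock) is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch; not the kernel's I_* values. */
#define SKETCH_FREEING   0x1u
#define SKETCH_WILL_FREE 0x2u

/* Hypothetical stand-in for the inode fields the walk looks at. */
struct sketch_inode {
	unsigned int state;        /* models i_state */
	int count;                 /* models i_count */
	struct sketch_inode *next; /* NULL marks the end of the list */
};

/* An entry can be pinned only if it is referenced and not being freed. */
static int sketch_can_pin(const struct sketch_inode *in)
{
	return in->count > 0 &&
	       !(in->state & (SKETCH_FREEING | SKETCH_WILL_FREE));
}

/*
 * Starting at next_i, skip entries that cannot be pinned.  The first
 * pinnable entry gets an extra reference (standing in for __iget) and
 * is returned; NULL means the end of the list was reached.
 */
static struct sketch_inode *sketch_pin_next(struct sketch_inode *next_i)
{
	while (next_i != NULL) {
		if (sketch_can_pin(next_i)) {
			next_i->count++;
			return next_i;
		}
		next_i = next_i->next;
	}
	return NULL;
}
```

The point of the design is that the caller only ever resumes the walk from an entry it holds a reference on, or from the end of the list, so nothing it will touch next can be freed while the list lock is dropped.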

The time to reproduce the hang is highly variable (from hours to days).  We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.

During list_for_each_entry_safe, next_i can be freed, causing
the loop to never terminate.  Advance next_i in those cases where
__iget is not done.

Signed-off-by: Jerry Hoemann <jerry.hoemann@hp.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ken Helias <kenhelias@firemail.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-29 16:33:14 -07:00
..
9p 9p: switch to %p[dD] 2014-10-09 02:39:04 -04:00
adfs adfs: add __printf verification, fix format/argument mismatches 2014-08-08 15:57:24 -07:00
affs fs/affs: remove redundant sys_tz declarations 2014-10-14 02:18:22 +02:00
afs Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
autofs4 autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode. 2014-10-14 02:18:16 +02:00
befs fs/befs/btree.c: remove typedef befs_btree_node 2014-10-14 02:18:20 +02:00
bfs fs/bfs: use bfs prefix for dump_imap 2014-08-08 15:57:24 -07:00
btrfs vfs: export check_sticky() 2014-10-24 00:14:36 +02:00
cachefiles FS-Cache fixes 2014-10-14 08:40:15 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-10-15 06:46:01 +02:00
cifs [CIFS] Remove obsolete comment 2014-10-17 17:17:12 -05:00
coda fs/coda: use linux/uaccess.h 2014-08-08 15:57:20 -07:00
configfs
cramfs fs/cramfs/inode.c: use linux/uaccess.h 2014-08-08 15:57:25 -07:00
debugfs
devpts
dlm dlm: fix missing endian conversion of rcom_status flags 2014-10-14 15:11:48 -05:00
ecryptfs fs: limit filesystem stacking depth 2014-10-24 00:14:39 +02:00
efivarfs
efs fs/efs/namei.c: return is not a function 2014-08-08 15:57:18 -07:00
exofs Boaz Harrosh - Fix broken email address 2014-10-19 20:22:32 +03:00
exportfs
ext2 percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
ext3 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2014-10-11 08:02:31 -04:00
ext4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-26 11:19:18 -07:00
f2fs f2fs: support volatile operations for transient data 2014-10-07 11:54:41 -07:00
fat fat: remove redundant sys_tz declaration 2014-10-14 02:18:20 +02:00
freevxfs
fscache fs/fscache/object-list.c: use __seq_open_private() 2014-10-13 17:52:21 +01:00
fuse vfs: Make d_invalidate return void 2014-10-09 02:38:57 -04:00
gfs2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
hfs fs/hfs/hfs_fs.h: remove redundant sys_tz declaration 2014-10-14 02:18:20 +02:00
hfsplus
hostfs hostfs: support rename flags 2014-08-07 14:40:09 -04:00
hpfs fs/hpfs/dnode.c: fix suspect code indent 2014-08-08 15:57:22 -07:00
hppfs
hugetlbfs
isofs isofs: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
jbd jbd/jbd2: use non-movable memory for the jbd superblock 2014-09-04 22:36:35 -04:00
jbd2 jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list 2014-09-18 00:58:12 -04:00
jffs2 [jffs2] kill wbuf_queued/wbuf_dwork_lock 2014-10-09 02:39:01 -04:00
jfs Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 16:23:15 +02:00
kernfs vfs: Remove unnecessary calls of check_submounts_and_drop 2014-10-09 02:38:56 -04:00
lockd File locking related changes for v3.18 (pile #1) 2014-10-11 13:21:34 -04:00
logfs fs/logfs/readwrite.c: kernel-doc warning fixes 2014-08-06 18:01:12 -07:00
minix minix zmap block counts calculation fix 2014-08-08 15:57:20 -07:00
ncpfs fs/ncpfs/dir.c: remove redundant sys_tz declaration 2014-10-14 02:18:16 +02:00
nfs Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd 2014-10-21 12:53:45 -07:00
nfs_common lockd: move lockd's grace period handling into its own module 2014-09-17 16:33:11 -04:00
nfsd nfsd4: fix crash on unknown operation number 2014-10-23 13:39:51 -04:00
nilfs2 nilfs2: improve the performance of fdatasync() 2014-10-14 02:18:20 +02:00
nls
notify fsnotify: next_i is freed during fsnotify_unmount_inodes. 2014-10-29 16:33:14 -07:00
ntfs NTFS: Bump version to 2.1.31. 2014-10-16 12:53:35 +01:00
ocfs2 ocfs2: replace strnicmp with strncasecmp 2014-10-14 02:18:24 +02:00
omfs FS/OMFS: block number sanity check during fill_super operation 2014-10-14 02:18:22 +02:00
openpromfs
overlayfs overlayfs: embed middle into overlay_readdir_data 2014-10-24 20:25:23 -04:00
proc mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared 2014-10-14 02:18:28 +02:00
pstore pstore: Fix duplicate {console,ftrace}-efi entries 2014-10-15 13:51:33 -07:00
qnx4
qnx6 fs/qnx6: update debugging to current functions 2014-08-08 15:57:26 -07:00
quota percpu_counter: add @gfp to percpu_counter_init() 2014-09-08 09:51:29 +09:00
ramfs fs/ramfs/file-nommu.c: replace count*size kzalloc by kcalloc 2014-08-08 15:57:18 -07:00
reiserfs fs/reiserfs/journal.c: fix sparse context imbalance warning 2014-10-14 02:18:20 +02:00
romfs fs/romfs/super.c: add blank line after declarations 2014-08-08 15:57:25 -07:00
squashfs fs/squashfs/super.c: logging cleanup 2014-08-06 18:01:13 -07:00
sysfs
sysv
ubifs UBIFS: Fix trivial typo in power_cut_emulated() 2014-09-30 09:29:44 +03:00
udf udf: Fix loading of special inodes 2014-10-09 13:06:14 +02:00
ufs fs/ufs/balloc.c: remove unused variable 2014-10-14 02:18:20 +02:00
xfs Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
aio.c percpu_ref: add PERCPU_REF_INIT_* flags 2014-09-24 13:31:50 -04:00
anon_inodes.c
attr.c
bad_inode.c bad_inode: add ->rename2() 2014-08-07 14:40:09 -04:00
binfmt_aout.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_elf_fdpic.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_elf.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c binfmt_misc: work around gcc-4.9 warning 2014-10-14 02:18:16 +02:00
binfmt_script.c
binfmt_som.c
block_dev.c Merge branch 'for-3.18/core' of git://git.kernel.dk/linux-block 2014-10-18 11:53:51 -07:00
buffer.c A large number of cleanups and bug fixes, with some (minor) journal 2014-10-20 09:50:11 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c vfs: move getname() from callers to do_mount() 2014-10-09 02:39:16 -04:00
coredump.c coredump: add %i/%I in core_pattern to report the tid of the crashed thread 2014-10-14 02:18:21 +02:00
dcache.c fix inode leaks on d_splice_alias() failure exits 2014-10-23 22:30:18 -04:00
dcookies.c
direct-io.c fuse: honour max_read and max_write in direct_io mode 2014-09-26 21:16:51 -04:00
drop_caches.c
eventfd.c
eventpoll.c eventpoll: fix uninitialized variable in epoll_ctl 2014-09-10 15:42:12 -07:00
exec.c handle suicide on late failure exits in execve() in search_binary_handler() 2014-10-09 02:39:00 -04:00
fcntl.c security: make security_file_set_fowner, f_setown and __f_setown void return 2014-09-09 16:01:36 -04:00
fhandle.c
file_table.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
file.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-10-13 15:44:12 +02:00
filesystems.c
fs_pin.c make fs/{namespace,super}.c forget about acct.h 2014-08-07 14:40:09 -04:00
fs_struct.c
fs-writeback.c
inode.c mm: allow drivers to prevent new writable mappings 2014-08-08 15:57:31 -07:00
internal.h vfs: export __inode_permission() to modules 2014-10-24 00:14:35 +02:00
ioctl.c
Kconfig overlay filesystem 2014-10-24 00:14:38 +02:00
Kconfig.binfmt
libfs.c locks: plumb a "priv" pointer into the setlease routines 2014-10-07 14:06:12 -04:00
locks.c locks: flock_make_lock should return a struct file_lock (or PTR_ERR) 2014-10-07 14:06:13 -04:00
Makefile overlay filesystem 2014-10-24 00:14:38 +02:00
mbcache.c
mount.h vfs: Add a function to lazily unmount all mounts from any dentry. 2014-10-09 02:38:55 -04:00
mpage.c vfs: guard end of device for mpage interface 2014-10-09 22:25:53 -04:00
namei.c vfs: add RENAME_WHITEOUT 2014-10-24 00:14:37 +02:00
namespace.c vfs: introduce clone_private_mount() 2014-10-24 00:14:36 +02:00
no-block.c
open.c vfs: add i_op->dentry_open() 2014-10-24 00:14:35 +02:00
pipe.c
pnode.c get rid of propagate_umount() mistakenly treating slaves as busy. 2014-08-30 18:31:41 -04:00
pnode.h
posix_acl.c
proc_namespace.c
read_write.c cachefiles_write_page(): switch to __kernel_write() 2014-10-09 02:39:05 -04:00
readdir.c
select.c
seq_file.c
signalfd.c
splice.c vfs: export do_splice_direct() to modules 2014-10-24 00:14:35 +02:00
stack.c fs: fix comment for 'CONFIG_LBADF' 2014-08-26 09:35:56 +02:00
stat.c
statfs.c
super.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-10-13 11:28:42 +02:00
sync.c Export sync_filesystem() for modular ->remount_fs() use 2014-09-05 08:16:21 -07:00
timerfd.c timerfd: Remove an always true check 2014-08-27 11:17:48 +02:00
utimes.c
xattr.c vfs: Deduplicate code shared by xattr system calls operating on paths 2014-10-12 17:09:10 -04:00