linux/fs
Peter Zijlstra ada723dcd6 fs/super.c: add lockdep annotation to s_umount
Li Zefan said:

Thread 1:
  for ((; ;))
  {
      mount -t cpuset xxx /mnt > /dev/null 2>&1
      cat /mnt/cpus > /dev/null 2>&1
      umount /mnt > /dev/null 2>&1
  }

Thread 2:
  for ((; ;))
  {
      mount -t cpuset xxx /mnt > /dev/null 2>&1
      umount /mnt > /dev/null 2>&1
  }

(Note: It is irrelevant which cgroup subsys is used.)

After a while a lockdep warning showed up:

=============================================
[ INFO: possible recursive locking detected ]
2.6.28 #479
---------------------------------------------
mount/13554 is trying to acquire lock:
 (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321

but task is already holding lock:
 (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321

other info that might help us debug this:
1 lock held by mount/13554:
 #0:  (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321

stack backtrace:
Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
Call Trace:
 [<c044ad2e>] validate_chain+0x4c6/0xbbd
 [<c044ba9b>] __lock_acquire+0x676/0x700
 [<c044bb82>] lock_acquire+0x5d/0x7a
 [<c049d888>] ? sget+0x5e/0x321
 [<c061b9b8>] down_write+0x34/0x50
 [<c049d888>] ? sget+0x5e/0x321
 [<c049d888>] sget+0x5e/0x321
 [<c045a2e7>] ? cgroup_set_super+0x0/0x3e
 [<c045959f>] ? cgroup_test_super+0x0/0x2f
 [<c045bcea>] cgroup_get_sb+0x98/0x2e7
 [<c045cfb6>] cpuset_get_sb+0x4a/0x5f
 [<c049dfa4>] vfs_kern_mount+0x40/0x7b
 [<c049e02d>] do_kern_mount+0x37/0xbf
 [<c04af4a0>] do_mount+0x5c3/0x61a
 [<c04addd2>] ? copy_mount_options+0x2c/0x111
 [<c04af560>] sys_mount+0x69/0xa0
 [<c0403251>] sysenter_do_call+0x12/0x31

The cause: sget() calls alloc_super(), which takes the new superblock's
s_umount, and then jumps to retry. On retry an old entry is found in the
fs_supers list, so grab_super(old) is called and takes old->s_umount while
s->s_umount is still held:

struct super_block *sget(...)
{
	...
retry:
	spin_lock(&sb_lock);
	if (test) {
		list_for_each_entry(old, &type->fs_supers, s_instances) {
			if (!test(old, data))
				continue;
			if (!grab_super(old))  <--- 2nd: down_write(&old->s_umount);
				goto retry;
			if (s)
				destroy_super(s);
			return old;
		}
	}
	if (!s) {
		spin_unlock(&sb_lock);
		s = alloc_super(type);   <--- 1st: down_write(&s->s_umount)
		if (!s)
			return ERR_PTR(-ENOMEM);
		goto retry;
	}
	...
}

This looks like a false positive, and it seems that the VFS, not cgroup,
is what needs to be fixed.

Peter said:

We can simply put the new s_umount instance in a different subclass; lockdep
doesn't particularly care about subclass order.

If there's any issue with the callers of sget() assuming the s_umount lock
being of subclass 0, then there is another annotation we can use to fix
that, but let's not bother with that if this is sufficient.
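
To illustrate why a subclass silences the report, here is a minimal
userspace sketch. It is a toy model, not lockdep itself: struct task_model,
toy_acquire() and sget_model() are hypothetical names invented for this
example, and the only rule modelled is "same class and same subclass already
held means possible recursion". In the kernel, this kind of annotation is
normally written with down_write_nested() and a non-zero subclass such as
SINGLE_DEPTH_NESTING; that this patch uses exactly that call is an
assumption here.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only: real lockdep tracks far more state than this. */
#define MAX_HELD 8

struct held_lock {
	const void *class_key;	/* lock class, e.g. one key per fs type */
	unsigned int subclass;	/* 0 by default, >0 for nested annotations */
};

struct task_model {
	struct held_lock held[MAX_HELD];
	int depth;
};

/*
 * Returns 1 if the task already holds a lock with the same class and
 * subclass (what this toy checker calls "possible recursive locking").
 * Otherwise records the acquisition and returns 0.
 */
static int toy_acquire(struct task_model *t, const void *class_key,
		       unsigned int subclass)
{
	for (int i = 0; i < t->depth; i++) {
		if (t->held[i].class_key == class_key &&
		    t->held[i].subclass == subclass)
			return 1;
	}
	assert(t->depth < MAX_HELD);
	t->held[t->depth].class_key = class_key;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return 0;
}

/*
 * Models the sget() path from the report: alloc_super() locks the new
 * superblock, then the retry finds an old one and grab_super() locks it.
 * Both locks share one class key; only the subclass of the new one varies.
 */
static int sget_model(unsigned int new_sb_subclass)
{
	static const char s_umount_key = 0;	/* stands in for the per-type key */
	struct task_model t = { .depth = 0 };
	int warned = 0;

	warned |= toy_acquire(&t, &s_umount_key, new_sb_subclass); /* new sb */
	warned |= toy_acquire(&t, &s_umount_key, 0);               /* old sb */
	return warned;
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
	printf("new sb in subclass 0: %s\n",
	       sget_model(0) ? "recursion reported" : "quiet");
	printf("new sb in subclass 1: %s\n",
	       sget_model(1) ? "recursion reported" : "quiet");
	return 0;
}
#endif
```

Built with -DTOY_DEMO_MAIN, the first call reports recursion because both
acquisitions share class and subclass, while the second stays quiet. The two
superblocks are distinct objects in either case, which is why the original
report is a false positive.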

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Menage <menage@google.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
..
9p fs/Kconfig: move 9p out 2009-01-22 13:16:01 +03:00
adfs fs/Kconfig: move adfs out 2009-01-22 13:15:56 +03:00
affs fs/Kconfig: move affs out 2009-01-22 13:15:56 +03:00
afs fs/Kconfig: move afs out 2009-01-22 13:16:01 +03:00
autofs fs/Kconfig: move autofs, autofs4 out 2009-01-22 13:15:54 +03:00
autofs4 fs/Kconfig: move autofs, autofs4 out 2009-01-22 13:15:54 +03:00
befs fs/Kconfig: move befs out 2009-01-22 13:15:57 +03:00
bfs fs/Kconfig: move bfs out 2009-01-22 13:15:57 +03:00
btrfs Btrfs: hold trans_mutex when using btrfs_record_root_in_trans 2009-02-12 14:14:53 -05:00
cifs cifs: make sure we allocate enough storage for socket address 2009-01-29 03:32:13 +00:00
coda fs/Kconfig: move coda out 2009-01-22 13:16:01 +03:00
configfs Revert "configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item()" 2009-02-04 09:46:25 -08:00
cramfs fs/Kconfig: move cramfs out 2009-01-22 13:15:58 +03:00
debugfs debugfs: add helpers for exporting a size_t simple value 2009-01-07 10:00:16 -08:00
devpts zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
dlm dlm: initialize file_lock struct in GETLK before copying conflicting lock 2009-01-21 15:28:45 -06:00
ecryptfs eCryptfs: Regression in unencrypted filename symlinks 2009-02-06 18:36:40 -08:00
efs fs/Kconfig: move efs out 2009-01-22 13:15:57 +03:00
exportfs Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
ext2 ext2/xip: refuse to change xip flag during remount with busy inodes 2009-02-11 14:25:36 -08:00
ext3 ext3: revert "ext3: wait on all pending commits in ext3_sync_fs" 2009-02-11 14:25:35 -08:00
ext4 ext4: Fix NULL dereference in ext4_ext_migrate()'s error handling 2009-02-15 20:02:19 -05:00
fat fs/Kconfig: move fat out 2009-01-22 13:15:55 +03:00
freevxfs fs/Kconfig: move vxfs out 2009-01-22 13:15:58 +03:00
fuse Merge branch 'Kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/misc 2009-01-26 10:08:50 -08:00
gfs2 filesystem freeze: add error handling of write_super_lockfs/unlockfs 2009-01-09 16:54:42 -08:00
hfs fs/Kconfig: move hfs, hfsplus out 2009-01-22 13:15:57 +03:00
hfsplus fs/Kconfig: move hfs, hfsplus out 2009-01-22 13:15:57 +03:00
hostfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
hpfs fs/Kconfig: move hpfs out 2009-01-22 13:15:59 +03:00
hppfs
hugetlbfs Do not account for the address space used by hugetlbfs using VM_ACCOUNT 2009-02-10 10:48:42 -08:00
isofs fs/Kconfig: move iso9660, udf out 2009-01-22 13:15:55 +03:00
jbd jbd: fix return value of journal_start_commit() 2009-02-11 14:25:35 -08:00
jbd2 jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() 2009-02-10 11:15:34 -05:00
jffs2 [JFFS2] remove junk prototypes 2009-01-09 21:05:21 +00:00
jfs fs/Kconfig: move jfs out 2009-01-22 13:15:54 +03:00
lockd lockd: fix regression in lockd's handling of blocked locks 2009-02-09 13:19:46 -05:00
minix fs/Kconfig: move minix out 2009-01-22 13:15:58 +03:00
ncpfs fs/Kconfig: move the rest of ncpfs out 2009-01-22 13:16:01 +03:00
nfs fs/Kconfig: move nfs out 2009-01-22 13:16:00 +03:00
nfs_common SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only 2008-12-23 15:21:32 -05:00
nfsd nfsd: only set file_lock.fl_lmops in nfsd4_lockt if a stateowner is found 2009-01-27 17:26:59 -05:00
nls
notify inotify: clean up inotify_read and fix locking problems 2009-01-26 10:08:05 -08:00
ntfs fs/Kconfig: move ntfs out 2009-01-22 13:15:55 +03:00
ocfs2 jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() 2009-02-10 11:15:34 -05:00
omfs fs/Kconfig: move omfs out 2009-01-22 13:15:58 +03:00
openpromfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
partitions block: fix bug in ptbl lookup cache 2009-01-09 21:46:13 +01:00
proc Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu 2009-01-09 14:00:58 -08:00
qnx4 fs/Kconfig: move qnx4 out 2009-01-22 13:15:59 +03:00
ramfs NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area() 2009-01-08 12:04:46 +00:00
reiserfs fs/Kconfig: move reiserfs out 2009-01-22 13:15:53 +03:00
romfs fs/Kconfig: move romfs out 2009-01-22 13:15:59 +03:00
smbfs fs/Kconfig: move smbfs out 2009-01-22 13:16:01 +03:00
squashfs fs/Kconfig: move squashfs out 2009-01-22 13:15:58 +03:00
sysfs Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 2009-01-26 10:40:28 -08:00
sysv fs/Kconfig: move sysv out 2009-01-22 13:15:59 +03:00
ubifs UBIFS: remove fast unmounting 2009-01-29 16:34:30 +02:00
udf fs/Kconfig: move iso9660, udf out 2009-01-22 13:15:55 +03:00
ufs fs/Kconfig: move ufs out 2009-01-22 13:16:00 +03:00
xfs [XFS] Warn on transaction in flight on read-only remount 2009-02-03 11:04:54 -06:00
aio.c [CVE-2009-0029] System call wrappers part 16 2009-01-14 14:15:25 +01:00
anon_inodes.c anon_inodes: use fops->owner for module refcount 2008-12-31 16:55:44 +02:00
attr.c
bad_inode.c kill ->dir_notify() 2008-12-31 18:07:43 -05:00
binfmt_aout.c sanitize ifdefs in binfmt_aout 2009-01-03 11:45:54 -08:00
binfmt_elf_fdpic.c FDPIC: Don't attempt to expand the userspace stack to fill the space allocated 2009-01-08 12:04:47 +00:00
binfmt_elf.c elf core dump: fix get_user use 2009-02-06 17:34:07 -08:00
binfmt_em86.c
binfmt_flat.c FLAT: Don't attempt to expand the userspace stack to fill the space allocated 2009-01-08 12:04:47 +00:00
binfmt_misc.c fs/binfmt_misc.c: add terminating newline to /proc/sys/fs/binfmt_misc/status 2009-01-06 15:59:19 -08:00
binfmt_script.c
binfmt_som.c
bio-integrity.c block: Remove obsolete BUG_ON 2009-01-30 12:34:36 +01:00
bio.c [SCSI] block: make blk_rq_map_user take a NULL user-space buffer for WRITE 2009-01-02 11:10:35 -06:00
block_dev.c filesystem freeze: implement generic freeze feature 2009-01-09 16:54:42 -08:00
buffer.c mm: task dirty accounting fix 2009-02-18 15:37:54 -08:00
char_dev.c fs: fix name overwrite in __register_chrdev_region() 2009-01-06 15:59:13 -08:00
compat_binfmt_elf.c
compat_ioctl.c braino in sg_ioctl_trans() 2009-02-05 16:35:52 -08:00
compat.c CRED: Fix SUID exec regression 2009-02-07 08:46:18 +11:00
dcache.c [CVE-2009-0029] System call wrappers part 20 2009-01-14 14:15:26 +01:00
dcookies.c [CVE-2009-0029] System call wrapper special cases 2009-01-14 14:15:18 +01:00
direct-io.c fs: truncate blocks outside i_size after O_DIRECT write error 2009-01-06 15:59:06 -08:00
dquot.c quota: Improve locking 2009-01-16 18:02:10 +01:00
drop_caches.c
eventfd.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
eventpoll.c epoll: drop max_user_instances and rely only on max_user_watches 2009-01-29 18:04:45 -08:00
exec.c CRED: Fix SUID exec regression 2009-02-07 08:46:18 +11:00
fcntl.c [CVE-2009-0029] System call wrappers part 15 2009-01-14 14:15:24 +01:00
fifo.c
file_table.c filp_cachep can be static in fs/file_table.c 2008-12-31 18:07:42 -05:00
file.c
filesystems.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
fs-writeback.c fs: sys_sync fix 2009-01-06 15:59:09 -08:00
generic_acl.c
inode.c partial revert of asynchronous inode delete 2009-01-09 13:15:49 -08:00
internal.h CRED: Fix SUID exec regression 2009-02-07 08:46:18 +11:00
ioctl.c [CVE-2009-0029] System call wrappers part 15 2009-01-14 14:15:24 +01:00
ioprio.c [CVE-2009-0029] System call wrappers part 28 2009-01-14 14:15:30 +01:00
Kconfig fs/Kconfig: move 9p out 2009-01-22 13:16:01 +03:00
Kconfig.binfmt CORE_DUMP_DEFAULT_ELF_HEADERS depends on ELF_CORE 2009-01-09 16:54:41 -08:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
locks.c [CVE-2009-0029] System call wrappers part 16 2009-01-14 14:15:25 +01:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus 2009-01-09 15:18:49 -08:00
mbcache.c
mpage.c do_mpage_readpage(): remove useless clear_buffer_mapped() call 2009-01-06 15:59:01 -08:00
namei.c [CVE-2009-0029] System call wrappers part 29 2009-01-14 14:15:30 +01:00
namespace.c Fix incomplete __mntput locking 2009-02-17 14:02:08 -08:00
nfsctl.c [CVE-2009-0029] System call wrappers part 27 2009-01-14 14:15:29 +01:00
no-block.c
open.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
pipe.c [CVE-2009-0029] System call wrappers part 33 2009-01-14 14:15:32 +01:00
pnode.c
pnode.h
posix_acl.c
quota_tree.c quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_tree.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_v1.c quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quota_v2.c quota: Convert union in mem_dqinfo to a pointer 2009-01-05 08:40:21 -08:00
quota.c [CVE-2009-0029] System call wrappers part 20 2009-01-14 14:15:26 +01:00
quotaio_v1.h quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quotaio_v2.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
read_write.c [CVE-2009-0029] System call wrappers part 20 2009-01-14 14:15:26 +01:00
read_write.h
readdir.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
select.c [CVE-2009-0029] System call wrappers part 32 2009-01-14 14:15:31 +01:00
seq_file.c seq_file: properly cope with pread 2009-02-18 15:37:53 -08:00
signalfd.c [CVE-2009-0029] System call wrappers part 31 2009-01-14 14:15:31 +01:00
splice.c [CVE-2009-0029] System call wrappers part 31 2009-01-14 14:15:31 +01:00
stack.c
stat.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
super.c fs/super.c: add lockdep annotation to s_umount 2009-02-18 15:37:55 -08:00
sync.c [CVE-2009-0029] System call wrappers part 09 2009-01-14 14:15:21 +01:00
timerfd.c timerfd: add flags check 2009-02-18 15:37:53 -08:00
utimes.c [CVE-2009-0029] System call wrappers part 30 2009-01-14 14:15:30 +01:00
xattr_acl.c
xattr.c [CVE-2009-0029] System call wrappers part 13 2009-01-14 14:15:23 +01:00