linux/kernel
Wu Fengguang 381a80e6df inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive
There is what we believe to be a false positive reported by lockdep.

inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock

The plan is to fix this via lockdep annotation, but that is proving to be
quite involved.

The patch flips the allocation over to GFP_NFS to shut the warning up, for
the 2.6.30 release.

Hopefully we will fix this for real in 2.6.31.  I'll queue a patch in -mm
to switch it back to GFP_KERNEL so we don't forget.

  =================================
  [ INFO: inconsistent lock state ]
  2.6.30-rc2-next-20090417 #203
  ---------------------------------
  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
  kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
   (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
  {RECLAIM_FS-ON-W} state was registered at:
    [<ffffffff81079188>] mark_held_locks+0x68/0x90
    [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100
    [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0
    [<ffffffff81130652>] kernel_event+0xe2/0x190
    [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230
    [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110
    [<ffffffff8110444d>] vfs_create+0xcd/0x140
    [<ffffffff8110825d>] do_filp_open+0x88d/0xa20
    [<ffffffff810f6b68>] do_sys_open+0x98/0x140
    [<ffffffff810f6c50>] sys_open+0x20/0x30
    [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b
    [<ffffffffffffffff>] 0xffffffffffffffff
  irq event stamp: 690455
  hardirqs last  enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80
  hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0
  softirqs last  enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220
  softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50

  other info that might help us debug this:
  2 locks held by kswapd0/380:
   #0:  (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180
   #1:  (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0

  stack backtrace:
  Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
  Call Trace:
   [<ffffffff810789ef>] print_usage_bug+0x19f/0x200
   [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50
   [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0
   [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0
   [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0
   [<ffffffff810f478c>] ? slob_free+0x10c/0x370
   [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120
   [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
   [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420
   [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
   [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
   [<ffffffff81012fe9>] ? sched_clock+0x9/0x10
   [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0
   [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
   [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0
   [<ffffffff8110cb23>] d_kill+0x33/0x60
   [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350
   [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0
   [<ffffffff810d0cc5>] shrink_slab+0x125/0x180
   [<ffffffff810d1540>] kswapd+0x560/0x7a0
   [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0
   [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40
   [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10
   [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0
   [<ffffffff8106555b>] kthread+0x5b/0xa0
   [<ffffffff8100d40a>] child_rip+0xa/0x20
   [<ffffffff8100cdd0>] ? restore_args+0x0/0x30
   [<ffffffff81065500>] ? kthread+0x0/0xa0
   [<ffffffff8100d400>] ? child_rip+0x0/0x20

[eparis@redhat.com: fix audit too]
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-06 16:36:09 -07:00
..
irq Revert "genirq: assert that irq handlers are indeed running in hardirq context" 2009-05-01 15:16:04 +02:00
power PM/Hibernate: Fix waiting for image device to appear on resume 2009-04-24 15:31:30 -07:00
time clockevents: prevent endless loop in tick_handle_periodic() 2009-05-02 10:22:27 +02:00
trace tracing: fix ref count in splice pages 2009-04-29 08:02:44 +02:00
.gitignore
acct.c
async.c async: remove the temporary (2.6.29) "async is off by default" code 2009-03-28 13:05:30 -07:00
audit_tree.c No need for crossing to mountpoint in audit_tag_tree() 2009-04-20 23:01:15 -04:00
audit.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
audit.h
auditfilter.c inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive 2009-05-06 16:36:09 -07:00
auditsc.c Audit: remove spaces from audit_log_d_path 2009-04-05 13:49:04 -04:00
backtracetest.c
bounds.c
capability.c
cgroup_debug.c debug cgroup: remove unneeded cgroup_lock 2009-04-02 19:04:54 -07:00
cgroup_freezer.c
cgroup.c memcg: fix OOM killer under memcg 2009-04-02 19:04:55 -07:00
compat.c
configs.c
cpu.c cpumask: use set_cpu_active in init/main.c 2009-03-30 22:05:12 +10:30
cpuset.c cpusets: prevent PF_THREAD_BOUND tasks from attaching to non-root cpusets 2009-04-02 19:04:57 -07:00
cred-internals.h
cred.c
delayacct.c
dma-coherent.c
dma.c
exec_domain.c Get rid of indirect include of fs_struct.h 2009-03-31 23:00:27 -04:00
exit.c Merge branch 'irq/threaded' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-07 14:07:52 -07:00
extable.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
fork.c Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
freezer.c
futex_compat.c
futex.c futex: comment requeue key reference semantics 2009-04-02 23:39:53 +02:00
hrtimer.c hrtimer: fix rq->lock inversion (again) 2009-03-31 14:52:52 +02:00
hung_task.c
itimer.c
kallsyms.c Ksplice: Add functions for walking kallsyms symbols 2009-03-31 13:05:32 +10:30
Kconfig.freezer
Kconfig.hz
Kconfig.preempt
kexec.c kexec: vmcoreinfo_data[] can become static 2009-04-02 19:05:04 -07:00
kfifo.c
kgdb.c
kmod.c module: create a request_module_nowait() 2009-03-31 13:05:35 +10:30
kprobes.c kprobes: support per-kprobe disabling 2009-04-07 08:31:08 -07:00
ksysfs.c
kthread.c kthread: move sched-realeted initialization from kthreadd context 2009-04-09 09:50:37 +09:30
latencytop.c
lockdep_internals.h
lockdep_proc.c
lockdep_states.h
lockdep.c lockdep: more robust lockdep_map init sequence 2009-04-17 18:00:00 +02:00
Makefile Merge branch 'linus' into core/softlockup 2009-04-07 11:15:40 +02:00
marker.c
module.c async: Fix module loading async-work regression 2009-04-11 12:44:49 -07:00
mutex-debug.c
mutex-debug.h
mutex.c mutex: have non-spinning mutexes on s390 by default 2009-04-09 19:28:24 +02:00
mutex.h
notifier.c
ns_cgroup.c cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups 2009-04-02 19:04:53 -07:00
nsproxy.c
panic.c locking: clarify kernel-taint warning message 2009-04-23 09:36:52 +02:00
params.c param: fix charp parameters set via sysfs 2009-03-31 13:05:30 +10:30
pid_namespace.c signals: zap_pid_ns_process() should use force_sig() 2009-04-02 19:04:58 -07:00
pid.c pids: refactor vnr/nr_ns helpers to make them safe 2009-04-02 19:05:02 -07:00
pm_qos_params.c
posix-cpu-timers.c kernel/posix-cpu-timers.c: fix sparse warning 2009-04-30 08:08:31 +02:00
posix-timers.c
printk.c Merge branch 'printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 10:23:25 -07:00
profile.c
ptrace.c ptrace: ptrace_attach: fix the usage of ->cred_exec_mutex 2009-04-27 20:30:51 +10:00
rcuclassic.c kmemtrace, rcu: fix linux/rcutree.h and linux/rcuclassic.h dependencies 2009-04-03 12:23:02 +02:00
rcupdate.c RCU: Don't try and predeclare inline funcs as it upsets some versions of gcc 2009-04-15 13:55:14 -07:00
rcupreempt_trace.c
rcupreempt.c kmemtrace, rcu: fix rcupreempt.c data structure dependencies 2009-04-03 12:23:04 +02:00
rcutorture.c cpumask: convert rcutorture.c 2009-03-30 22:05:16 +10:30
rcutree_trace.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
rcutree.c rcu: Make hierarchical RCU less IPI-happy 2009-04-14 11:31:50 +02:00
rcutree.h kmemtrace, rcu: fix rcu_tree_trace.c data structure dependencies 2009-04-03 12:23:03 +02:00
relay.c Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-05 11:04:19 -07:00
res_counter.c
resource.c Remove 'recurse into child resources' logic from 'reserve_region_with_split()' 2009-04-18 21:44:24 -07:00
rtmutex_common.h
rtmutex-debug.c
rtmutex-debug.h
rtmutex-tester.c
rtmutex.c
rtmutex.h
rwsem.c
sched_clock.c Merge branch 'tracing/core-v2' into tracing-for-linus 2009-04-02 00:49:02 +02:00
sched_cpupri.c sched_rt: don't allocate cpumask in fastpath 2009-04-01 13:24:51 +02:00
sched_cpupri.h cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sched_debug.c sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
sched_fair.c
sched_features.h Merge branch 'locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-03-30 17:17:35 -07:00
sched_idletask.c
sched_rt.c Merge commit 'v2.6.30-rc1' into sched/urgent 2009-04-08 17:26:00 +02:00
sched_stats.h sched: remove unused fields from struct rq 2009-03-24 23:16:51 +01:00
sched.c sched: account system time properly 2009-04-29 15:02:28 +02:00
seccomp.c
semaphore.c
signal.c signals: SI_USER: Masquerade si_pid when crossing pid ns boundary 2009-04-02 19:04:58 -07:00
slow-work.c Delete slow-work timers properly 2009-04-24 07:47:59 -07:00
smp.c generic-ipi: eliminate WARN_ON()s during oops/panic 2009-03-13 10:47:34 +01:00
softirq.c kernel/softirq.c: fix sparse warning 2009-04-17 01:57:54 +02:00
softlockup.c
spinlock.c Allow rwlocks to re-enable interrupts 2009-04-02 19:05:11 -07:00
srcu.c
stacktrace.c
stop_machine.c cpumask: remove cpumask_t from core 2009-03-30 22:05:17 +10:30
sys_ni.c
sys.c kernel/sys.c: clean up sys_shutdown exit path 2009-04-13 15:04:32 -07:00
sysctl_check.c
sysctl.c mm: prevent divide error for small values of vm_dirty_bytes 2009-05-02 15:36:10 -07:00
taskstats.c
test_kprobes.c
time.c
timeconst.pl
timer.c Merge branches 'core-fixes-for-linus', 'irq-fixes-for-linus' and 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-04-09 10:35:30 -07:00
tracepoint.c tracepoints: dont update zero-sized tracepoint sections 2009-03-18 19:55:00 +01:00
tsacct.c
uid16.c
up.c
user_namespace.c
user.c Merge branch 'master' into next 2009-03-24 10:52:46 +11:00
utsname_sysctl.c proc_sysctl: use CONFIG_PROC_SYSCTL around ipc and utsname proc_handlers 2009-04-02 19:05:01 -07:00
utsname.c
wait.c
workqueue.c work_on_cpu(): rewrite it to create a kernel thread on demand 2009-04-09 09:50:37 +09:30