linux/kernel
Tejun Heo 85fbd722ad libata, freezer: avoid block device removal while system is frozen
Freezable kthreads and workqueues are fundamentally problematic in
that they effectively introduce a big kernel lock widely used in the
kernel and have already been the culprit of several deadlock
scenarios.  This is the latest occurrence.

During resume, libata rescans all the ports and revalidates all
pre-existing devices.  If it determines that a device has gone
missing, the device is removed from the system which involves
invalidating block device and flushing bdi while holding driver core
layer locks.  Unfortunately, this can race with the rest of device
resume.  Because freezable kthreads and workqueues are thawed after
device resume is complete and block device removal depends on
freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make
progress, this can lead to deadlock - block device removal can't
proceed because kthreads are frozen and kthreads can't be thawed
because device resume is blocked behind block device removal.

839a8e8660 ("writeback: replace custom worker pool implementation
with unbound workqueue") made this particular deadlock scenario more
visible but the underlying problem has always been there - the
original forker task and jbd2 are freezable too.  In fact, this is
highly likely just one of many possible deadlock scenarios given that
freezer behaves as a big kernel lock and we don't have any debug
mechanism around it.

I believe the right thing to do is getting rid of freezable kthreads
and workqueues.  This is something fundamentally broken.  For now,
implement a funny workaround in libata - just avoid doing block device
hot[un]plug while the system is frozen.  Kernel engineering at its
finest.  :(

v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built
    as a module.

v3: Comment updated and polling interval changed to 10ms as suggested
    by Rafael.

v4: Add #ifdef CONFIG_FREEZER around the hack as pm_freezing is not
    defined when FREEZER is not configured thus breaking build.
    Reported by kbuild test robot.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Tomaž Šolc <tomaz.solc@tablix.org>
Reviewed-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801
Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Len Brown <len.brown@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Cc: kbuild test robot <fengguang.wu@intel.com>
2013-12-19 13:50:32 -05:00
..
cpu
debug kdb: Add support for external NMI handler to call KGDB/KDB 2013-10-03 18:47:54 +02:00
events list: introduce list_next_entry() and list_prev_entry() 2013-11-13 12:09:23 +09:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-19 10:40:00 -08:00
locking locking/lockdep: Mark __lockdep_count_forward_deps() as static 2013-11-13 13:50:17 +01:00
power Merge branch 'pm-sleep' 2013-11-19 01:07:08 +01:00
printk printk.c: comments should refer to /proc/vmcore instead of /proc/vmcoreinfo 2013-11-13 12:09:14 +09:00
rcu This batch of changes is mostly clean ups and small bug fixes. 2013-11-16 12:23:18 -08:00
sched sched/fair: Avoid integer overflow 2013-11-13 13:33:55 +01:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:36:00 +09:00
trace This batch of changes is mostly clean ups and small bug fixes. 2013-11-16 12:23:18 -08:00
.gitignore
acct.c
async.c
audit_tree.c
audit_watch.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2013-11-21 19:18:14 -08:00
audit.h audit: call audit_bprm() only once to add AUDIT_EXECVE information 2013-11-05 11:15:03 -05:00
auditfilter.c audit: do not reject all AUDIT_INODE filter types 2013-11-05 11:09:16 -05:00
auditsc.c audit: fix type of sessionid in audit_set_loginuid() 2013-11-06 11:47:24 -05:00
backtracetest.c
bounds.c kernel/bounds: avoid circular dependencies in generated headers 2013-11-19 14:20:12 -08:00
capability.c
cgroup_freezer.c
cgroup.c consolidate simple ->d_delete() instances 2013-11-15 22:04:17 -05:00
compat.c
configs.c
context_tracking.c Linux 3.12-rc4 2013-10-09 12:36:13 +02:00
cpu_pm.c
cpu.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:11 +09:00
cpuset.c
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c
extable.c
fork.c mm: implement split page table lock for PMD level 2013-11-15 09:32:15 +09:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex_compat.c
futex.c locking: Move the rtmutex code to kernel/locking/ 2013-11-06 09:23:59 +01:00
groups.c
hrtimer.c
hung_task.c Here are the 3.13 KVM changes. There was a lot of work on the PPC 2013-11-15 13:51:36 +09:00
irq_work.c
itimer.c
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks
Kconfig.preempt
kexec.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2013-11-15 16:47:22 -08:00
kmod.c kernel/kmod.c: check for NULL in call_usermodehelper_exec() 2013-09-30 14:31:02 -07:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c
kthread.c kthread: make kthread_create() killable 2013-11-13 12:08:59 +09:00
latencytop.c
Makefile Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2013-11-21 19:46:00 -08:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c Mainly boring here, too. rmmod --wait finally removed, though. 2013-11-15 13:27:50 +09:00
notifier.c
nsproxy.c
padata.c
panic.c kernel/panic.c: reduce 1 byte usage for print tainted buffer 2013-11-13 12:09:35 +09:00
params.c kernel/params: fix handling of signed integer types 2013-09-28 12:35:52 -07:00
pid_namespace.c pid_namespace: make freeing struct pid_namespace rcu-delayed 2013-10-24 23:43:29 -04:00
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
posix-cpu-timers.c
posix-timers.c
profile.c
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-13 12:09:33 +09:00
range.c
reboot.c
relay.c
res_counter.c
resource.c
seccomp.c
signal.c constify copy_siginfo_to_user{,32}() 2013-11-09 00:16:29 -05:00
smp.c kernel: fix generic_exec_single indentation 2013-11-15 09:32:22 +09:00
smpboot.c
smpboot.h
softirq.c revert "softirq: Add support for triggering softirq work on softirqs" 2013-11-15 09:32:22 +09:00
stacktrace.c
stop_machine.c stop_machine: Fix race between stop_two_cpus() and stop_cpus() 2013-11-11 12:43:38 +01:00
sys_ni.c
sys.c kernel/sys.c: remove obsolete #include <linux/kexec.h> 2013-11-13 12:09:13 +09:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
sysctl.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:30:30 +09:00
system_certificates.S kernel/system_certificate.S: use real contents instead of macro GLOBAL() 2013-10-30 12:58:00 +00:00
system_keyring.c KEYS: Make the system 'trusted' keyring viewable by userspace 2013-09-25 17:17:01 +01:00
task_work.c
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c
tracepoint.c
tsacct.c
uid16.c
up.c kernel: provide a __smp_call_function_single stub for !CONFIG_SMP 2013-11-15 09:32:22 +09:00
user_namespace.c
user-return-notifier.c
user.c
utsname_sysctl.c
utsname.c
watchdog.c
workqueue_internal.h
workqueue.c