linux/drivers/base
David Rientjes 30467e0b3b mm, hotplug: fix concurrent memory hot-add deadlock
There's a deadlock when concurrently hot-adding memory through the probe
interface and switching a memory block from offline to online.

When hot-adding memory via the probe interface, add_memory() first takes
mem_hotplug_begin() and then device_lock() is later taken when registering
the newly initialized memory block.  This creates a lock dependency of (1)
mem_hotplug.lock (2) dev->mutex.

When switching a memory block from offline to online, dev->mutex is first
grabbed in device_online() when the write(2) transitions an existing
memory block from offline to online, and then online_pages() will take
mem_hotplug_begin().

This creates a lock inversion between mem_hotplug.lock and dev->mutex.
Vitaly reports that this deadlock can happen when a kworker handling a probe
event races with systemd-udevd switching a memory block's state.
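The inversion described above can be shown with a toy lock-order tracker, loosely in the spirit of the kernel's lockdep but far simpler. This is a minimal sketch under stated assumptions: the lock classes and the helpers track_acquire(), simulate_probe_path() and simulate_old_online_path() are hypothetical, only one simulated task runs, and nothing is actually locked. The tracker only records which class was taken while another was held, and it reports when a later acquisition reverses an order it has already seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock classes standing in for the two locks in this commit. */
enum lock_class { MEM_HOTPLUG_LOCK, DEV_MUTEX, NR_LOCK_CLASSES };

static const char *const lock_names[NR_LOCK_CLASSES] = {
	"mem_hotplug.lock", "dev->mutex",
};

/* order_seen[a][b] becomes true once class b is taken while a is held. */
static bool order_seen[NR_LOCK_CLASSES][NR_LOCK_CLASSES];

/* Classes held by the single simulated task, in acquisition order. */
static enum lock_class held[NR_LOCK_CLASSES];
static int nr_held;

/* Record the acquisition of 'next'; return false if it inverts an order
 * recorded earlier. Nothing is actually locked. */
static bool track_acquire(enum lock_class next)
{
	bool ok = true;

	assert(nr_held < NR_LOCK_CLASSES);
	for (int i = 0; i < nr_held; i++) {
		enum lock_class prev = held[i];

		if (order_seen[next][prev]) {
			printf("inversion: %s -> %s, but %s -> %s seen earlier\n",
			       lock_names[prev], lock_names[next],
			       lock_names[next], lock_names[prev]);
			ok = false;
		}
		order_seen[prev][next] = true;
	}
	held[nr_held++] = next;
	return ok;
}

static void track_release_all(void)
{
	nr_held = 0;
}

/* Probe path: mem_hotplug_begin(), then device_lock() on registration. */
static bool simulate_probe_path(void)
{
	bool ok = track_acquire(MEM_HOTPLUG_LOCK);

	ok = track_acquire(DEV_MUTEX) && ok;
	track_release_all();
	return ok;
}

/* Pre-fix online path: device_online() holds dev->mutex, then
 * online_pages() calls mem_hotplug_begin(). */
static bool simulate_old_online_path(void)
{
	bool ok = track_acquire(DEV_MUTEX);

	ok = track_acquire(MEM_HOTPLUG_LOCK) && ok;
	track_release_all();
	return ok;
}
```

Running simulate_probe_path() and then simulate_old_online_path() makes the second call return false and print "inversion: dev->mutex -> mem_hotplug.lock, but mem_hotplug.lock -> dev->mutex seen earlier", which is the ABBA pattern that lets the kworker and systemd-udevd block each other.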

This patch requires the state transition to take mem_hotplug_begin()
before dev->mutex.  Hot-adding memory via the probe interface creates a
memory block while holding mem_hotplug_begin(), so there is no way to take
dev->mutex first in this case.

online_pages() and offline_pages() are only called when transitioning
memory block state.  We now require that mem_hotplug_begin() is taken
before calling them; this requires exporting mem_hotplug_begin() and
mem_hotplug_done() to generic code.  In all hot-add and hot-remove cases,
mem_hotplug_begin() is called before device_online().  This is all that is
needed to avoid the deadlock.
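The fix amounts to one global acquisition order: the hotplug lock first, the device lock second. The following pthreads sketch illustrates that idea under stated assumptions: two user-space mutexes stand in for mem_hotplug.lock and dev->mutex, and locked_transition(), hot_add_thread(), online_thread() and run_both() are hypothetical names, not kernel functions. Because both threads take the locks in the same order, neither can hold one lock while waiting for a lock the other holds, so the run always completes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000

/* Hypothetical user-space stand-ins for mem_hotplug.lock and dev->mutex. */
static pthread_mutex_t hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dev_mutex = PTHREAD_MUTEX_INITIALIZER;
static long transitions; /* protected by dev_mutex */

/* The single lock order both paths follow after the fix:
 * hotplug lock first, device lock second. */
static void locked_transition(void)
{
	pthread_mutex_lock(&hotplug_lock);   /* like mem_hotplug_begin() */
	pthread_mutex_lock(&dev_mutex);      /* like device_lock() */
	transitions++;
	pthread_mutex_unlock(&dev_mutex);
	pthread_mutex_unlock(&hotplug_lock); /* like mem_hotplug_done() */
}

/* Models add_memory() via the probe interface. */
static void *hot_add_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++)
		locked_transition();
	return NULL;
}

/* Models the fixed online path: the hotplug lock is taken before
 * device_online() grabs dev->mutex. */
static void *online_thread(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS; i++)
		locked_transition();
	return NULL;
}

/* Runs both paths concurrently and returns the number of transitions. */
static long run_both(void)
{
	pthread_t adder, onliner;
	int rc;

	transitions = 0;
	rc = pthread_create(&adder, NULL, hot_add_thread, NULL);
	assert(rc == 0);
	rc = pthread_create(&onliner, NULL, online_thread, NULL);
	assert(rc == 0);
	pthread_join(adder, NULL);
	pthread_join(onliner, NULL);
	return transitions;
}
```

run_both() should return 2 * ITERATIONS. Swapping the two lock calls in only one of the paths would recreate the inversion from the commit message, and the program could then hang.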

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:00 -07:00
power Merge branches 'pm-domains' and 'pm-cpufreq' 2015-03-06 01:29:31 +01:00
regmap regmap: Patch for v4.1 2015-04-13 15:00:55 -07:00
attribute_container.c attribute_container: fix missing blank lines after declarations 2015-03-25 14:35:09 +01:00
base.h driver core: Move driver_data back to struct device 2014-05-27 12:37:18 -07:00
bus.c driver core: bus: Goto appropriate labels on failure in bus_add_device 2015-03-25 13:40:31 +01:00
cacheinfo.c drivers/base: cacheinfo: validate device node for all the caches 2015-03-25 14:38:41 +01:00
class.c drivers: base: class: Add a blank line after declarations 2015-03-25 14:36:19 +01:00
component.c component: fix bug with legacy API 2014-07-04 18:05:05 +01:00
container.c ACPI / hotplug / driver core: Handle containers in a special way 2013-12-29 15:25:48 +01:00
core.c drivers/core/of: Add symlink to device-tree from devices with an OF node 2015-03-25 14:56:58 +01:00
cpu.c drivers/base: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:38 -08:00
dd.c driver core: Make probe deferral more quiet 2015-03-25 14:58:40 +01:00
devcoredump.c devcoredump: provide a one-way disable function 2014-11-26 19:40:12 -08:00
devres.c devres: Improve devm_kasprintf()/kvasprintf() support 2014-09-23 23:32:50 -07:00
devtmpfs.c devtmpfs: Calling delete_path() only when necessary 2013-12-19 10:10:32 -08:00
dma-coherent.c drivers: dma-coherent: add initialization from device tree 2014-10-14 02:18:12 +02:00
dma-contiguous.c drivers: of: add return value to of_reserved_mem_device_init() 2014-10-29 16:33:14 -07:00
dma-mapping.c drivers: base: dma-mapping: Erase blank space after pointer 2015-03-25 14:36:19 +01:00
driver.c driver core: add missing blank line after declaration 2015-03-25 14:36:30 +01:00
firmware_class.c drivers: base: fw: fix ret value when loading fw 2015-03-25 14:49:10 +01:00
firmware.c
hypervisor.c drivers/base: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required. 2011-10-31 19:31:38 -04:00
init.c ACPI / hotplug / driver core: Handle containers in a special way 2013-12-29 15:25:48 +01:00
isa.c dma-mapping: replace all DMA_24BIT_MASK macro with DMA_BIT_MASK(24) 2009-04-07 08:31:12 -07:00
Kconfig cma: make default CMA area size zero for x86 2014-12-10 17:41:06 -08:00
Makefile Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
map.c drivers: base: map: Use kmalloc_array instead of kmalloc 2015-03-25 14:35:08 +01:00
memory.c mm, hotplug: fix concurrent memory hot-add deadlock 2015-04-14 16:49:00 -07:00
module.c driver core: module.c: Use kasprintf 2010-05-21 09:37:29 -07:00
node.c drivers: base: node: Delete space after pointer declaration 2015-03-25 14:36:20 +01:00
pinctrl.c drivers: pinctrl sleep and idle states in the core 2013-06-16 11:56:52 +02:00
platform.c drivers: platform: parse IRQ flags from resources 2015-03-25 15:04:32 +01:00
property.c Driver core: Fix missing whitespace in function argument 2015-03-25 14:35:08 +01:00
soc.c drivers/base: use tabs where possible in code indentation 2015-03-25 14:37:35 +01:00
syscore.c genirq: Simplify wakeup mechanism 2014-09-01 13:48:59 +02:00
topology.c topology: replace custom attribute macros with standard DEVICE_ATTR* 2014-11-07 11:45:00 -08:00
transport_class.c drivers/base: transport_class explicitly requires EXPORT_SYMBOL 2011-10-31 19:31:15 -04:00