linux/block
Wanpeng Li 51d638b1f5 block/mq: fix potential deadlock during cpu hotplug
This can be triggered by hot-unplug one cpu.

======================================================
 [ INFO: possible circular locking dependency detected ]
 4.11.0+ #17 Not tainted
 -------------------------------------------------------
 step_after_susp/2640 is trying to acquire lock:
  (all_q_mutex){+.+...}, at: [<ffffffffb33f95b8>] blk_mq_queue_reinit_work+0x18/0x110

 but task is already holding lock:
  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (cpu_hotplug.lock){+.+.+.}:
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        get_online_cpus+0x64/0x80
        blk_mq_init_allocated_queue+0x3a0/0x4e0
        blk_mq_init_queue+0x3a/0x60
        loop_add+0xe5/0x280
        loop_init+0x124/0x177
        do_one_initcall+0x53/0x1c0
        kernel_init_freeable+0x1e3/0x27f
        kernel_init+0xe/0x100
        ret_from_fork+0x31/0x40

 -> #0 (all_q_mutex){+.+...}:
        __lock_acquire+0x189a/0x18a0
        lock_acquire+0x11c/0x230
        __mutex_lock+0x92/0x990
        mutex_lock_nested+0x1b/0x20
        blk_mq_queue_reinit_work+0x18/0x110
        blk_mq_queue_reinit_dead+0x1c/0x20
        cpuhp_invoke_callback+0x1f2/0x810
        cpuhp_down_callbacks+0x42/0x80
        _cpu_down+0xb2/0xe0
        freeze_secondary_cpus+0xb6/0x390
        suspend_devices_and_enter+0x3b3/0xa40
        pm_suspend+0x129/0x490
        state_store+0x82/0xf0
        kobj_attr_store+0xf/0x20
        sysfs_kf_write+0x45/0x60
        kernfs_fop_write+0x135/0x1c0
        __vfs_write+0x37/0x160
        vfs_write+0xcd/0x1d0
        SyS_write+0x58/0xc0
        do_syscall_64+0x8f/0x710
        return_from_SYSCALL_64+0x0/0x7a

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(cpu_hotplug.lock);
                                lock(all_q_mutex);
                                lock(cpu_hotplug.lock);
   lock(all_q_mutex);

  *** DEADLOCK ***

 8 locks held by step_after_susp/2640:
  #0:  (sb_writers#6){.+.+.+}, at: [<ffffffffb3244aed>] vfs_write+0x1ad/0x1d0
  #1:  (&of->mutex){+.+.+.}, at: [<ffffffffb32d3a51>] kernfs_fop_write+0x101/0x1c0
  #2:  (s_active#166){.+.+.+}, at: [<ffffffffb32d3a59>] kernfs_fop_write+0x109/0x1c0
  #3:  (pm_mutex){+.+...}, at: [<ffffffffb30d2ecd>] pm_suspend+0x21d/0x490
  #4:  (acpi_scan_lock){+.+.+.}, at: [<ffffffffb34dc3d7>] acpi_scan_lock_acquire+0x17/0x20
  #5:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffffb306d6d7>] freeze_secondary_cpus+0x27/0x390
  #6:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffffb306cfd5>] cpu_hotplug_begin+0x5/0xe0
  #7:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffffb306d04f>] cpu_hotplug_begin+0x7f/0xe0

 stack backtrace:
 CPU: 3 PID: 2640 Comm: step_after_susp Not tainted 4.11.0+ #17
 Hardware name: Dell Inc. OptiPlex 7040/0JCTF8, BIOS 1.4.9 09/12/2016
 Call Trace:
  dump_stack+0x99/0xce
  print_circular_bug+0x1fa/0x270
  __lock_acquire+0x189a/0x18a0
  lock_acquire+0x11c/0x230
  ? lock_acquire+0x11c/0x230
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? blk_mq_queue_reinit_work+0x18/0x110
  __mutex_lock+0x92/0x990
  ? blk_mq_queue_reinit_work+0x18/0x110
  ? kmem_cache_free+0x2cb/0x330
  ? anon_transport_class_unregister+0x20/0x20
  ? blk_mq_queue_reinit_work+0x110/0x110
  mutex_lock_nested+0x1b/0x20
  ? mutex_lock_nested+0x1b/0x20
  blk_mq_queue_reinit_work+0x18/0x110
  blk_mq_queue_reinit_dead+0x1c/0x20
  cpuhp_invoke_callback+0x1f2/0x810
  ? __flow_cache_shrink+0x160/0x160
  cpuhp_down_callbacks+0x42/0x80
  _cpu_down+0xb2/0xe0
  freeze_secondary_cpus+0xb6/0x390
  suspend_devices_and_enter+0x3b3/0xa40
  ? rcu_read_lock_sched_held+0x79/0x80
  pm_suspend+0x129/0x490
  state_store+0x82/0xf0
  kobj_attr_store+0xf/0x20
  sysfs_kf_write+0x45/0x60
  kernfs_fop_write+0x135/0x1c0
  __vfs_write+0x37/0x160
  ? rcu_read_lock_sched_held+0x79/0x80
  ? rcu_sync_lockdep_assert+0x2f/0x60
  ? __sb_start_write+0xd9/0x1c0
  ? vfs_write+0x1ad/0x1d0
  vfs_write+0xcd/0x1d0
  SyS_write+0x58/0xc0
  ? rcu_read_lock_sched_held+0x79/0x80
  do_syscall_64+0x8f/0x710
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

The cpu hotplug path will hold cpu_hotplug.lock and then reinit all exiting
queues for blk mq w/ all_q_mutex, however, blk_mq_init_allocated_queue() will
contend these two locks in the inversion order. This is due to commit eabe06595d
(blk/mq: Cure cpu hotplug lock inversion), it fixes a cpu hotplug lock inversion
issue because of hotplug rework, however the hotplug rework is still work-in-progress
and lives in a -tip branch and mainline cannot yet trigger that splat. The commit
breaks the linus's tree in the merge window, so this patch reverts the lock order
and avoids to splat linus's tree.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-05-07 19:50:11 -06:00
..
partitions partitions/efi: Fix integer overflow in GPT size calculation 2017-01-17 09:02:31 -07:00
badblocks.c badblocks: badblocks_set/clear update unacked_exist 2016-10-21 15:45:47 -06:00
bfq-cgroup.c block, bfq: split bfq-iosched.c into multiple source files 2017-04-19 08:48:24 -06:00
bfq-iosched.c block, bfq: don't dereference bic before null checking it 2017-04-20 08:19:23 -06:00
bfq-iosched.h bfq: fix compile error if CONFIG_CGROUPS=n 2017-04-20 09:39:12 -06:00
bfq-wf2q.c block, bfq: split bfq-iosched.c into multiple source files 2017-04-19 08:48:24 -06:00
bio-integrity.c block: remove bio_is_rw 2016-10-28 08:45:17 -06:00
bio.c Merge branch 'md-next' into md-linus 2017-05-01 14:09:21 -07:00
blk-cgroup.c blkcg: allocate struct blkcg_gq outside request queue spinlock 2017-03-29 11:27:19 -06:00
blk-core.c blk-mq: untangle debugfs and sysfs 2017-05-04 08:24:13 -06:00
blk-exec.c block: remove the errors field from struct request 2017-04-20 12:16:10 -06:00
blk-flush.c block: make __blk_end_bidi_request private 2017-04-19 10:19:47 -06:00
blk-integrity.c block: fix blk_integrity_register to use template's interval_exp if not 0 2017-04-23 12:59:56 -06:00
blk-ioc.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-03-03 10:53:35 -08:00
blk-lib.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
blk-map.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
blk-merge.c block: implement splitting of REQ_OP_WRITE_ZEROES bios 2017-04-08 11:25:38 -06:00
blk-mq-cpumap.c blk-mq: export blk_mq_map_queues 2016-11-08 17:30:00 -05:00
blk-mq-debugfs.c mq-deadline: add debugfs attributes 2017-05-04 08:25:17 -06:00
blk-mq-debugfs.h mq-deadline: add debugfs attributes 2017-05-04 08:25:17 -06:00
blk-mq-pci.c blk-mq-pci: Fix two spelling mistakes 2017-03-29 11:09:51 -06:00
blk-mq-sched.c blk-mq-debugfs: allow schedulers to register debugfs attributes 2017-05-04 08:24:40 -06:00
blk-mq-sched.h blk-mq: Remove blk_mq_sched_move_to_dispatch() 2017-04-20 17:28:30 -06:00
blk-mq-sysfs.c blk-mq: untangle debugfs and sysfs 2017-05-04 08:24:13 -06:00
blk-mq-tag.c blk-mq: add shallow depth option for blk_mq_get_tag() 2017-04-14 14:06:54 -06:00
blk-mq-tag.h blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset 2017-03-02 08:56:04 -07:00
blk-mq-virtio.c blk-mq: provide a default queue mapping for virtio device 2017-02-27 20:54:05 +02:00
blk-mq.c block/mq: fix potential deadlock during cpu hotplug 2017-05-07 19:50:11 -06:00
blk-mq.h blk-mq: move debugfs declarations to a separate header file 2017-05-04 08:23:44 -06:00
blk-settings.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
blk-softirq.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/topology.h> 2017-03-02 08:42:26 +01:00
blk-stat.c blk-stat: kill blk_stat_rq_ddir() 2017-04-21 07:56:23 -06:00
blk-stat.h blk-stat: kill blk_stat_rq_ddir() 2017-04-21 07:56:23 -06:00
blk-sysfs.c blk-mq: untangle debugfs and sysfs 2017-05-04 08:24:13 -06:00
blk-tag.c blk-mq-sched: add framework for MQ capable IO schedulers 2017-01-17 10:04:20 -07:00
blk-throttle.c blk-throttle: fix unused variable warning with BLK_DEV_THROTTLING_LOW=n 2017-04-20 09:41:36 -06:00
blk-timeout.c block: remove the errors field from struct request 2017-04-20 12:16:10 -06:00
blk-wbt.c blk-stat: kill blk_stat_rq_ddir() 2017-04-21 07:56:23 -06:00
blk-wbt.h block: Make writeback throttling defaults consistent for SQ devices 2017-04-19 08:49:03 -06:00
blk-zoned.c block: Rename blk_queue_zone_size and bdev_zone_size 2017-01-12 07:58:32 -07:00
blk.h block: Export blk_init_request_from_bio() 2017-04-19 17:38:30 -06:00
bounce.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2015-09-19 18:57:09 -07:00
bsg-lib.c scsi: introduce a result field in struct scsi_request 2017-04-20 12:16:10 -06:00
bsg.c Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-01 14:41:04 -07:00
cfq-iosched.c cfq: Disable writeback throttling by default 2017-04-05 08:15:08 -06:00
cmdline-parser.c
compat_ioctl.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
deadline-iosched.c block: enumify ELEVATOR_*_MERGE 2017-02-08 13:43:06 -07:00
elevator.c block: don't call blk_mq_quiesce_queue() after queue is frozen 2017-05-02 11:33:08 -06:00
genhd.c A reasonably busy cycle for documentation this time around. There is a new 2017-05-02 10:21:17 -07:00
ioctl.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
ioprio.c block: Optimize ioprio_best() 2017-04-19 17:38:36 -06:00
Kconfig libnvdimm for 4.12 2017-05-05 18:49:20 -07:00
Kconfig.iosched block, bfq: add full hierarchical scheduling and cgroups support 2017-04-19 08:30:26 -06:00
kyber-iosched.c kyber: add debugfs attributes 2017-05-04 08:25:17 -06:00
Makefile block, bfq: split bfq-iosched.c into multiple source files 2017-04-19 08:48:24 -06:00
mq-deadline.c mq-deadline: add debugfs attributes 2017-05-04 08:25:17 -06:00
noop-iosched.c block: move existing elevator ops to union 2017-01-17 10:03:33 -07:00
opal_proto.h block/sed-opal: allocate struct opal_dev dynamically 2017-02-17 12:41:47 -07:00
partition-generic.c libnvdimm for 4.12 2017-05-05 18:49:20 -07:00
scsi_ioctl.c scsi: introduce a result field in struct scsi_request 2017-04-20 12:16:10 -06:00
sed-opal.c block: sed-opal: Tone down all the pr_* to debugs 2017-04-07 14:24:16 -06:00
t10-pi.c block: constify struct blk_integrity_profile 2017-03-24 20:34:39 -06:00