xemu/block
Stefan Hajnoczi fe121b9d3c linux-aio: fix re-entrant completion processing
Commit 0ed93d84ed ("linux-aio: process
completions from ioq_submit()") added an optimization that processes
completions each time ioq_submit() returns with requests in flight.
This commit introduces a "Co-routine re-entered recursively" error which
can be triggered with -drive format=qcow2,aio=native.

Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, and I
debugged the following backtrace:

  (gdb) bt
  #0  0x00007ffff0a046f5 in raise () at /lib64/libc.so.6
  #1  0x00007ffff0a062fa in abort () at /lib64/libc.so.6
  #2  0x0000555555ac0013 in qemu_coroutine_enter (co=0x5555583464d0) at util/qemu-coroutine.c:113
  #3  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
  #4  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
  #5  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x555559d38ae0, offset=offset@entry=2932727808, type=type@entry=1) at block/linux-aio.c:383
  #6  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2932727808, qiov=0x555559d38e20, type=1) at block/linux-aio.c:402
  #7  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2932727808, bytes=bytes@entry=8192, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:804
  #8  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x555559d38d20, offset=offset@entry=2932727808, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x555559d38e20, flags=0) at block/io.c:1041
  #9  0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2932727808, bytes=8192, qiov=qiov@entry=0x555559d38e20, flags=flags@entry=0) at block/io.c:1133
  #10 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=<optimized out>) at block/qcow2.c:1509
  #11 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:804
  #12 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x555559d39000, offset=offset@entry=6178725888, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x555557527840, flags=0) at block/io.c:1041
  #13 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6178725888, bytes=bytes@entry=8192, qiov=qiov@entry=0x555557527840, flags=flags@entry=0) at block/io.c:1133
  #14 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6178725888, bytes=8192, qiov=0x555557527840, flags=0) at block/block-backend.c:783
  #15 0x0000555555a45266 in blk_aio_read_entry (opaque=0x5555577025e0) at block/block-backend.c:991
  #16 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

It turned out that re-entrant ioq_submit() and completion processing
between three requests caused this error.  The following check is not
sufficient to prevent recursively entering coroutines:

  if (laiocb->co != qemu_coroutine_self()) {
      qemu_coroutine_enter(laiocb->co);
  }

As the following coroutine backtrace shows, not just the current
coroutine (self) can be entered.  There might also be other coroutines
that are currently entered and transferred control due to the qcow2 lock
(CoMutex):

  (gdb) qemu coroutine 0x5555583464d0
  #0  0x0000555555ac0c90 in qemu_coroutine_switch (from_=from_@entry=0x5555583464d0, to_=to_@entry=0x5555572f9890, action=action@entry=COROUTINE_ENTER) at util/coroutine-ucontext.c:175
  #1  0x0000555555abfe54 in qemu_coroutine_enter (co=0x5555572f9890) at util/qemu-coroutine.c:117
  #2  0x0000555555ac031c in qemu_co_queue_run_restart (co=co@entry=0x5555583462c0) at util/qemu-coroutine-lock.c:60
  #3  0x0000555555abfe5e in qemu_coroutine_enter (co=0x5555583462c0) at util/qemu-coroutine.c:119
  #4  0x0000555555a4b663 in qemu_laio_process_completions (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:218
  #5  0x0000555555a4b874 in ioq_submit (s=s@entry=0x555557e2f7f0) at block/linux-aio.c:331
  #6  0x0000555555a4ba12 in laio_do_submit (fd=fd@entry=13, laiocb=laiocb@entry=0x55555a338b40, offset=offset@entry=2911477760, type=type@entry=1) at block/linux-aio.c:383
  #7  0x0000555555a4bbd3 in laio_co_submit (bs=<optimized out>, s=0x555557e2f7f0, fd=13, offset=2911477760, qiov=0x55555a338e80, type=1) at block/linux-aio.c:402
  #8  0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x55555663bcb0, offset=offset@entry=2911477760, bytes=bytes@entry=8192, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:804
  #9  0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x55555663bcb0, req=req@entry=0x55555a338d80, offset=offset@entry=2911477760, bytes=bytes@entry=8192, align=align@entry=512, qiov=qiov@entry=0x55555a338e80, flags=0) at block/io.c:1041
  #10 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=2911477760, bytes=8192, qiov=qiov@entry=0x55555a338e80, flags=flags@entry=0) at block/io.c:1133
  #11 0x0000555555a29629 in qcow2_co_preadv (bs=0x555556635890, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=<optimized out>) at block/qcow2.c:1509
  #12 0x0000555555a4fd23 in bdrv_driver_preadv (bs=bs@entry=0x555556635890, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:804
  #13 0x0000555555a52b34 in bdrv_aligned_preadv (bs=bs@entry=0x555556635890, req=req@entry=0x55555a339060, offset=offset@entry=6157475840, bytes=bytes@entry=8192, align=align@entry=1, qiov=qiov@entry=0x5555575df720, flags=0) at block/io.c:1041
  #14 0x0000555555a52db8 in bdrv_co_preadv (child=<optimized out>, offset=offset@entry=6157475840, bytes=bytes@entry=8192, qiov=qiov@entry=0x5555575df720, flags=flags@entry=0) at block/io.c:1133
  #15 0x0000555555a4515a in blk_co_preadv (blk=0x5555566356d0, offset=6157475840, bytes=8192, qiov=0x5555575df720, flags=0) at block/block-backend.c:783
  #16 0x0000555555a45266 in blk_aio_read_entry (opaque=0x555557231aa0) at block/block-backend.c:991
  #17 0x0000555555ac0cfa in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:78

Use the new qemu_coroutine_entered() function instead of comparing
against qemu_coroutine_self().  This is correct because:

1. If a coroutine is not entered then it must have yielded to wait for
   I/O completion.  It is therefore safe to enter.

2. If a coroutine is entered then it must be in
   ioq_submit()/qemu_laio_process_completions() because otherwise it
   would be yielded while waiting for I/O completion.  Therefore it will
   check laio->ret and return from ioq_submit() instead of yielding,
   i.e. it's guaranteed not to hang.

Reported-by: Fam Zheng <famz@redhat.com>
Tested-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1474989516-18255-4-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-28 17:11:23 +01:00
..
accounting.c block: Clean up includes 2016-01-20 13:36:23 +01:00
archipelago.c coccinelle: Remove unnecessary variables for function return value 2016-06-20 16:38:13 +02:00
backup.c Backup: export interfaces for extra serialization 2016-09-13 11:00:56 +01:00
blkdebug.c block/blkdebug: Store config filename 2016-08-15 15:52:28 +02:00
blkreplay.c blkreplay: Switch .bdrv_co_discard() to byte-based 2016-07-20 14:11:55 +01:00
blkverify.c block: Convert bdrv_aio_writev() to BdrvChild 2016-07-05 16:46:26 +02:00
block-backend.c block: Add blk_by_dev() 2016-09-23 13:36:10 +02:00
bochs.c block: Convert bdrv_co_preadv/pwritev to BdrvChild 2016-07-05 16:46:27 +02:00
cloop.c block: Convert bdrv_pread(v) to BdrvChild 2016-07-05 16:46:27 +02:00
commit.c commit: Add 'base' to the reopen queue before 'overlay_bs' 2016-09-23 13:36:10 +02:00
crypto.c crypto: make PBKDF iterations configurable for LUKS format 2016-09-19 16:30:45 +01:00
curl.c curl: Operate on zero-length file 2016-09-15 15:32:22 +03:00
dirty-bitmap.c dirty-bitmap: operate with int64_t amount 2016-07-19 16:54:46 -04:00
dmg.c block: Convert bdrv_pread(v) to BdrvChild 2016-07-05 16:46:27 +02:00
gluster.c block/gluster: add support to choose libgfapi logfile 2016-09-13 01:34:47 -04:00
io.c block/io: turn on dirty_bitmaps for the compressed writes 2016-09-05 19:06:48 +02:00
iscsi.c -----BEGIN PGP SIGNATURE----- 2016-09-23 13:10:43 +01:00
linux-aio.c linux-aio: fix re-entrant completion processing 2016-09-28 17:11:23 +01:00
Makefile.objs vhdx: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
mirror.c mirror: auto complete active commit 2016-09-13 11:00:56 +01:00
nbd-client.c nbd: Convert to byte-based interface 2016-07-20 14:24:25 +01:00
nbd-client.h nbd: Limit nbdflags to 16 bits 2016-08-03 18:44:56 +02:00
nbd.c block/nbd: Store runtime option values 2016-08-15 15:52:29 +02:00
nfs.c coroutine: move entry argument to qemu_coroutine_create 2016-07-13 13:26:02 +02:00
null.c block/null: Implement bdrv_refresh_filename() 2016-06-16 15:20:37 +02:00
parallels.c block/parallels: check new image size 2016-08-05 09:59:06 +01:00
qapi.c qapi: Add new visit_complete() function 2016-07-06 10:52:04 +02:00
qcow2-cache.c block: Convert bdrv_pwrite(v/_sync) to BdrvChild 2016-07-05 16:46:27 +02:00
qcow2-cluster.c qcow2: fix encryption during cow of sectors 2016-09-23 13:36:09 +02:00
qcow2-refcount.c block: Convert bdrv_discard() to byte-based 2016-07-20 14:11:55 +01:00
qcow2-snapshot.c block: Convert bdrv_pwrite(v/_sync) to BdrvChild 2016-07-05 16:46:27 +02:00
qcow2.c qcow2: avoid memcpy(dst, NULL, len) 2016-09-13 11:00:55 +01:00
qcow2.h Remove unused function declarations 2016-09-15 15:32:22 +03:00
qcow.c qcow: cleanup qcow_co_pwritev_compressed to avoid the recursion 2016-09-05 19:06:48 +02:00
qed-check.c qed: Use DIV_ROUND_UP 2016-06-07 18:19:24 +03:00
qed-cluster.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-gencb.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-l2-cache.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-table.c block: Convert bdrv_aio_writev() to BdrvChild 2016-07-05 16:46:26 +02:00
qed.c coroutine: move entry argument to qemu_coroutine_create 2016-07-13 13:26:02 +02:00
qed.h util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
quorum.c block: Convert bdrv_aio_writev() to BdrvChild 2016-07-05 16:46:26 +02:00
raw_bsd.c raw_bsd: Convert to byte-based interface 2016-07-20 14:24:25 +01:00
raw-posix.c block: Convert .bdrv_aio_discard() to byte-based 2016-07-20 14:11:55 +01:00
raw-win32.c raw-posix: Switch paio_submit() to byte-based 2016-07-20 14:11:55 +01:00
rbd.c block: Convert .bdrv_aio_discard() to byte-based 2016-07-20 14:11:55 +01:00
replication.c replication: Implement new driver for block replication 2016-09-13 11:00:56 +01:00
sheepdog.c sheepdog: remove useless casts 2016-09-15 15:32:22 +03:00
snapshot.c error: Remove NULL checks on error_propagate() calls 2016-06-20 16:38:13 +02:00
ssh.c block/ssh: Use QemuOpts for runtime options 2016-08-15 15:52:28 +02:00
stream.c Improve block job rate limiting for small bandwidth values 2016-07-13 13:41:38 +02:00
throttle-groups.c block: Move I/O throttling configuration functions to BlockBackend 2016-05-19 16:45:30 +02:00
trace-events trace-events: fix first line comment in trace-events 2016-08-12 10:36:01 +01:00
vdi.c vdi: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vhdx-endian.c vhdx: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vhdx-log.c block: Convert bdrv_pwrite(v/_sync) to BdrvChild 2016-07-05 16:46:27 +02:00
vhdx.c vhdx: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vhdx.h block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec 2014-12-12 15:42:22 +00:00
vmdk.c vmdk: add vmdk_co_pwritev_compressed 2016-09-05 19:06:48 +02:00
vpc.c vpc: Use QEMU UUID API 2016-09-23 11:42:52 +08:00
vvfat.c block: Add "read-only" to the options QDict 2016-09-23 13:36:10 +02:00
win32-aio.c linux-aio: share one LinuxAioState within an AioContext 2016-07-18 15:09:31 +01:00
write-threshold.c block: Clean up includes 2016-01-20 13:36:23 +01:00