Commit Graph

164 Commits

Author SHA1 Message Date
Peter Maydell
2d40fa6987 Block patches for 2.1.0-rc0
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTrbz4AAoJEH8JsnLIjy/WlIkP/RepIwS29f19i3B/idGzUdYW
 9XJnVowRvpkUzDqUprrr7lPHMW/CwAswLNis9B1hZ59rx+tx4Hm/rZGARlqhSOSO
 ZMdW32GFW0SyC5PglFBwGQAk4U0FxwW5cJD6US7h3L4pACIdCkzFSNxehyfCMyU/
 oJkjuAH4a2IQoQf/M7WMm5kPkrdpRp6ZgbQvJGHaR63cuulZDb7rbHMyG66MWH8P
 wahhFFPY1wOeMBiISxPbmcTus+AlfCffG5qPqq83OtaIuWzINTmWlpiFmtx+Aqwy
 HSvGnFJ4Rf7J6Fw8sdTsABdqUTc/gxDYmhAuftm/hsjD9MvPeuFSLPMPLfGg6aPR
 umKaeBOw8NoMTPgbxg403gxFTrHar+TidBu8KgZw5T189/oJSSpT2J53uHWazmd9
 8USkcYQ7VHdFUQVXluLEzHMIWc7kf87ylQ8c9S1yCkNeWYxRZDZGgHEU49ov7FFU
 FnA0w+ZFyDkU8d5gryG+vxOeBDlmXD4UHa676gGlaYhs7YC/BY/JaMgqY4Fd6MMW
 dS5ibPjdtbxEZTh29eWEByMWpzuitr+iPPzsJEdC29LeIIj3XRQq/4FyiQ6EMAAO
 iOlcqE3tws0Ty8GEp78xsAYjaLuH3zmvOTa4aHUQ+K9kwpMPFSJKEcLkwPWWYRbs
 qR2ZL6M+95oQTYkYzv8i
 =Wwqx
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block patches for 2.1.0-rc0

# gpg: Signature made Fri 27 Jun 2014 19:50:32 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"

* remotes/kevin/tags/for-upstream: (47 commits)
  iotests: Fix 083 for out-of-tree builds
  iotests: Drop Python version from 065's Shebang
  iotests: Use $PYTHON for Python scripts
  iotests: Source common.env
  configure: Enable out-of-tree iotests
  iotests: Allow out-of-tree run
  block.c: Don't return success for bdrv_append_temp_snapshot() failure
  qemu-iotests: Add TestRepairQuorum to 041 to test drive-mirror node-name mode.
  block: Add replaces argument to drive-mirror
  blockjob: Fix recent BLOCK_JOB_ERROR regression
  blockjob: Fix recent BLOCK_JOB_READY regression
  virtio-blk: Rename complete_request_early to complete_request_vring
  virtio-blk: Unify {non-,}dataplane's request handlings
  virtio-blk: Schedule BH in the right context
  virtio-blk: Export request handling functions to dataplane
  virtio-blk: Make request completion function virtual
  block: acquire AioContext in qmp_query_blockstats()
  block: make bdrv_query_stats() static
  virtio-blk: Fix and clean up the in_sg and out_sg check
  virtio-blk: Fill in VirtIOBlockReq.out in dataplane code
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-06-29 15:24:54 +01:00
Chen Gang
6b8aeca574 block.c: Don't return success for bdrv_append_temp_snapshot() failure
When failure occurs, 'ret' need be set, or may return 0 to indicate
success. Previously, an error was set in errp, but 0 was returned
anyway. So let bdrv_append_temp_snapshot() return an error code and
use that for the bdrv_open() return value.

Also, error_propagate() need be called only one time within a function.

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 20:00:00 +02:00
Benoît Canet
09158f00e0 block: Add replaces argument to drive-mirror
drive-mirror will bdrv_swap the new BDS named node-name with the one
pointed by replaces when the mirroring is finished.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 20:00:00 +02:00
Stefan Hajnoczi
ac46821f2c block: make bdrv_query_stats() static
This function is only called from block/qapi.c.  There is no need to
keep it public.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27 18:19:57 +02:00
Wenchao Xia
2f44a08b3e qapi event: clean up in callers
This patch improves docs and address small issues in event
callers.

Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-27 09:27:56 -04:00
Kevin Wolf
8ee79e707a block: Catch backing files assigned to non-COW drivers
Since we parse backing.* options to add a backing file from the command
line when the driver didn't assign one, it has been possible to have a
backing file for e.g. raw images (it just was never accessed).

This is obvious nonsense and should be rejected.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-06-26 13:51:01 +02:00
Fam Zheng
dc71ce45de blockjob: Add block_job_yield()
This will unset busy flag and put coroutine to sleep, can be used to
wait for QMP complete/cancel.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-26 12:12:22 +02:00
Wenchao Xia
bcada37b19 qapi event: convert other BLOCK_JOB events
Since BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED, BLOCK_JOB_READY are
related, convert them in one patch. The block_job_event_* functions
are used to keep encapsulation of BlockJob structure.

Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:28 -04:00
Wenchao Xia
5a2d2cbd88 qapi event: convert BLOCK_IO_ERROR and BLOCK_JOB_ERROR
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:12:27 -04:00
Wenchao Xia
a589569f2f qapi: adjust existing defines
In order to let event defines use existing types later, instead of
redefine new ones, some old type defines for spice and vnc are changed,
and BlockErrorAction is moved from block.h to qapi schema. Note that
BlockErrorAction is not merged with BlockdevOnError.

At this point, VncInfo is not made a child of VncBasicInfo, because
VncBasicInfo has mandatory fields where VncInfo makes them optional.

Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-23 11:01:25 -04:00
Alexey Kardashevskiy
b9e77bc718 scsi: Print command name in debug
This makes scsi_command_name() public.

This makes use of scsi_command_name() in debug output for scsi-disk and
spapr-vscsi host bus adapter. Before this, SCSI used to print hex numbers
instead of human-friendly strings.

This adds GET_EVENT_STATUS_NOTIFICATION and READ_DISC_INFORMATION to
the list of SCSI commands supported by scsi_command_name().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-06-18 08:47:10 +02:00
Chunyan Liu
c282e1fdf7 cleanup QEMUOptionParameter
Now that all backend drivers are using QemuOpts, remove all
QEMUOptionParameter related codes.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:21 +08:00
Chunyan Liu
83d0521a1e change block layer to support both QemuOpts and QEMUOptionParamter
Change block layer to support both QemuOpts and QEMUOptionParameter.
After this patch, it will change backend drivers one by one. At the end,
QEMUOptionParameter will be removed and only QemuOpts is kept.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Chunyan Liu <cyliu@suse.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-16 17:23:20 +08:00
Fam Zheng
db519cba87 block: Move declaration of bdrv_get_aio_context to block.h
block_int.h is for block layer and block drivers, other code shouldn't
include it. But similar to bdrv_set_aio_context, bdrv_get_aio_context
should also be accessible from outside of block layer.

Move it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04 09:56:12 +02:00
Stefan Hajnoczi
76ef2cf549 raw-posix: drop raw_get_aio_fd() since it is no longer used
virtio-blk data-plane now uses the QEMU block layer for I/O.  We do not
need raw_get_aio_fd() anymore.  It was a layering violation anyway, so
let's get rid of it.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04 09:56:12 +02:00
Stefan Hajnoczi
dcd042282d block: add bdrv_set_aio_context()
Up until now all BlockDriverState instances have used the QEMU main loop
for fd handlers, timers, and BHs.  This is not scalable on SMP guests
and hosts so we need to move to a model with multiple event loops on
different host CPUs.

bdrv_set_aio_context() assigns the AioContext event loop to use for a
particular BlockDriverState.  It first detaches the entire
BlockDriverState graph from the current AioContext and then attaches to
the new AioContext.

This function will be used by virtio-blk data-plane to assign a
BlockDriverState to its IOThread AioContext.  Make
bdrv_aio_set_context() public since data-plane should not include
block_int.h.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-06-04 09:56:11 +02:00
Fam Zheng
826b6ca0b0 block: Add backing_blocker in BlockDriverState
This makes use of op_blocker and blocks all the operations except for
commit target, on each BlockDriverState->backing_hd.

The asserts for op_blocker in bdrv_swap are removed because with this
change, the target of block commit has at least the backing blocker of
its child, so the assertion is not true. Callers should do their check.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
8d24cce1e3 block: Add bdrv_set_backing_hd()
This is the common but non-trivial steps to assign or change the
backing_hd of BDS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
3718d8ab65 block: Replace in_use with operation blocker
This drops BlockDriverState.in_use with op_blockers:

  - Call bdrv_op_block_all in place of bdrv_set_in_use(bs, 1).

  - Call bdrv_op_unblock_all in place of bdrv_set_in_use(bs, 0).

  - Check bdrv_op_is_blocked() in place of bdrv_in_use(bs).

    The specific types are used, e.g. in place of starting block backup,
    bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_BACKUP, ...).

    There is one exception in block_job_create, where
    bdrv_op_blocker_is_empty() is used, because we don't know the operation
    type here. This doesn't matter because in a few commits away we will drop
    the check and move it to callers that _do_ know the type.

  - Check bdrv_op_blocker_is_empty() in place of assert(!bs->in_use).

Note: there is only bdrv_op_block_all and bdrv_op_unblock_all callers at
this moment. So although the checks are specific to op types, this
changes can still be seen as identical logic with previously with
in_use. The difference is error message are improved because of blocker
error info.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
fbe40ff780 block: Introduce op_blockers to BlockDriverState
BlockDriverState.op_blockers is an array of lists with BLOCK_OP_TYPE_MAX
elements. Each list is a list of blockers of an operation type
(BlockOpType), that marks this BDS as currently blocked for a certain
type of operation with reason errors stored in the list. The rule of
usage is:

 * BDS user who wants to take an operation should check if there's any
   blocker of the type with bdrv_op_is_blocked().

 * BDS user who wants to block certain types of operation, should call
   bdrv_op_block (or bdrv_op_block_all to block all types of operations,
   which is similar to the existing bdrv_set_in_use()).

 * A blocker is only referenced by op_blockers, so the lifecycle is
   managed by caller, and shouldn't be lost until unblock, so typically
   a caller does these:

   - Allocate a blocker with error_setg or similar, call bdrv_op_block()
     to block some operations.
   - Hold the blocker, do his job.
   - Unblock operations that it blocked, with the same reason pointer
     passed to bdrv_op_unblock().
   - Release the blocker with error_free().

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Fam Zheng
8574575f90 block: Add BlockOpType enum
This adds the enum of all the operations that can be taken on a block
device.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-28 14:28:46 +02:00
Peter Lieven
465bee1da8 block: optimize zero writes with bdrv_write_zeroes
this patch tries to optimize zero write requests
by automatically using bdrv_write_zeroes if it is
supported by the format.

This significantly speeds up file system initialization and
should speed zero write test used to test backend storage
performance.

I ran the following 2 tests on my internal SSD with a
50G QCOW2 container and on an attached iSCSI storage.

a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX

QCOW2         [off]     [on]     [unmap]
-----
runtime:       14secs    1.1secs  1.1secs
filesize:      937M      18M      18M

iSCSI         [off]     [on]     [unmap]
----
runtime:       9.3s      0.9s     0.9s

b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct

QCOW2         [off]     [on]     [unmap]
-----
runtime:       246secs   18secs   18secs
filesize:      51G       192K     192K
throughput:    203M/s    2.3G/s   2.3G/s

iSCSI*        [off]     [on]     [unmap]
----
runtime:       8mins     45secs   33secs
throughput:    106M/s    1.2G/s   1.6G/s
allocated:     100%      100%     0%

* The storage was connected via an 1Gbit interface.
  It seems to internally handle writing zeroes
  via WRITESAME16 very fast.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-05-19 13:42:27 +02:00
Kevin Wolf
e88ae2264d block: Fix bdrv_is_allocated() for short backing files
bdrv_is_allocated() shouldn't return true for sectors that are
unallocated, but after the end of a short backing file, even though
such sectors are (correctly) marked as containing zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-05-19 11:36:48 +02:00
Kevin Wolf
b1e6fc0817 block: Fix open flags with BDRV_O_SNAPSHOT
The immediately visible effect of this patch is that it fixes committing
a temporary snapshot to its backing file. Previously, it would fail with
a "permission denied" error because bdrv_inherited_flags() forced the
backing file to be read-only, ignoring the r/w reopen of bdrv_commit().

The bigger problem this revealed is that the original open flags must
actually only be applied to the temporary snapshot, and the original
image file must be treated as a backing file of the temporary snapshot
and get the right flags for that.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-09 20:57:31 +02:00
Fam Zheng
85f49cad87 qemu-img: Convert by cluster size if target is compressed
If target block driver forces compression, qemu-img convert needs to
write by cluster size as well as "-c" option.

Particularly, this applies for converting to VMDK streamOptimized
format.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-05-09 13:32:16 +02:00
Kevin Wolf
8bfea15dda block: Unlink temporary files in raw-posix/win32
Instead of having unlink() calls in the generic block layer, where we
aren't even guarateed to have a file name, move them to those block
drivers that are actually used and that always have a filename. Gets us
rid of some #ifdefs as well.

The patch also converts bs->is_temporary to a new BDRV_O_TEMPORARY open
flag so that it is inherited in the protocol layer and the raw-posix and
raw-win32 drivers can unlink the file.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-30 11:05:00 +02:00
Kevin Wolf
98522f63f4 block: Add errp to bdrv_new()
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.

Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-04-22 12:00:20 +02:00
Fam Zheng
b8afb520e4 block: Handle error of bdrv_getlength in bdrv_create_dirty_bitmap
bdrv_getlength could fail, check the return value before using it.
Return NULL and set errno if it fails. Callers are updated to handle
the error case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf
b998875dcf block: Fix snapshot=on for protocol parsed from filename
Since commit 9fd3171a, BDRV_O_SNAPSHOT uses an option QDict to specify
the originally requested image as the backing file of the newly created
temporary snapshot. This means that the filename is stored in
"file.filename", which is an option that is not parsed for protocol
names. Therefore things like -drive file=nbd:localhost:10809 were
broken because it looked for a local file with the literal name
'nbd:localhost:10809'.

This patch changes the way BDRV_O_SNAPSHOT works once again. We now open
the originally requested image as normal, and then do a similar
operation as for live snapshots to put the temporary snapshot on top.
This way, both driver specific options and parsed filenames work.

As a nice side effect, this results in code movement to factor
bdrv_append_temp_snapshot() out. This is a good preparation for moving
its call to drive_init() and friends eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 19:35:51 +02:00
Kevin Wolf
5a8a30db47 block: Add error handling to bdrv_invalidate_cache()
If it returns an error, the migrated VM will not be started, but qemu
exits with an error message.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-19 09:39:41 +01:00
Stefan Hajnoczi
98563fc3ec aio: add aio_context_acquire() and aio_context_release()
It can be useful to run an AioContext from a thread which normally does
not "own" the AioContext.  For example, request draining can be
implemented by acquiring the AioContext and looping aio_poll() until all
requests have been completed.

The following pattern should work:

  /* Event loop thread */
  while (running) {
      aio_context_acquire(ctx);
      aio_poll(ctx, true);
      aio_context_release(ctx);
  }

  /* Another thread */
  aio_context_acquire(ctx);
  bdrv_read(bs, 0x1000, buf, 1);
  aio_context_release(ctx);

This patch implements aio_context_acquire() and aio_context_release().

Note that existing aio_poll() callers do not need to worry about
acquiring and releasing - it is only needed when multiple threads will
call aio_poll() on the same AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:42:24 +01:00
Benoît Canet
b5042a3622 block: Rewrite the snapshot authorization mechanism for block filters.
This patch keep the recursive way of doing things but simplify it by giving
two responsabilities to all block filters implementors.

They will need to do two things:

-Set the is_filter field of their block driver to true.

-Implement the bdrv_recurse_is_first_non_filter method of their block driver like
it is done on the Quorum block driver. (block/quorum.c)

[Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes
the semantics of blkverify, which now recurses down both bs->file and
s->test_file.
-- Stefan]

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Paolo Bonzini
537b41f501 nbd: move socket wrappers to qemu-nbd
qemu-nbd is one of the few valid users of qerror_report_err.  Move
the error-reporting socket wrappers there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:23 +01:00
Paolo Bonzini
c06b72781d nbd: inline tcp_socket_incoming_spec into sole caller
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Paolo Bonzini
77e8b9ca64 nbd: correctly propagate errors
Before:
    $ ./qemu-io-old
    qemu-io-old> open -r -o file.driver=nbd
    one of path and host must be specified.
    qemu-io-old: can't open device (null): Could not open image: Invalid argument
    $ ./qemu-io-old
    qemu-io-old> open -r -o file.driver=nbd,file.host=foo,file.path=bar
    path and host may not be used at the same time.
    qemu-io-old: can't open device (null): Could not open image: Invalid argument

After:
    $ ./qemu-io
    qemu-io> open -r -o file.driver=nbd
    qemu-io: can't open device (null): one of path and host must be specified.
    $ ./qemu-io
    qemu-io> open -r -o file.driver=nbd,file.host=foo,file.path=bar
    qemu-io: can't open device (null): path and host may not be used at the same time.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
f7d9fd8c72 block: Remove bdrv_open_image()'s force_raw option
This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag
will do exactly the same.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
2e40134bfd block: Make bdrv_file_open() static
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.

Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
ddf5636dc9 block: Add reference parameter to bdrv_open()
Allow bdrv_open() to handle references to existing block devices just as
bdrv_file_open() is already capable of.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz
f67503e5bd block: Change BDS parameter of bdrv_open() to **
Make bdrv_open() take a pointer to a BDS pointer, similarly to
bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open()
will create a new BDS with an empty name; if the BDS pointer is not
NULL, that existing BDS will be reused (in the same way as bdrv_open()
already did).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:21 +01:00
Kevin Wolf
9e1cb96d9a qemu-iotests: Test pwritev RMW logic
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:25 +01:00
Kevin Wolf
8407d5d7e2 block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf
6460440f34 block: Allow wait_serialising_requests() at any point
We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
7327145f63 block: Make overlap range for serialisation dynamic
Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
2dbafdc012 block: Generalise and optimise COR serialisation
Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf
793ed47a7a block: Switch BdrvTrackedRequest to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
c25f53b06e raw: Probe required direct I/O alignment
Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini
1b7fd72955 block: rename buffer_alignment to guest_block_size
The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
339064d506 block: Don't use guest sector size for qemu_blockalign()
bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf
355ef4ac95 block: Update BlockLimits when they might have changed
When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf
d34682cd4a block: Move initialisation of BlockLimits to bdrv_refresh_limits()
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00