In 1.7.1 qcow2_create2 reopen the file for flushing without the BDRV_O_NO_BACKING
flags.
As a consequence the code would recursively open the whole backing chain.
These three stack arrays would pile up through the recursion and lead to a coroutine
stack overflow.
Convert these array to malloced buffers in order to streamline the coroutine
footprint.
Symptoms where freezes or segfaults on production machines while taking QMP externals
snapshots. The overflow disturbed coroutine switching.
[Resolved conflicts on qemu.git/master since the patch was against v1.7.1
--Stefan]
Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Since commit f298d071, block devices added with blockdev-add don't have
a QemuOpts around in dinfo->opts. Consequently, we can't rely any more
on QemuOpts catching duplicate IDs for block devices.
This patch adds a new check for duplicate IDs to bdrv_new(), and moves
the existing check that the ID isn't already taken for a node-name there
as well.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.
Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
bdrv_getlength could fail, check the return value before using it.
Return NULL and set errno if it fails. Callers are updated to handle
the error case.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Insanely large requests could cause an integer overflow in
bdrv_rw_co() while converting sectors to bytes. This patch catches the
problem and returns an error (if we hadn't overflown the integer here,
bdrv_check_byte_request() would have rejected the request, so we're not
breaking anything that was supposed to work before).
We actually do have a test case that triggers behaviour where we
accidentally let such a request pass, so that it would return success,
but read 0 bytes instead of the requested 4 GB. It fails now like it
should.
If the vdi block driver wants to be able to deal with huge images, it
can't read the whole block bitmap at once into memory like it does
today, but needs to use a metadata cache like qcow2 does.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 8f4754ed intended to protect against integer overflow bugs in
block drivers by making sure that a single request that is passed to
drivers is no longer than INT_MAX bytes.
However, meanwhile there are some callers that don't use that code path
any more but call bdrv_check_byte_request() directy, so let's add a
check there as well.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Since commit 9fd3171a, BDRV_O_SNAPSHOT uses an option QDict to specify
the originally requested image as the backing file of the newly created
temporary snapshot. This means that the filename is stored in
"file.filename", which is an option that is not parsed for protocol
names. Therefore things like -drive file=nbd:localhost:10809 were
broken because it looked for a local file with the literal name
'nbd:localhost:10809'.
This patch changes the way BDRV_O_SNAPSHOT works once again. We now open
the originally requested image as normal, and then do a similar
operation as for live snapshots to put the temporary snapshot on top.
This way, both driver specific options and parsed filenames work.
As a nice side effect, this results in code movement to factor
bdrv_append_temp_snapshot() out. This is a good preparation for moving
its call to drive_init() and friends eventually.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
When using the QDict option 'filename', it is supposed to be interpreted
literally. The code did correctly avoid guessing the protocol from any
string before the first colon, but it still called bdrv_parse_filename()
which would, for example, incorrectly remove a 'file:' prefix in the
raw-posix driver.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Limiting the size of a single request to INT_MAX not only fixes a
direct integer overflow in bdrv_check_request() (which would only
trigger bad behaviour with ridiculously huge images, as in close to
2^64 bytes), but can also prevent overflows in all block drivers.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If it returns an error, the migrated VM will not be started, but qemu
exits with an error message.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Opening an encrypted image takes an additional step: setting the key.
Between open and the key set, the image must not be used.
We have some protection against accidental use in place: you can't
unpause a guest while we're missing keys. You can, however, hot-plug
block devices lacking keys into a running guest just fine, or insert
media lacking keys. In the latter case, notifying the guest of the
insert is delayed until the key is set, which may suffice to protect
at least some guests in common usage.
This patch makes the protection apply in more cases, in a rather
heavy-handed way: it doesn't let you open encrypted images unless
we're in a paused state.
It doesn't extend the protection to users other than the guest (block
jobs?). Use of runstate_check() from block.c is disgusting. Best I
can do right now.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If the image file cannot be opened and was created as a temporary file,
it should be deleted; thus, in this case, we should jump to the
"unlink_and_fail" label and not just to "fail".
Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch keep the recursive way of doing things but simplify it by giving
two responsabilities to all block filters implementors.
They will need to do two things:
-Set the is_filter field of their block driver to true.
-Implement the bdrv_recurse_is_first_non_filter method of their block driver like
it is done on the Quorum block driver. (block/quorum.c)
[Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes
the semantics of blkverify, which now recurses down both bs->file and
s->test_file.
-- Stefan]
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently, bdrv_debug_resume() requires every bs->drv in the BDS stack
to be NULL until a bs->drv with an implementation of bdrv_debug_resume()
is found. For a normal function, this would be fine, but this is a
function for debugging purposes and should therefore allow intermediate
BDS not to have a driver (i.e., be "ejected"). Otherwise, it is hard to
debug such situations.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
After migration has completed, we call bdrv_invalidate_cache() so that
drivers which cache some data drop their stale copy of the data and
reread it from the image file to get a new version of data that the
source modified while the migration was running.
Reloading metadata from the image file is useless, though, if the size
of the image file stays stale (this is a value that is cached for all
image formats in block.c). Reads from (meta)data after the old EOF
return only zeroes, causing image corruption.
We need to update bs->total_sectors in all layers that could potentially
have changed their size (i.e. backing files are not a concern - if they
are changed, we're in bigger trouble)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Using an invalid option for a block device that is opened with
BDRV_O_PROTOCOL led to drv = NULL, and when trying to include the driver
name in the error message, qemu dereferenced it:
$ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar
Segmentation fault (core dumped)
With this patch applied, the expected error message is printed:
$ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar
qemu-system-x86_64: -drive file=/tmp/test.qcow2,file.foo=bar: could
not open disk image /tmp/test.qcow2: Block protocol 'file' doesn't
support the option 'foo'
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Currently, bdrv_file_open() always removes the "filename" option from
the options QDict after bdrv_parse_filename() has been (successfully)
called. However, for drivers with bdrv_needs_filename, it makes more
sense for bdrv_parse_filename() to overwrite the "filename" option and
for bdrv_file_open() to fetch the filename from there.
Since there currently are no drivers that implement
bdrv_parse_filename() and have bdrv_needs_filename set, this does not
change current behavior.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Moving only the node_name one field could lead to some inconsitencies where a
node_name was defined on a bs which was not registered in the graph node list.
bdrv_swap between a named node bs and a non named node bs would lead to this.
bdrv_make_anon would then crash because it would try to remove the bs from the
graph node list while it is not in it.
This patch remove named node bses from the graph node list before doing the swap
then insert them back.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
For sg backends, bs->request_alignment is meaningless and may be 0.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
If TMPDIR is not specified, the default was to use /tmp for the working
copy of the block devices. Update this to /var/tmp instead, so systems
using tmp-on-tmpfs don't end up inadvertently using RAM for the block
device.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag
will do exactly the same.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The fail and success paths of bdrv_file_open() may be further shortened
by reusing code already existent in bdrv_open(). This includes
bdrv_file_open() not taking the reference to options which allows the
removal of QDECREF(options) in that function.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The fail paths of bdrv_file_open() and bdrv_open() naturally exhibit
similarities, thus it is possible to reuse the one from bdrv_open() and
shorten the one in bdrv_file_open() accordingly.
Also, setting bs->options in bdrv_file_open() is not necessary if it is
already done in bdrv_open().
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Change bdrv_file_open() to take a simple pointer to an already existing
BDS instead of an indirect one. The BDS will be created in bdrv_open()
if necessary.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Remove the reference parameter and the related handling code from
bdrv_file_open(), since it exists in bdrv_open() now as well.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.
Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Allow bdrv_open() to handle references to existing block devices just as
bdrv_file_open() is already capable of.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Make bdrv_open() take a pointer to a BDS pointer, similarly to
bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open()
will create a new BDS with an empty name; if the BDS pointer is not
NULL, that existing BDS will be reused (in the same way as bdrv_open()
already did).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Consider top level BlockDriverStates as well.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Tested-by: Benoit Canet <benoit@irqsave.net>
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out. Unnecessarily hard for
optimizers, static checkers, and human readers. Dumb it down to
obvious.
Gets rid of several dozen Coverity false positives.
Note that the obvious form is already used in many places.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Since we introduced node_name for named bs of the graph modify the opening by
reference to use it as a fallback.
This patch also enforce the separation of the device id and graph node
namespaces.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The following patch will reuse bdrv_lookup_bs in order to open images by
references so the rules of usage of bdrv_lookup_bs must be relaxed a bit.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
On 32 bit hosts, size_t is too small for align as the bitmask
~(align - 1) will zero out the higher 32 bits of the offset.
While at it, change the local overlap_bytes variable to unsigned to
match the field in BdrvTrackedRequest.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
The behaviour of the ROUND_UP macro with negative numbers isn't obvious.
It happens to do the right thing in this please, but better avoid it.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
This adds assertions that the request that we actually end up passing to
the block driver (which includes RMW data and has therefore potentially
been rounded to alignment boundaries) is fully covered by the
overlap_{offset,size} fields of the associated BdrvTrackedRequest.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
The error path for a failure in one of the two bdrv_aligned_preadv()
calls leaked head_buf or tail_buf, respectively. This fixes the memory
leak.
Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
This fixes a regression introduced in commit 2a05cbe42 ('block: Allow
block devices without files'):
$ qemu-system-x86_64 -drive driver=file
qemu-system-x86_64: block.c:892: bdrv_open_common: Assertion
`!drv->bdrv_needs_filename || filename != ((void *)0)' failed.
Now the respective check must be performed not only in bdrv_file_open(),
but also in bdrv_open().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Request sizes used to be rounded down to the next sector boundary,
allowing to bypass the I/O limit. Now all requests are accounted for
with their exact byte size.
Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
If a request calls wait_serialising_requests() and actually has to wait
in this function (i.e. a coroutine yield), other requests can run and
previously read data (like the head or tail buffer) could become
outdated. In this case, we would have to restart from the beginning to
read in the updated data.
However, we're lucky and don't actually need to do that: A request can
only wait in the first call of wait_serialising_requests() because we
mark it as serialising before that call, so any later requests would
wait. So as we don't wait in practice, we don't have to reload the data.
This is an important assumption that may not be broken or data
corruption will happen. Document it with some assertions.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.
Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().
Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>