Commit Graph

470 Commits

Author SHA1 Message Date
Jeff Cody
79fac5680d block: helper function, to find the base image of a chain
This is a simple helper function, that will return the base image
of a given image chain.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 18:23:44 +02:00
Jeff Cody
6ebdcee2d8 block: add support functions for live commit, to find and delete images.
Add bdrv_find_overlay(), and bdrv_drop_intermediate().

bdrv_find_overlay():  given 'bs' and the active (topmost) BDS of an image chain,
                    find the image that is the immediate top of 'bs'

bdrv_drop_intermediate():
                    Given 3 BDS (active, top, base), drop images above
                    base up to and including top, and set base to be the
                    backing file of top's overlay node.

                    E.g., this converts:

                    bottom <- base <- intermediate <- top <- active

                    to

                    bottom <- base <- active

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-28 18:22:44 +02:00
Jeff Cody
dc1c13d969 block: remove keep_read_only flag from BlockDriverState struct
The keep_read_only flag is no longer used, in favor of the bdrv
flag BDRV_O_ALLOW_RDWR.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:13 +02:00
Jeff Cody
0bce597d6e block: convert bdrv_commit() to use bdrv_reopen()
Currently, bdrv_commit() reopens images r/w itself, via risky
_delete() and _open() calls. Use the new safe method for drive reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:12 +02:00
Jeff Cody
e971aa1273 block: Framework for reopening files safely
This is based on Supriya Kannery's bdrv_reopen() patch series.

This provides a transactional method to reopen multiple
images files safely.

Image files are queue for reopen via bdrv_reopen_queue(), and the
reopen occurs when bdrv_reopen_multiple() is called.  Changes are
staged in bdrv_reopen_prepare() and in the equivalent driver level
functions.  If any of the staged images fails a prepare, then all
of the images left untouched, and the staged changes for each image
abandoned.

Block drivers are passed a reopen state structure, that contains:
    * BDS to reopen
    * flags for the reopen
    * opaque pointer for any driver-specific data that needs to be
      persistent from _prepare to _commit/_abort
    * reopen queue pointer, if the driver needs to queue additional
      BDS for a reopen

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Jeff Cody
55b110f24e block: make bdrv_set_enable_write_cache() modify open_flags
bdrv_set_enable_write_cache() sets the bs->enable_write_cache flag,
but without the flag recorded in bs->open_flags, then next time
a reopen() is performed the enable_write_cache setting may be
inadvertently lost.

This will set the flag in open_flags, so it is preserved across
reopens.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Jeff Cody
be028adced block: correctly set the keep_read_only flag
I believe the bs->keep_read_only flag is supposed to reflect
the initial open state of the device. If the device is initially
opened R/O, then commit operations, or reopen operations changing
to R/W, are prohibited.

Currently, the keep_read_only flag is only accurate for the active
layer, and its backing file. Subsequent images end up always having
the keep_read_only flag set.

For instance, what happens now:

[  base  ]  kro = 1, ro = 1
    |
    v
[ snap-1 ]  kro = 1, ro = 1
    |
    v
[ snap-2 ]  kro = 0, ro = 1
    |
    v
[ active ]  kro = 0, ro = 0

What we want:

[  base  ]  kro = 0, ro = 1
    |
    v
[ snap-1 ]  kro = 0, ro = 1
    |
    v
[ snap-2 ]  kro = 0, ro = 1
    |
    v
[ active ]  kro = 0, ro = 0

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-24 15:15:11 +02:00
Dunrong Huang
fe235a06e1 block: Don't forget to delete temporary file
The caller would not delete temporary file after failed get_tmp_filename().

Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12 15:50:09 +02:00
Pavel Hrdina
9ca111544c block: fix block tray status
The tray status should change also if you eject empty block device.

Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-09-12 15:50:09 +02:00
Kevin Wolf
d4c8232923 block: Flush parent to OS with cache=unsafe
Commit 29cdb251 already added a comment that no unnecessary flushes to
disk will occur, this patch makes the code even get to the point of the
comment. This is mostly theoretical because in practice we only stack
one format on top of one protocol, the former implementing flush_to_os
and the latter only flush_to_disk. It starts to matter when drivers that
are not on top implement flush_to_os.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
2012-08-15 15:14:43 +02:00
Luiz Capitulino
c75a1a8a5a qmp: query-block: add 'encryption_key_missing' field
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
2012-08-13 13:20:06 -03:00
Benoît Canet
2e3e331710 block: Use bdrv_get_backing_file_depth()
Use the dedicated counting function in qmp_query_block in order to
propagate the backing file depth to HMP and add backing_file_depth
to qmp-commands.hx

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-08-03 10:10:51 -03:00
Benoît Canet
f198fd1c9a block: create bdrv_get_backing_file_depth()
Create bdrv_get_backing_file_depth() in order to be able to show
in QMP and HMP how many ancestors backing an image a block device
have.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-08-03 10:10:38 -03:00
Blue Swirl
0ed8b6f67f Avoid returning void
It's silly and non-conforming to standards to return void,
don't do it.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-07-28 09:23:11 +00:00
Markus Armbruster
2b584959ed block: Geometry and translation hints are now useless, purge them
There are two producers of these hints: drive_init() on behalf of
-drive, and hd_geometry_guess().

The only consumer of the hint is hd_geometry_guess().

The callers of hd_geometry_guess() call it only when drive_init()
didn't set the hints.  Therefore, drive_init()'s hints are never used.

Thus, hd_geometry_guess() only ever sees hints it produced itself in a
prior call.  Only the first call computes something, subsequent calls
just repeat the first call's results.  However, hd_geometry_guess() is
never called more than once: the device models don't, and the block
device is destroyed on unplug.  Thus, dropping the repeat feature
doesn't break anything now.

If a block device wasn't destroyed on unplug and could be reused with
a new device, then repeating old results would be wrong.  Thus,
dropping the repeat feature prevents future breakage.

This renders the hints unused.  Purge them from the block layer.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:31 +02:00
Markus Armbruster
9db1c0f7a9 hd-geometry: Move disk geometry guessing back from block.c
Commit f3d54fc4 factored it out of hw/ide.c for reuse.  Sensible,
except it was put into block.c.  Device-specific functionality should
be kept in device code, not the block layer.  Move it to
hw/hd-geometry.c, and make stylistic changes required to keep
checkpatch.pl happy.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:30 +02:00
Markus Armbruster
61a8d649ff fdc: Move floppy geometry guessing back from block.c
Commit 5bbdbb46 moved it to block.c because "other geometry guessing
functions already reside in block.c".  Device-specific functionality
should be kept in device code, not the block layer.  Move it back.

Disk geometry guessing is still in block.c.  To be moved out in a
later patch series.

Bonus: the floppy type used in pc_cmos_init() now obviously matches
the one in the FDrive.  Before, we relied on
bdrv_get_floppy_geometry_hint() picking the same type both in
fd_revalidate() and in pc_cmos_init().

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-17 16:48:29 +02:00
Anthony Liguori
23797df3d9 Merge remote-tracking branch 'mjt/mjt-iov2' into staging
* mjt/mjt-iov2:
  rewrite iov_send_recv() and move it to iov.c
  cleanup qemu_co_sendv(), qemu_co_recvv() and friends
  export iov_send_recv() and use it in iov_send() and iov_recv()
  rename qemu_sendv to iov_send, change proto and move declarations to iov.h
  change qemu_iovec_to_buf() to match other to,from_buf functions
  consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
  allow qemu_iovec_from_buffer() to specify offset from which to start copying
  consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset()
  rewrite iov_* functions
  change iov_* function prototypes to be more appropriate
  virtio-serial-bus: use correct lengths in control_out() message

Conflicts:
	tests/Makefile

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-07-09 12:35:06 -05:00
Markus Armbruster
07d27a442e block: Factor bdrv_read_unthrottled() out of guess_disk_lchs()
To prepare move of guess_disk_lchs() into hw/, where it poking
BlockDriverState member io_limits_enabled directly would be unclean.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 17:21:02 +02:00
Markus Armbruster
1f69c2b022 fdc: Drop broken code for user-defined floppy geometry
bdrv_get_floppy_geometry_hint() fails to store through its parameter
drive when bs has a geometry hint.  Makes fd_revalidate() assign
random crap to drv->drive.

Has been broken that way for ages.  Harmless, because:

* The only way to set a geometry hint is -drive if=none,cyls=...
  Since commit c219331e, probably unintentional.

* The only use of drv->drive is as argument to another
  bdrv_get_floppy_geometry_hint().  Which doesn't use it, since the
  geometry hint is still there.

Drop the broken code, ignore -drive parameter cyls, heads and secs for
floppies even with if=none, just like before commit c219331e.  Matches
-help, which explains cyls, heads, secs as "hard disk physical
geometry".

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:03 +02:00
Paolo Bonzini
4ddc07cac2 block: introduce bdrv_swap, implement bdrv_append on top of it
The new function can be made a bit nicer than bdrv_append.  It swaps the
whole contents, and then swaps back (using the usual t=a;a=b;b=t idiom)
the fields that need to stay on top.  Thus, it does not need explicit
bdrv_detach_dev, bdrv_iostatus_disable, etc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Paolo Bonzini
a9fc4408e3 block: copy over job and dirty bitmap fields in bdrv_append
While these should not be in use at the time a transaction is started,
a command in the prepare phase of a transaction might have added them,
so they need to be brought over.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-07-09 15:53:02 +02:00
Markus Armbruster
f8d6bba1c1 block: Replace bdrv_get_format() by bdrv_get_format_name()
So callers don't need to know anything about maximum name length.
Returning a pointer is safe, because the name string lives as long as
the block driver it names, and block drivers don't die.

Requested by Peter Maydell.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
e1e9b0aca0 block: always open drivers in writeback mode
Formats are entirely in charge of flushes for metadata writes.  For
guest-initiated writes, a writethrough cache is faked in the block layer.
So we can always open in writeback mode.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
425b01487a block: add bdrv_set_enable_write_cache
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
c4a248a138 block: copy enable_write_cache in bdrv_append
Because the guest will be able to flip enable_write_cache, the actual
state may not match what is used to open the new snapshot.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Paolo Bonzini
f05fa4ad03 block: flush in writethrough mode after writes
We want to make the formats handle their own flushes
autonomously, while keeping for guests the ability to use a writethrough
cache.  Since formats will write metadata via bs->file, bdrv_co_do_writev
is the only place where we need to add a flush.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Markus Armbruster
c843328783 block: New bdrv_get_flags()
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:43 +02:00
Kevin Wolf
4534ff5426 qemu-img check -r for repairing images
The QED block driver already provides the functionality to not only
detect inconsistencies in images, but also fix them. However, this
functionality cannot be manually invoked with qemu-img, but the
check happens only automatically during bdrv_open().

This adds a -r switch to qemu-img check that allows manual invocation
of an image repair.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Paolo Bonzini
188a7bbf94 stream: move is_allocated_above to block.c
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-06-15 14:03:42 +02:00
Michael Tokarev
d5e6b1619c change qemu_iovec_to_buf() to match other to,from_buf functions
It now allows specifying offset within qiov to start from and
amount of bytes to copy.  Actual implementation is just a call
to iov_to_buf().

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-06-11 23:12:11 +04:00
Michael Tokarev
1b093c480a consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
qemu_iovec_concat() is currently a wrapper for
qemu_iovec_copy(), use the former (with extra
"0" arg) in a few places where it is used.

Change skip argument of qemu_iovec_copy() from
uint64_t to size_t, since size of qiov itself
is size_t, so there's no way to skip larger
sizes.  Rename it to soffset, to make it clear
that the offset is applied to src.

Also change the only usage of uint64_t in
hw/9pfs/virtio-9p.c, in v9fs_init_qiov_from_pdu() -
all callers of it actually uses size_t too,
not uint64_t.

One added restriction: as for all other iovec-related
functions, soffset must point inside src.

Order of argumens is already good:
 qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
                   int c, size_t bytes)
vs:
 qemu_iovec_concat(QEMUIOVector *dst,
                   QEMUIOVector *src,
                   size_t soffset, size_t sbytes)
(note soffset is after _src_ not dst, since it applies to src;
for memset it applies to qiov).

Note that in many places where this function is used,
the previous call is qemu_iovec_reset(), which means
many callers actually want copy (replacing dst content),
not concat.  So we may want to add a wrapper like
qemu_iovec_copy() with the same arguments but which
calls qemu_iovec_reset() before _concat().

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-06-11 23:12:11 +04:00
Michael Tokarev
03396148bc allow qemu_iovec_from_buffer() to specify offset from which to start copying
Similar to
 qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
                   int c, size_t bytes);
the new prototype is:
 qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
                     const void *buf, size_t bytes);

The processing starts at offset bytes within qiov.

This way, we may copy a bounce buffer directly to
a middle of qiov.

This is exactly the same function as iov_from_buf() from
iov.c, so use the existing implementation and rename it
to qemu_iovec_from_buf() to be shorter and to match the
utility function.

As with utility implementation, we now assert that the
offset is inside actual iovec.  Nothing changed for
current callers, because `offset' parameter is new.

While at it, stop using "bounce-qiov" in block/qcow2.c
and copy decrypted data directly from cluster_data
instead of recreating a temp qiov for doing that.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2012-06-11 23:12:11 +04:00
Jim Meyering
eba25057b9 block: prevent snapshot mode $TMPDIR symlink attack
In snapshot mode, bdrv_open creates an empty temporary file without
checking for mkstemp or close failure, and ignoring the possibility
of a buffer overrun given a surprisingly long $TMPDIR.
Change the get_tmp_filename function to return int (not void),
so that it can inform its two callers of those failures.
Also avoid the risk of buffer overrun and do not ignore mkstemp
or close failure.
Update both callers (in block.c and vvfat.c) to propagate
temp-file-creation failure to their callers.

get_tmp_filename creates and closes an empty file, while its
callers later open that presumed-existing file with O_CREAT.
The problem was that a malicious user could provoke mkstemp failure
and race to create a symlink with the selected temporary file name,
thus causing the qemu process (usually root owned) to open through
the symlink, overwriting an attacker-chosen file.

This addresses CVE-2012-2652.
http://bugzilla.redhat.com/CVE-2012-2652

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-05-30 14:48:40 +08:00
Paolo Bonzini
dc5a137125 qemu-img: make "info" backing file output correct and easier to use
qemu-img info should use the same logic as qemu when printing the
backing file path, or debugging becomes quite tricky.  We can also
simplify the output in case the backing file has an absolute path
or a protocol.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
6405875cdd block: move field reset from bdrv_open_common to bdrv_close
bdrv_close should leave fields in the same state as bdrv_new.  It is
not up to bdrv_open_common to fix the mess.

Also, backing_format was not being re-initialized.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
947995c09e block: protect path_has_protocol from filenames with colons
path_has_protocol will erroneously return "true" if the colon is part
of a filename.  These names are common with stable device names produced
by udev.  We cannot fully protect against this in case the filename
does not have a path component (e.g. if the current directory is
/dev/disk/by-path), but in the common case there will be a slash before
and path_has_protocol can easily detect that and return false.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
f53f4da9c6 block: simplify path_is_absolute
On Windows, all the logic is already in is_windows_drive and
is_windows_drive_prefix.  On POSIX, there is no need to look
out for colons.

The win32 code changes the behaviour in some cases, we could have
something like "d:foo.img". The old code would treat it as relative
path, the new one as absolute. Now the path is absolute, because to
go from c:/program files/blah to d:foo.img you cannot say c:/program
files/blah/d:foo.img.  You have to say d:foo.img.  But you could also
say it's relative because (I think, at least it was like that in DOS
15 years ago) d:foo.img is relative to the current path of drive D.
Considering how path_is_absolute is used by path_combine, I think it's
better to treat it as absolute.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
fa4478d5c8 block: wait for job callback in block_job_cancel_sync
The limitation on not having I/O after cancellation cannot really be
kept.  Even streaming has a very small race window where you could
cancel a job and have it report completion.  If this window is hit,
bdrv_change_backing_file() will yield and possibly cause accesses to
dangling pointers etc.

So, let's just assume that we cannot know exactly what will happen
after the coroutine has set busy to false.  We can set a very lax
condition:

- if we cancel the job, the coroutine won't set it to false again
(and hence will not call co_sleep_ns again).

- block_job_cancel_sync will wait for the coroutine to exit, which
pretty much ensures no race.

Instead, we track the coroutine that executes the job and put very
strict conditions on what to do while it is quiescent (busy = false).
First of all, the coroutine must never set busy = false while the job
has been cancelled.  Second, the coroutine can be reentered arbitrarily
while it is quiescent, so you cannot really do anything but co_sleep_ns at
that time.  This condition is obeyed by the block_job_sleep_ns function.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
4513eafe92 block: add block_job_sleep_ns
This function abstracts the pretty complex semantics of the "busy"
member of BlockJob.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
0ac9377d04 block: fully delete bs->file when closing
We are reusing bs->file across close/open, which may not cause any
known bugs but is a recipe for trouble.  Prefer bdrv_delete, and
enjoy the new invariant in the implementation of bdrv_delete.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
a275fa42fa block: do not reuse the backing file across bdrv_close/bdrv_open
This is another bug caused by not doing a full cleanup of the BDS
across close/open.  This was found with mirroring by Shaolong Hu,
but it can probably be reproduced also with eject or change.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
3a389e7926 block: another bdrv_append fix
bdrv_append must also copy open_flags to the top, because the snapshot
has BDRV_O_NO_BACKING set.  This causes interesting results if you
later use drive-reopen (not upstream) to reopen the image, and lose
the backing file in the process.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
e023b2e244 block: fix snapshot on QED
QED's opaque data includes a pointer back to the BlockDriverState.
This breaks when bdrv_append shuffles data between bs_new and bs_top.
To avoid this, add a "rebind" function that tells the driver about
the new relationship between the BlockDriverState and its opaque.

The patch also adds rebind to VVFAT for completeness, even though
it is not used with live snapshots.

Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:12 +02:00
Paolo Bonzini
71df14fcbe block: fix allocation size for dirty bitmap
Also reuse elsewhere the new constant for sizeof(unsigned long) * 8.

The dirty bitmap is allocated in bits but declared as unsigned long.
Thus, its memory block is accessed beyond its end unless the image
is a multiple of 64 chunks (i.e. a multiple of 64 MB).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
63090dac3a block: open backing file as read-only when probing for size
bdrv_img_create will temporarily open the backing file to probe its size.
However, this could be done with a read-write open if the wrong flags are
passed to bdrv_img_create.  Since there is really no documentation on
what flags can be passed, assume that bdrv_img_create receives the flags
with which the new image will be opened; sanitize them when opening
the backing file.

Reported-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
469ef350e1 block: update in-memory backing file and format
These are needed to print "info block" output correctly.  QCOW2 does this
because it needs it to write the header, but QED does not, and common code
is the right place to do it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Paolo Bonzini
5f3777945d block: push bdrv_change_backing_file error checking up from drivers
This check applies to all drivers, but QED lacks it.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Zhi Yong Wu
4c355d53c6 block: add the support to drain throttled requests
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
[ Iterate until all block devices have processed all requests,
  add comments. - Paolo ]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-10 10:32:11 +02:00
Zhi Yong Wu
5b7e1542cf block: make bdrv_create adopt coroutine
The current qemu.git introduces failure with preallocation and some
sizes:

qemu-img create -f qcow2 new.img 976563K -o preallocation=metadata
qemu-img: qemu-coroutine-lock.c:111: qemu_co_mutex_unlock: Assertion
`mutex->locked == 1' failed.

And lock needs to work in coroutine context. So to fix this issue, we
need to make bdrv_create adopt coroutine at first.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-05-07 19:33:18 +02:00
Stefan Hajnoczi
c83c66c3b5 block: add 'speed' optional parameter to block-stream
Allow streaming operations to be started with an initial speed limit.
This eliminates the window of time between starting streaming and
issuing block-job-set-speed.  Users should use the new optional 'speed'
parameter instead so that speed limits are in effect immediately when
the job starts.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
882ec7ce53 block: change block-job-set-speed argument from 'value' to 'speed'
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
9e6636c72d block: use Error mechanism instead of -errno for block_job_set_speed()
There are at least two different errors that can occur in
block_job_set_speed(): the job might not support setting speeds or the
value might be invalid.

Use the Error mechanism to report the error where it occurs.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Stefan Hajnoczi
fd7f8c6537 block: use Error mechanism instead of -errno for block_job_create()
The block job API uses -errno return values internally and we convert
these to Error in the QMP functions.  This is ugly because the Error
should be created at the point where we still have all the relevant
information.  More importantly, it is hard to add new error cases to
this case since we quickly run out of -errno values without losing
information.

Go ahead and use Error directly and don't convert later.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2012-04-27 11:44:50 -03:00
Kevin Wolf
621f058940 qcow2: Zero write support
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 15:57:30 +02:00
Liu Yuan
80ccf93b88 qemu-img: let 'qemu-img convert' flush data
The 'qemu-img convert -h' advertise that the default cache mode is
'writeback', while in fact it is 'unsafe'.

This patch 1) fix the help manual and 2) let bdrv_close() call bdrv_flush()

2) is needed because some backend storage doesn't have a self-flush
mechanism(for e.g., sheepdog), so we need to call bdrv_flush() to make
sure the image is really writen to the storage instead of hanging around
writeback cache forever.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-20 11:42:41 +02:00
Kevin Wolf
7094f12f86 block: Drain requests in bdrv_close
If an AIO request is in flight that refers to a BlockDriverState that
has been closed and possibly even freed, more or less anything could
happen. I have seen segfaults, -EBADF return values and qcow2 sometimes
actually catches the situation in bdrv_close() and abort()s.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2012-04-19 15:48:52 +02:00
Benoît Canet
077892696b block: add a function to clear incoming live migration flags
This function will clear all BDRV_O_INCOMING flags.

Signed-off-by: Benoit Canet <benoit.canet@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 16:27:56 +02:00
Jeff Cody
f6801b83d0 block: bdrv_append() fixes
A few fixups for bdrv_append():

The new bs (bs_new) passed into bdrv_append() should be anonymous.  Rather
than call bdrv_make_anon() to enforce this, use an assert to catch when a caller
is passing in a bs_new that is not anonymous.

Also, the new top layer should have its backing_format reflect the original
top's format.

And last, after the swap of bs contents, the device_name will have been copied
down. This needs to be cleared to reflect the anonymity of the bs that was
pushed down.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:41 +02:00
Paolo Bonzini
9f25eccc1c block: set job->speed in block_set_speed
There is no need to do this in every implementation of set_speed
(even though there is only one right now).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini
3e914655f2 block: fix streaming/closing race
Streaming can issue I/O while qcow2_close is running.  This causes the
L2 caches to become very confused or, alternatively, could cause a
segfault when the streaming coroutine is reentered after closing its
block device.  The fix is to cancel streaming jobs when closing their
underlying device.

The cancellation must be synchronous, on the other hand qemu_aio_wait
will not restart a coroutine that is sleeping in co_sleep.  So add
a flag saying whether streaming has in-flight I/O.  If the busy flag
is false, the coroutine is quiescent and, when cancelled, will not
issue any new I/O.

This protects streaming against closing, but not against deleting.
We have a reference count protecting us against concurrent deletion,
but I still added an assertion to ensure nothing bad happens.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Zhi Yong Wu
498e386c58 block: disable I/O throttling on sync api
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:40 +02:00
Paolo Bonzini
29cdb2513c block: push recursive flushing up from drivers
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-04-05 14:54:39 +02:00
Stefan Hajnoczi
e88774971c block: handle -EBUSY in bdrv_commit_all()
Monitor operations that manipulate image files must not execute while a
background job (like image streaming) is in progress.  This prevents
corruptions from happening when two pieces of code are manipulating the
image file without knowledge of each other.

The monitor "commit" command raises QERR_DEVICE_IN_USE when
bdrv_commit() returns -EBUSY but "commit all" has no error handling.
This is easy to fix, although note that we do not deliver a detailed
error about which device was busy in the "commit all" case.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-03-12 15:14:06 +01:00
Jeff Cody
8802d1fdd4 qapi: Introduce blockdev-group-snapshot-sync command
This is a QAPI/QMP only command to take a snapshot of a group of
devices. This is similar to the blockdev-snapshot-sync command, except
blockdev-group-snapshot-sync accepts a list devices, filenames, and
formats.

It is attempted to keep the snapshot of the group atomic; if the
creation or open of any of the new snapshots fails, then all of
the new snapshots are abandoned, and the name of the snapshot image
that failed is returned.  The failure case should not interrupt
any operations.

Rather than use bdrv_close() along with a subsequent bdrv_open() to
perform the pivot, the original image is never closed and the new
image is placed 'in front' of the original image via manipulation
of the BlockDriverState fields.  Thus, once the new snapshot image
has been successfully created, there are no more failure points
before pivoting to the new snapshot.

This allows the group of disks to remain consistent with each other,
even across snapshot failures.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Acked-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 15:48:33 +01:00
Paolo Bonzini
b6a127a156 block: drop aio_multiwrite in BlockDriver
These were never used.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:47 +01:00
Hervé Poussineau
f8d3d12857 block: add a transfer rate for floppy types
Floppies must be read at a specific transfer rate, depending of its own format.
Update floppy description table to include required transfer rate.

Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-29 12:48:46 +01:00
Luiz Capitulino
6f382ed226 qmp: add DEVICE_TRAY_MOVED event
It's emitted whenever the tray is moved by the guest or by HMP/QMP
commands.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:50 -02:00
Luiz Capitulino
f36f394952 block: bdrv_eject(): Make eject_flag a real bool
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:23:05 -02:00
Luiz Capitulino
329c0a48a9 block: Rename bdrv_mon_event() & BlockMonEventAction
They are QMP events, not monitor events. Rename them accordingly.

Also, move bdrv_emit_qmp_error_event() up in the file. A new event will
be added soon and it's good to have them next each other.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
2012-02-22 17:22:35 -02:00
Stefan Hajnoczi
79c053bde9 block: perform zero-detection during copy-on-read
Copy-on-Read populates the image file with data read from a backing
image.  In order to avoid bloating the image file when all zeroes are
read we should scan the buffer and perform an optimized zero write
operation.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Stefan Hajnoczi
f08f2ddae0 block: add .bdrv_co_write_zeroes() interface
The ability to zero regions of an image file is a useful primitive for
higher-level features such as image streaming or zero write detection.

Image formats may support an optimized metadata representation instead
of writing zeroes into the image file.  This allows zero writes to be
potentially faster than regular write operations and also preserve
sparseness of the image file.

The .bdrv_co_write_zeroes() interface should be implemented by block
drivers that wish to provide efficient zeroing.

Note that this operation is different from the discard operation, which
may leave the contents of the region indeterminate.  That means
discarded blocks are not guaranteed to contain zeroes and may contain
junk data instead.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-02-09 16:17:50 +01:00
Marcelo Tosatti
e8a6bb9caa block: add bdrv_find_backing_image
Add bdrv_find_backing_image: given a BlockDriverState pointer, and an id,
traverse the backing image chain to locate the id.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 14:49:18 +01:00
Stefan Hajnoczi
eeec61f291 block: add BlockJob interface for long-running operations
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi
470c05047a block: make copy-on-read a per-request flag
Previously copy-on-read could only be enabled for all requests to a
block device.  This means requests coming from the guest as well as
QEMU's internal requests would perform copy-on-read when enabled.

For image streaming we want to support finer-grained behavior than just
populating the image file from its backing image.  Image streaming
supports partial streaming where a common backing image is preserved.
In this case guest requests should not perform copy-on-read because they
would indiscriminately copy data which should be left in a backing image
from the backing chain.

Introduce a per-request flag for copy-on-read so that a block device can
process both regular and copy-on-read requests.  Overlapping reads and
writes still need to be serialized for correctness when copy-on-read is
happening, so add an in-flight reference count to track this.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Stefan Hajnoczi
2d3735d3bf block: check bdrv_in_use() before blockdev operations
Long-running block operations like block migration and image streaming
must have continual access to their block device.  It is not safe to
perform operations like hotplug, eject, change, resize, commit, or
external snapshot while a long-running operation is in progress.

This patch adds the missing bdrv_in_use() checks so that block migration
and image streaming never have the rug pulled out from underneath them.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2012-01-26 11:45:26 +01:00
Paolo Bonzini
3f3aace830 block: avoid useless checks on acb->bh
Coverity is confused by this "if" and reports leaks on acb->bh.
The bottom half is always deleted before releasing the AIOCB,
in either bdrv_aio_cancel_em or bdrv_aio_bh_cb.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:08 +01:00
Paolo Bonzini
df9309fb43 block: simplify failure handling for bdrv_aio_multiwrite
Now that early failure of bdrv_aio_writev is not possible anymore,
mcb->num_requests can be set before the loop starts.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:07 +01:00
Paolo Bonzini
ad54ae80c7 block: bdrv_aio_* do not return NULL
Initially done with the following semantic patch:

@ rule1 @
expression E;
statement S;
@@
  E =
(
   bdrv_aio_readv
|  bdrv_aio_writev
|  bdrv_aio_flush
|  bdrv_aio_discard
|  bdrv_aio_ioctl
)
     (...);
(
- if (E == NULL) { ... }
|
- if (E)
    { <... S ...> }
)

which however missed the occurrence in block/blkverify.c
(as it should have done), and left behind some unused
variables.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-15 12:40:07 +01:00
Stefan Hajnoczi
922453bca6 block: convert qemu_aio_flush() calls to bdrv_drain_all()
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O.  Most of these places actually want to drain all block
requests but there is no block layer API to do so.

This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete.  As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:56:06 +01:00
Stefan Hajnoczi
5f8b6491f2 block: wait_for_overlapping_requests() deadlock detection
Debugging a reentrant request deadlock was fun but in the future we need
a quick and obvious way of detecting such bugs.  Add an assert that
checks we are not about to deadlock when waiting for another request.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:52:34 +01:00
Stefan Hajnoczi
bd9533e36e block: implement bdrv_co_is_allocated() boundary cases
Cases beyond the end of the disk image are only implemented for block
drivers that do not provide .bdrv_co_is_allocated().  It's worth making
these cases generic so that block drivers that do implement
.bdrv_co_is_allocated() also get them for free.

Suggested-by: Mark Wu <wudxw@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:39 +01:00
Stefan Hajnoczi
ab1859218a block: core copy-on-read logic
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
d83947ac6d block: request overlap detection
Detect overlapping requests and remember to align to cluster boundaries
if the image format uses them.  This assumes that allocating I/O is
performed in cluster granularity - which is true for qcow2, qed, etc.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
f4658285f9 block: wait for overlapping requests
When copy-on-read is enabled it is necessary to wait for overlapping
requests before issuing new requests.  This prevents races between the
copy-on-read and a write request.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
53fec9d3fd block: add interface to toggle copy-on-read
The bdrv_enable_copy_on_read()/bdrv_disable_copy_on_read() functions can
be used to programmatically enable or disable copy-on-read for a block
device.  Later patches add the actual copy-on-read logic.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
dbffbdcfff block: add request tracking
The block layer does not know about pending requests.  This information
is necessary for copy-on-read since overlapping requests must be
serialized to prevent races that corrupt the image.

The BlockDriverState gets a new tracked_request list field which
contains all pending requests.  Each request is a BdrvTrackedRequest
record with sector_num, nb_sectors, and is_write fields.

Note that request tracking is always enabled but hopefully this extra
work is so small that it doesn't justify adding an enable/disable flag.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:38 +01:00
Stefan Hajnoczi
060f51c9de block: add bdrv_co_is_allocated() interface
This patch introduces the public bdrv_co_is_allocated() interface which
can be used to query image allocation status while the VM is running.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi
6aebab140d block: drop .bdrv_is_allocated() interface
Now that all block drivers have been converted to
.bdrv_co_is_allocated() we can drop .bdrv_is_allocated().

Note that the public bdrv_is_allocated() interface is still available
but is in fact a synchronous wrapper around .bdrv_co_is_allocated().

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:37 +01:00
Stefan Hajnoczi
376ae3f1cb block: add .bdrv_co_is_allocated()
This patch adds the .bdrv_co_is_allocated() interface which is identical
to .bdrv_is_allocated() but runs in coroutine context.  Running in
coroutine context implies that other coroutines might be performing I/O
at the same time.   Therefore it must be safe to run while the following
BlockDriver functions are in-flight:

    .bdrv_co_readv()
    .bdrv_co_writev()
    .bdrv_co_flush()
    .bdrv_co_is_allocated()

The new .bdrv_co_is_allocated() interface is useful because it can be
used when a VM is running, whereas .bdrv_is_allocated() is a synchronous
interface that does not cope with parallel requests.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:36 +01:00
Stefan Hajnoczi
05c4af54c6 block: use public bdrv_is_allocated() interface
There is no need for bdrv_commit() to use the BlockDriver
.bdrv_is_allocated() interface directly.  Converting to the public
interface gives us the freedom to drop .bdrv_is_allocated() entirely in
favor of a new .bdrv_co_is_allocated() in the future.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:36 +01:00
Zhi Yong Wu
727f005e6a hmp/qmp: add block_set_io_throttle
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Zhi Yong Wu
98f90dba5e block: add I/O throttling algorithm
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Zhi Yong Wu
0563e19151 block: add the blockio limits command line support
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-12-05 14:51:35 +01:00
Anthony Liguori
0f15423c32 block: allow migration to work with image files (v3)
Image files have two types of data: immutable data that describes things like
image size, backing files, etc. and mutable data that includes offset and
reference count tables.

Today, image formats aggressively cache mutable data to improve performance.  In
some cases, this happens before a guest even starts.  When dealing with live
migration, since a file is open on two machines, the caching of meta data can
lead to data corruption.

This patch addresses this by introducing a mechanism to invalidate any cached
mutable data a block driver may have which is then used by the live migration
code.

NB, this still requires coherent shared storage.  Addressing migration without
coherent shared storage (i.e. NFS) requires additional work.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2011-11-21 14:58:48 -06:00
Kevin Wolf
ca716364f0 block: Make cache=unsafe flush to the OS
cache=unsafe completely ignored bdrv_flush, because flushing the host disk
costs a lot of performance. However, this means that qcow2 images (and
potentially any other format) can lose data even after the guest has issued a
flush if the qemu process crashes/is killed. In case of a host crash, data loss
is certainly expected with cache=unsafe, but if just the qemu process dies this
is a bit too unsafe.

Now that we have two separate flush functions, we can choose to flush
everythign to the OS, but don't enforce that it's physically written to the
disk.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Kevin Wolf
eb489bb1ec block: Introduce bdrv_co_flush_to_os
qcow2 has a writeback metadata cache, so flushing a qcow2 image actually
consists of writing back that cache to the protocol and only then flushes the
protocol in order to get everything stable on disk.

This introduces a separate bdrv_co_flush_to_os to reflect the split.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Kevin Wolf
c68b89acd6 block: Rename bdrv_co_flush to bdrv_co_flush_to_disk
There are two different types of flush that you can do: Flushing one level up
to the OS (i.e. writing data to the host page cache) or flushing it all the way
down to the disk. The existing functions flush to the disk, reflect this in the
function name.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:59 +01:00
Paolo Bonzini
025ccaa7f9 block: add eject request callback
Recent versions of udev always keep the tray locked so that the kernel
can observe "eject request" events (aka tray button presses) even on
discs that aren't mounted.  Add support for these events in the ATAPI
and SCSI cd drive device models.

To let management cope with the behavior of udev, an event should also
be added for "tray opened/closed".  This way, after issuing an "eject"
command, management can poll until the guests actually reacts to the
command.  They can then issue the "change" command after the tray has been
opened, or try with "eject -f" after a (configurable?) timeout.  However,
with this patch and the corresponding support in the device models,
at least it is possible to do a manual two-step eject+change sequence.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2011-11-11 14:02:57 +01:00
Anthony Liguori
8494a397b6 Merge remote-tracking branch 'kwolf/for-anthony' into staging
Conflicts:
	block/vmdk.c
2011-10-31 11:09:00 -05:00