Commit Graph

164 Commits

Author SHA1 Message Date
Kevin Wolf
cb6d3ca07b block: Fix multiwrite error handling
When two requests of the same multiwrite batch fail, the callback of all
requests in that batch were called twice. This could have any kind of nasty
effects, in my case it lead to use after free and eventually a segfault.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2010-04-10 00:14:23 +02:00
Shahar Havivi
fd04a2aeda Wrong error message in block_passwd command
Signed-off-by: Shahar Havivi <shaharh@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-03-17 10:41:38 -05:00
Naphtali Sprei
4dca4b639c block: more read-only changes, related to backing files
Open backing file read-only where possible
Upgrade backing file to read-write during commit, back to read-only after commit
  If upgrade fail, back to read-only. If also fail, "disconnect" the drive.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-19 15:32:15 -06:00
Luiz Capitulino
ba14414174 Monitor: remove unneeded checks
It's not needed to check the return of qobject_from_jsonf()
anymore, as an assert() has been added there.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 13:46:17 -06:00
Christoph Hellwig
15dc2697a5 block: saner flags filtering in bdrv_open2
Clean up the current mess about figuring out which flags to pass to the
driver.  BDRV_O_FILE, BDRV_O_SNAPSHOT and BDRV_O_NO_BACKING are flags
only used by the block layer internally so filter them out directly.
Previously BDRV_O_NO_BACKING could accidentally be passed to the drivers,
but wasn't ever used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Luiz Capitulino
2582bfedd2 block: BLOCK_IO_ERROR QMP event
This commit introduces the bdrv_mon_event() function, which
should be called by block subsystems (eg. IDE) when a I/O
error occurs, so that an QMP event is emitted.

The following information is currently provided in the event:

- device name
- operation (ie. "read" or "write")
- action taken (eg. "stop")

Event example:

{ "event": "BLOCK_IO_ERROR",
    "data": { "device": "ide0-hd1",
              "operation": "write",
              "action": "stop" },
    "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-10 11:57:03 -06:00
Liran Schour
aaa0eb75e2 Count dirty blocks and expose an API to get dirty count
This will manage dirty counter for each device and will allow to get the
dirty counter from above.

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-02-09 16:56:14 -06:00
Christoph Hellwig
e2a305fb13 block: avoid creating too large iovecs in multiwrite_merge
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.

Add a MAX_IOV defintion for platforms that don't have it.  For now we use
the same 1024 define that's used on Linux and various other platforms,
but until the windows block backend implements some kind of vectored I/O
it doesn't matter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 17:08:03 -06:00
Herve Poussineau
f8a83245d9 win32: pair qemu_memalign() with qemu_vfree()
Win32 suffers from a very big memory leak when dealing with SCSI devices.
Each read/write request allocates memory with qemu_memalign (ie
VirtualAlloc) but frees it with qemu_free (ie free).
Pair all qemu_memalign() calls with qemu_vfree() to prevent such leaks.

Signed-off-by: Herve Poussineau <hpoussin@reactos.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 16:41:06 -06:00
Christoph Hellwig
6987307ca3 block: clean up bdrv_open2 structure a bit
Check the whitelist as early as possible instead of continuing the
setup, and move all the error handling code to the end of the
function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 15:42:02 -06:00
Naphtali Sprei
37226ad946 No need anymoe for bdrv_set_read_only
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 15:42:01 -06:00
Kevin Wolf
9a8c4cceaf block: Return original error codes in bdrv_pread/write
Don't assume -EIO but return the real error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-26 14:59:19 -06:00
Anthony Liguori
3e39789b64 Revert "block: prevent multiwrite_merge from creating too large iovecs"
This reverts commit 0076bc0c1d.

Kevin Wolf pointed out that this breaks the mingw32 build.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 10:12:23 -06:00
Christoph Hellwig
0076bc0c1d block: prevent multiwrite_merge from creating too large iovecs
If we go over the maximum number of iovecs support by syscall we get
back EINVAL from the kernel which translate to I/O errors for the guest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:51:40 -06:00
Christoph Hellwig
1d44952fc7 block: fix cache flushing in bdrv_commit
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:51:11 -06:00
Naphtali Sprei
03cbdac7ef Disable fall-back to read-only when cannot open drive's file for read-write
Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:25:22 -06:00
Naphtali Sprei
f5edb014ed Clean-up a little bit the RW related bits of BDRV_O_FLAGS. BDRV_O_RDONLY gone (and so is BDRV_O_ACCESS). Default value for bdrv_flags (0/zero) is READ-ONLY. Need to explicitly request READ-WRITE.
Instead of using the field 'readonly' of the BlockDriverState struct for passing the request,
pass the request in the flags parameter to the function.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-20 08:25:22 -06:00
Christoph Hellwig
3f5075ae63 block: flush backing_hd in the right place
The backing device is only modified from bdrv_commit.  So instead of
flushing it every time bdrv_flush is called for the front-end device
only flush it after we're written data to it in bdrv_commit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kevin Wolf
756e6736a1 block: Add bdrv_change_backing_file
Introduce the functions needed to change the backing file of an image. The
function is implemented for qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kevin Wolf
b783e409bf block: Introduce BDRV_O_NO_BACKING
If an image references a backing file that doesn't exist, qemu-img info fails
to open this image. Exactly in this case the info would be valuable, though:
the user might want to find out which file is missing.

This patch introduces a BDRV_O_NO_BACKING flag to ignore the backing file when
opening the image. qemu-img info is the first user and provides info now even
if the backing file is invalid.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2010-01-13 17:14:15 -06:00
Kirill A. Shutemov
114cdfa908 block.c: fix warning with _FORTIFY_SOURCE
CC    block.o
cc1: warnings being treated as errors
block.c: In function 'bdrv_open2':
block.c:400: error: ignoring return value of 'realpath', declared with attribute warn_unused_result

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-12-25 18:19:22 +00:00
Luiz Capitulino
218a536a7a block: Convert bdrv_info_stats() to QObject
Each device statistic information is stored in a QDict and
the returned QObject is a QList of all devices.

This commit should not change user output.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-12 07:59:49 -06:00
Luiz Capitulino
d15e546567 block: Convert bdrv_info() to QObject
Each block device information is stored in a QDict and the
returned QObject is a QList of all devices.

This commit should not change user output.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-12 07:59:49 -06:00
Jan Kiszka
c6d2283068 block migration: Cleanup dirty tracking code
This switches the dirty bitmap to a true bitmap, reducing its footprint
(specifically in caches). It moreover fixes off-by-one bugs in
set_dirty_bitmap (nb_sectors+1 were marked) and bdrv_get_dirty (limit
check allowed one sector behind end of drive). And is drops redundant
dirty_tracking field from BlockDriverState.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 10:48:52 -06:00
Jan Kiszka
6ea44308b0 block migration: Rework constants API
Instead of duplicating the definition of constants or introducing
trivial retrieval functions move the SECTOR constants into the public
block API. This also obsoletes sector_per_block in BlkMigState.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 10:48:52 -06:00
Jan Kiszka
a55eb92c22 block migration: Fix coding style and whitespaces
No functional changes.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-12-03 10:48:52 -06:00
lirans@il.ibm.com
7cd1e32a86 Expose a mechanism to trace block writes
To support live migration without shared storage we need to be able to trace
writes to disk while migrating. This Patch expose dirty block tracking per
device to be polled from upper layer.

Changes from v4:
- Register dirty tracking for each block device.
- Minor coding style issues.
- Block.c will now manage a dirty bitmap per device once
  bdrv_set_dirty_tracking() is called. Bitmap is polled by the upper
  layer (block-migration.c).

Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-17 08:03:31 -06:00
Markus Armbruster
eb852011ab Configurable block format whitelist
We have code for a quite a few block formats.  While I trust that all
of these formats are useful at least for some people in some
circumstances, some of them are of a kind that friends don't let
friends use in production.

This patch provides an optional block format whitelist, default off.
If a whitelist is configured with --block-drv-whitelist, QEMU proper
can use only whitelisted formats.  Other programs, like qemu-img, are
not affected.

Drivers for formats off the whitelist still participate in format
probing, to ensure all programs probe exactly the same.  Without that,
QEMU proper would be prone to treat images with a format off the
whitelist as raw when the image's format is probed.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:02 -06:00
Naphtali Sprei
59f2689d90 Added readonly flag to -drive command
This is a slightly revised patch for adding readonly flag to the -drive command.
Even though this patch is "stand-alone", it assumes a previous related patch (in Anthony staging tree), that passes
the readonly attribute of the drive to the guest OS, applied first.

This enables sharing same image between guests, with readonly access.
Implementaion mark the drive as read_only and changes the flags when actually opening the file.
The readonly attribute of a qcow also passed to it's base file.
For ide that cannot pass the readonly attribute to the guest OS, disallow the readonly flag.

Also, return error code from bdrv_truncate for readonly drive.

Signed-off-by: Naphtali Sprei <nsprei@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-11-09 08:43:01 -06:00
Kevin Wolf
65d6b3d885 block: Use new AsyncContext for bdrv_read/write emulation
bdrv_read/write emulation is used as the perfect example why we need something
like AsyncContexts. So maybe they better start using it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:59 -05:00
Blue Swirl
72cf2d4f0e Fix sys-queue.h conflict for good
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.

Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-12 07:36:22 +00:00
Christoph Hellwig
b2e12bc6e3 block: add aio_flush operation
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously.  Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix.  Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Christoph Hellwig
e900a7b748 block: add enable_write_cache flag
Add a enable_write_cache flag in the block driver state, and use it to
decide if we claim to have a volatile write cache that needs controlled
flushing from the guest.  The flag is off if cache=writethrough is
defined because O_DSYNC guarantees that every write goes to stable
storage, and it is on for cache=none and cache=writeback.

Both scsi-disk and ide now use the new flage, changing from their
defaults of always off (ide) or always on (scsi-disk).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Kevin Wolf
40b4f53967 Add bdrv_aio_multiwrite
One performance problem of qcow2 during the initial image growth are
sequential writes that are not cluster aligned. In this case, when a first
requests requires to allocate a new cluster but writes only to the first
couple of sectors in that cluster, the rest of the cluster is zeroed - just
to be overwritten by the following second request that fills up the cluster.

Let's try to merge sequential write requests to the same cluster, so we can
avoid to write the zero padding to the disk in the first place.

As a nice side effect, also other formats take advantage of dealing with less
and larger requests.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:18:06 -05:00
Christoph Hellwig
5c6c3a6c54 raw-posix: add Linux native AIO support
Now that do have a nicer interface to work against we can add Linux native
AIO support.  It's an extremly thing layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to do complete the iocbs directly
from there.

This started out based on Anthony's earlier AIO patch, but after
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.

To enable native kernel aio use the aio=native sub-command on the
drive command line.  I have also added an option to qemu-io to
test the aio support without needing a guest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:22 -05:00
Juan Quintela
71e72a19ba rename HOST_BSD to CONFIG_BSD
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-27 14:09:20 -05:00
Christoph Hellwig
45566e9c99 replace bdrv_{get, put}_buffer with bdrv_{load, save}_vmstate
The VM state offset is a concept internal to the image format.  Replace
the old bdrv_{get,put}_buffer method that require an index into the
image file that is constructed from the VM state offset and an offset
into the vmstate with the bdrv_{load,save}_vmstate that just take an
offset into the VM state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16 08:28:13 -05:00
Avi Kivity
36afc45159 block: Clean up after deleting BHs
Commit 6a7ad299 ("Call qemu_bh_delete at bdrv_aio_bh_cb") deletes emulated
aio bottom halves to prevent endless accumulation.  However, it leaves a
stale ->bh pointer, which is then waited on when the aio is reused.

Zeroing the pointer fixes the issue, allowing vmdk format images to be used.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10 13:44:30 -05:00
Anthony Liguori
1cec71e359 Revert "support colon in filenames"
This reverts commit 707c0dbc97.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:06:38 -05:00
Blue Swirl
d43277c534 Fix missing strnlen problems
Fix missing strnlen (a GNU extension) problems by using qemu_strnlen
used for user emulators also for system emulators.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-07-01 18:24:44 +00:00
Ram Pai
707c0dbc97 support colon in filenames
Problem: It is impossible to feed filenames with the character colon because
qemu interprets such names as a protocol. For example filename scsi:0, is
interpreted as a protocol by name "scsi".

This patch allows user to espace colon characters. For example the above
filename can now be expressed either as 'scsi\:0' or as file:scsi:0

anything following the "file:" tag is interpreted verbatin. However if "file:"
tag is omitted then any colon characters in the string must be escaped using
backslash.

Here are couple of examples:

scsi\:0\:abc is a local file scsi:0:abc
http\://myweb is a local file by name http://myweb
file:scsi:0:abc is a local file scsi:0:abc
file:http://myweb is a local file by name http://myweb

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 13:50:05 -05:00
Mark McLoughlin
aea2a33c73 Prevent CD-ROM media eject while device is locked
Section 10.8.25 ("START/STOP UNIT Command") of SFF-8020i states that
if the device is locked we should refuse to eject if the device is
locked.

ASC_MEDIA_REMOVAL_PREVENTED is the appropriate return in this case.

In order to stop itself from ejecting the media it is running from,
Fedora's installer (anaconda) requires the CDROMEJECT ioctl() to fail
if the drive has been previously locked.

See also https://bugzilla.redhat.com/501412

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:52:37 -05:00
Dor Laor
6a7ad2998c Call qemu_bh_delete at bdrv_aio_bh_cb.
Also replave qemu_bh_cancel with qemu_bh_delete in bdrv_aio_cancel_em.
 Otherwise the bh will live forever in the bh list.

Signed-off-by: Dor Laor <dor@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:36:47 -05:00
Christoph Hellwig
508c7cb3fa block: add bdrv_probe_device method
Add a bdrv_probe_device method to all BlockDriver instances implementing
host devices to move matching of host device types into the actual drivers.
For now we keep exacly the old matching behaviour based on the devices names,
although we really should have better detetion methods based on device
information in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-15 14:04:22 +02:00
Christoph Hellwig
f3a5d3f8a1 raw-posix: split hdev drivers
Instead of declaring one BlockDriver for all host devices declared one
for each type:  a generic one for normal disk devices, a Linux floppy
driver and a CDROM driver for Linux and FreeBSD.  This gets rid of a lot
of messy ifdefs and switching based on the type in the various removal
device methods.

block.c grows a new method to find the correct host device driver based
on OS-sepcific criteria, which will later into the actual drivers in a
later patch in this series.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2009-06-15 13:55:19 +02:00
Christoph Hellwig
c16b5a2ca0 fully split aio_pool from BlockDriver
Now that we have a separate aio pool structure we can remove those
aio pool details from BlockDriver.

Every driver supporting AIO now needs to declare a static AIOPool
with the aiocb size and the cancellation method.  This cleans up the
current code considerably and will make it cleaner and more obvious
to support two different aio implementations behind a single
BlockDriver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-27 09:46:03 -05:00
Kevin Wolf
91a073a975 Drop bdrv_create2
This patch converts the remaining users of bdrv_create2 to bdrv_create and
removes the now unused function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-27 09:45:23 -05:00
Kevin Wolf
0e7e1989f7 Convert all block drivers to new bdrv_create
Now we can make use of the newly introduced option structures. Instead of
having bdrv_create carry more and more parameters (which are format specific in
most cases), just pass a option structure as defined by the driver itself.

bdrv_create2() contains an emulation of the old interface to simplify the
transition.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-22 10:50:31 -05:00
Anthony Liguori
c833ab7351 Fix segv when passing an unknown protocol
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-22 10:50:29 -05:00
Anthony Liguori
5efa9d5a8b Convert block infrastructure to use new module init functionality
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-05-14 16:13:41 -05:00