Commit Graph

73 Commits

Author SHA1 Message Date
Kevin Wolf
c5baaa489f qcow2: Fix grow_refcount_table error handling
In case of failure, we haven't increased the refcount for the newly allocated
cluster yet. Therefore we must not free the cluster or its refcount will become
negative (and endless recursion is possible).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-27 12:28:35 -05:00
Kevin Wolf
ef845c3bf4 qcow2: Bring synchronous read/write back to life
When the synchronous read and write functions were dropped, they were replaced
by generic emulation functions. Unfortunately, these emulation functions don't
provide the same semantics as the original functions did.

The original bdrv_read would mean that we read some data synchronously and that
we won't be interrupted during this read. The latter assumption is no longer
true with the emulation function which needs to use qemu_aio_poll and therefore
allows the callback of any other concurrent AIO request to be run during the
read. Which in turn means that (meta)data read earlier could have changed and
be invalid now. qcow2 is not prepared to work in this way and it's just scary
how many places there are where other requests could run.

I'm not sure yet where exactly it breaks, but you'll see breakage with virtio
on qcow2 with a backing file. Providing synchronous functions again fixes the
problem for me.

Patchworks-ID: 35437
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-15 09:32:04 -05:00
Kevin Wolf
0b4ce02eb2 block/raw: Add create_options for host_device
Today host_devices have a create function, so they also need a create_options
field to prevent qemu-img from complaining.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-05 14:20:34 -05:00
Kevin Wolf
80ee15a6b2 qcow2: Increase maximum cluster size to 2 MB
This patch increases the maximum qcow2 cluster size to 2 MB. Starting with 128k
clusters, L2 tables span 2 GB or more of virtual disk space, causing 32 bit
truncation and wraparound of signed integers. Therefore some variables need to
use a larger data type.

While being at reviewing data types, change some integers that are used for
array indices to unsigned. In some places they were checked against some upper
limit but not for negative values. This could avoid potential segfaults with
corrupted qcow2 images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-05 09:32:52 -05:00
Stefan Weil
ee682d27a5 Check availability of uuid header / library
If available, the Universally Unique Identifier library
is used by the vdi block driver.

Other parts of QEMU (vl.c) could also use it.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2009-10-04 13:24:45 +02:00
Anthony Liguori
c227f0995e Revert "Get rid of _t suffix"
In the very least, a change like this requires discussion on the list.

The naming convention is goofy and it causes a massive merge problem.  Something
like this _must_ be presented on the list first so people can provide input
and cope with it.

This reverts commit 99a0949b72.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-10-01 16:12:16 -05:00
malc
99a0949b72 Get rid of _t suffix
Some not so obvious bits, slirp and Xen were left alone for the time
being.

Signed-off-by: malc <av1474@comtv.ru>
2009-10-01 22:45:02 +04:00
Michael S. Tsirkin
6ab00cee70 vvfat: fix coding style nit
Put space between = and & when taking a pointer,
to avoid confusion with old-style "&=".

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-30 18:45:50 +00:00
Blue Swirl
a2a45a26c9 Fix signedness warnings on OpenSolaris
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-12 12:36:09 +00:00
Blue Swirl
72cf2d4f0e Fix sys-queue.h conflict for good
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc923584,
f40d753718,
96555a96d7 and
3990d09adf but the fixes were fragile.

Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-09-12 07:36:22 +00:00
Christoph Hellwig
b2e12bc6e3 block: add aio_flush operation
Instead stalling the VCPU while serving a cache flush try to do it
asynchronously.  Use our good old helper thread pool to issue an
asynchronous fdatasync for raw-posix.  Note that while Linux AIO
implements a fdatasync operation it is not useful for us because
it isn't actually implement in asynchronous fashion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Christoph Hellwig
6f1953c4c1 block: use fdatasync instead of fsync if possible
If we are flushing the caches for our image files we only care about the
data (including the metadata required for accessing it) but not things
like timestamp updates.  So try to use fdatasync instead of fsync to
implement the flush operations.

Unfortunately many operating systems still do not support fdatasync,
so we add a qemu_fdatasync wrapper that uses fdatasync if available
as per the _POSIX_SYNCHRONIZED_IO feature macro or fsync otherwise.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-11 10:19:46 -05:00
Kevin Wolf
f214978a42 qcow2: Order concurrent AIO requests on the same unallocated cluster
When two AIO requests write to the same cluster, and this cluster is
unallocated, currently both requests allocate a new cluster and the second one
merges the first one when it is completed. This means an cluster allocation, a
read and a cluster deallocation which cause some overhead. If we simply let the
second request wait until the first one is done, we improve overall performance
with AIO requests (specifially, qcow2/virtio combinations).

This patch maintains a list of in-flight requests that have allocated new
clusters. A second request touching the same cluster is limited so that it
either doesn't touch the allocation of the first request (so it can have a
non-overlapping allocation) or it waits for the first request to complete.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09 17:31:26 -05:00
Kevin Wolf
ea80b906f4 qcow2: Fix metadata preallocation
The wrong version of the preallocation patch has been applied, so this is the
remaining diff.

We can't use truncate to grow the image file to the right size because we don't
know if metadata has been written after the last data cluster. In this case
truncate would shrink the file and destroy its metadata. Write a zero sector at
the end of the virtual disk instead to ensure that the file is big enough.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09 17:31:26 -05:00
Stefan Weil
cc2040f8c2 Fix spelling in comment.
The company which made Virtual PC was Connectix.
They use the magic string "conectix" in their disk images.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-09-09 14:57:20 -05:00
Blue Swirl
2000cbc50d Fix gcc 3 warning about uninitialized variable
If nb_sectors is 0, cluster_offset will not be initialized.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-29 16:37:26 +03:00
Stefan Weil
e44bd6fc15 Don't compile aio code if CONFIG_LINUX_AIO is undefined
This patch fixes linker errors when building QEMU without Linux AIO support.

It is based on suggestions from malc and Kevin Wolf.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-28 08:57:49 -05:00
Christoph Hellwig
5c6c3a6c54 raw-posix: add Linux native AIO support
Now that do have a nicer interface to work against we can add Linux native
AIO support.  It's an extremly thing layer just setting up an iocb for
the io_submit system call in the submission path, and registering an
eventfd with the qemu poll handler to do complete the iocbs directly
from there.

This started out based on Anthony's earlier AIO patch, but after
estimated 42,000 rewrites and just as many build system changes
there's not much left of it.

To enable native kernel aio use the aio=native sub-command on the
drive command line.  I have also added an option to qemu-io to
test the aio support without needing a guest.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:22 -05:00
Christoph Hellwig
9ef91a6771 raw-posix: refactor AIO support
Currently the raw-posix.c code contains a lot of knowledge about the
asynchronous I/O scheme that is mostly implemented in posix-aio-compat.c.
All this code does not really belong here and is getting a bit in the
way of implementing native AIO on Linux.

So instead move all the guts of the AIO implementation into
posix-aio-compat.c (which might need a better name, btw).

There's now a very small interface between the AIO providers and raw-posix.c:

 - an init routine is called from raw_open_common to return an AIO context
   for this drive.  An AIO implementation may either re-use one context
   for all drives, or use a different one for each as the Linux native
   AIO support will do.
 - an submit routine is called from the aio_reav/writev methods to submit
   an AIO request

There are no indirect calls involved in this interface as we need to
decide which one to call manually.  We will only call the Linux AIO native
init function if we were requested to by vl.c, and we will only call
the native submit function if we are asked to and the request is properly
aligned.  That's also the reason why the alignment check actually does
the inverse move and now goes into raw-posix.c.

The old posix-aio-compat.h headers is removed now that most of it's
content is private to posix-aio-compat.c, and instead we add a new
block/raw-posix-aio.h headers is created containing only the tiny interface
between raw-posix.c and the AIO implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:22 -05:00
Kevin Wolf
a35e1c177d qcow2: Metadata preallocation
This introduces a qemu-img create option for qcow2 which allows the metadata to
be preallocated, i.e. clusters are reserved in the refcount table and L1/L2
tables, but no data is written to them. Metadata is quite small, so this
happens in almost no time.

Especially with qcow2 on virtio this helps to gain a bit of performance during
the initial writes. However, as soon as create a snapshot, we're back to the
normal slow speed, obviously. So this isn't the real fix, but kind of a cheat
while we're still having trouble with qcow2 on virtio.

Note that the option is disabled by default and needs to be specified
explicitly using qemu-img create -f qcow2 -o preallocation=metadata.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 20:30:20 -05:00
Stefan Weil
6eea90eba4 block/vdi.c: Fix several bugs
* The code for option '-static' was wrong, so image creation
  always created static images.

* Static images created with qemu-img did not set header entry
  blocks_allocated.

* The size of the block map must be rounded to the next multiple
  of SECTOR_SIZE, otherwise the block map is only read partially
  for block map sizes which are not a multiple of SECTOR_SIZE.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-27 19:33:15 -05:00
Nathan Froyd
5ec4d682d2 eliminate errors about unused results in block/vpc.c
These errors come up when compiling with gcc-4.3.3 and some older headers:

/scratch/froydnj/qemu.git/block/vpc.c: In function 'vpc_create':
/scratch/froydnj/qemu.git/block/vpc.c:514: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:516: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:517: error: value computed is not used
/scratch/froydnj/qemu.git/block/vpc.c:566: error: value computed is not used

Use memcpy to copy the strings instead of strncpy.

Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-24 08:46:48 -05:00
Christoph Hellwig
4dd75c702c make pthreads mandatory
As requested by Anthony make pthreads mandatory.  This means we will always
have AIO available on posix hosts, and it will also allow enabling the I/O
thread unconditionally once it's ready.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-08-24 08:46:47 -05:00
Blue Swirl
1786dc15ee Use pstrcpy to avoid OpenBSD linker warnings
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-15 11:33:58 +00:00
Stefan Weil
9aebd98aab Add new block driver for the VDI format (only aio supported)
This is a new block driver written from scratch
to support the VDI format in QEMU.

VDI is the native format used by Innotek / SUN VirtualBox.

Latest changes:

* stripped down version
  (code for synchronous operations and experimental code removed)

* don't open VDI snapshot images (with uuid_link or uuid_parent)

* modified vdi_aio_cancel

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Message-Id:
2009-08-10 13:05:30 -05:00
Blue Swirl
df3cee1a3a Fix Sparse warning about "expression using sizeof on a function"
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-08-01 10:13:44 +00:00
Juan Quintela
71e72a19ba rename HOST_BSD to CONFIG_BSD
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-27 14:09:20 -05:00
Kevin Wolf
b171271a50 vmdk: Fix backing file handling
Instead of storing the backing file in its own BlockDriverState, VMDK uses the
BlockDriverState of the raw image file it opened. This is wrong and breaks
functions that access the backing file or protocols. This fix replaces all
occurrences of s->hd->backing_* with bs->backing_*.

This fixes qemu-iotests failure in 020 (Commit changes to backing file).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-22 10:58:47 -05:00
Blue Swirl
0bf9e31af1 Fix most warnings (errors with -Werror) when debugging is enabled
I used the following command to enable debugging:
perl -p -i -e 's/^\/\/#define DEBUG/#define DEBUG/g' * */* */*/*

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-07-20 17:19:25 +00:00
Stefan Weil
1e37d05904 raw-posix: Handle errors in raw_create
In qemu-iotests, some large images are created using qemu-img.

Without checks for errors, qemu-img will just create an
empty image, and later read / write tests will fail.

With the patch, failures during image creation are detected
and reported.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16 17:28:49 -05:00
Christoph Hellwig
45566e9c99 replace bdrv_{get, put}_buffer with bdrv_{load, save}_vmstate
The VM state offset is a concept internal to the image format.  Replace
the old bdrv_{get,put}_buffer method that require an index into the
image file that is constructed from the VM state offset and an offset
into the vmstate with the bdrv_{load,save}_vmstate that just take an
offset into the VM state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-16 08:28:13 -05:00
Kevin Wolf
3f6a3ee51e qcow2: Fix L1 table memory allocation
Contrary to what one could expect, the size of L1 tables is not cluster
aligned. So as we're writing whole sectors now instead of single entries,
we need to ensure that the L1 table in memory is large enough; otherwise
write would access memory after the end of the L1 table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10 13:44:29 -05:00
Kevin Wolf
c53ffce91b qcow1: Fix qcow_aio_writev
Pass is_write = 1 to qcow_aio_setup when writing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-10 13:44:29 -05:00
G 3
1c27a8b35e Substitute O_DSYNC with O_SYNC or O_FSYNC when needed.
Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:58:07 -05:00
Nolan
c76f4952bb Allow adjustment of http block device's readahead size, via a new
":readahead=###:" suffix.

Signed-off-by: Nolan Leake <nolan <at> sigbus.net>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:06:40 -05:00
Anthony Liguori
1cec71e359 Revert "support colon in filenames"
This reverts commit 707c0dbc97.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:06:38 -05:00
Kevin Wolf
0aa217e461 qcow2: Make cache=writethrough default
The performance of qcow2 has improved meanwhile, so we don't need to
special-case it any more. Switch the default to write-through caching
like all other block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-07-09 16:06:37 -05:00
Kevin Wolf
3b88e52b41 qcow2: Cache refcount blocks during snapshot creation
The really time consuming part of snapshotting is to adjust the reference count
of all clusters. Currently after each adjusted cluster the refcount block is
written to disk.

Don't write each single byte immediately to disk but cache all writes to the
refcount block and write them out once we're done with the block.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 14:18:07 -05:00
Kevin Wolf
22afa7b5b6 block-raw: Allow pread beyond the end of growable images
When using O_DIRECT, qcow2 snapshots didn't work any more for me. In the
process of creating the snapshot, qcow2 tries to pwrite some new information
(e.g. new L1 table) which will often end up being after the old end of the
image file. Now pwrite tries to align things and reads the old contents of the
file, read returns 0 because there is nothing to read after the end of file and
pwrite is stuck in an endless loop.

This patch allows to pread beyond the end of an image file. Whenever the
given offset is after the end of the image file, the read succeeds and fills
the buffer with zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 14:18:07 -05:00
Ram Pai
707c0dbc97 support colon in filenames
Problem: It is impossible to feed filenames with the character colon because
qemu interprets such names as a protocol. For example filename scsi:0, is
interpreted as a protocol by name "scsi".

This patch allows user to espace colon characters. For example the above
filename can now be expressed either as 'scsi\:0' or as file:scsi:0

anything following the "file:" tag is interpreted verbatin. However if "file:"
tag is omitted then any colon characters in the string must be escaped using
backslash.

Here are couple of examples:

scsi\:0\:abc is a local file scsi:0:abc
http\://myweb is a local file by name http://myweb
file:scsi:0:abc is a local file scsi:0:abc
file:http://myweb is a local file by name http://myweb

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 13:50:05 -05:00
Filip Navara
14899cdf3a Fix QCOW2 debugging code to compile again
Updated to use C99 comments.

Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-29 08:52:40 -05:00
Blue Swirl
19a3da7f4d Fix opening of read only raw images
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2009-06-17 18:27:44 +03:00
Kevin Wolf
9923e05e1a update_refcount: Write complete sectors
When updating the refcount blocks in update_refcount(), write complete sectors
instead of updating single entries.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:37 -05:00
Kevin Wolf
4c1612d954 alloc_cluster_link_l2: Write complete sectors
When updating the L2 tables in alloc_cluster_link_l2(), write complete
sectors instead of updating single entries.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
6583e3c7e8 l2_allocate: Write complete sectors
When modifying the L1 table, l2_allocate() needs to write complete sectors
instead of single entries. The L1 table is already in memory, reading it from
disk in the block layer to align the request is wasted performance.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
ed6ccf0f51 qcow2: Rename global functions
The qcow2 source is now split into several more manageable files. During the
conversion quite some functions that were static before needed to be changed to
be global to make the source compile again.

We were lucky enough not to get name conflicts with these additional global
names, but they are not nice. This patch adds a qcow2_ prefix to all of the
global functions in qcow2.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
c142442b06 qcow2: Split out snapshot functions
qcow2-snapshot.c contains the code related to snapshotting.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
45aba42fba qcow2: Split out guest cluster functions
qcow2-cluster.c contains all functions related to the management of guest
clusters, i.e. what the guest sees on its virtual disk. This code is about
mapping these guest clusters to host clusters in the image file using the
two-level lookup tables.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
f7d0fe0239 qcow2: Split out refcount handling
qcow2-refcount.c contains all functions which are related to cluster
allocation and management in the image file. A large part of this is the
reference counting of these clusters.

Also a header file qcow2.h is introduced which will contain the interface of
the split qcow2 modules.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00
Kevin Wolf
9ccb258e28 qcow2: Change default cluster size to 64k
Larger cluster sizes mean less metadata. This has been discussion a few times,
let's do it now. This turns 64k clusters on by default for new images.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2009-06-16 15:18:36 -05:00