Move AIO initialization for raw-posix block driver into a helper function.
In addition to just code motion, the aio_ctx pointer is checked for NULL,
prior to calling laio_init(), to make sure laio_init() is only run once.
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
ccc-analyzer reports these warnings:
block/vdi.c:704:13: warning: Dereference of null pointer
bmap[i] = VDI_UNALLOCATED;
^
block/vdi.c:702:13: warning: Dereference of null pointer
bmap[i] = i;
^
Moving some code into the if block fixes this.
It also avoids calling function write with 0 bytes of data.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Report from smatch:
block/curl.c:546 curl_close(21) info: redundant null check on s->url calling free()
The check was redundant, and free was also wrong because the memory
was allocated using g_strdup.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch sets data to be sent to Sheepdog correctly and fixes savevm
and loadvm operations on a Sheepdog image.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* kwolf/for-anthony:
qemu-iotests: add backing file smaller than image test case
stream: complete early if end of backing file is reached
qed: refuse unaligned zero writes with a backing file
It is possible to create an image that is larger than its backing file.
Reading beyond the end of the backing file produces zeroes if no writes
have been made to those sectors in the image file.
This patch finishes streaming early when the end of the backing file is
reached. Without this patch the block job hangs and continually tries
to stream the first sectors beyond the end of the backing file.
To reproduce the hung block job bug:
$ qemu-img create -f qcow2 backing.qcow2 128M
$ qemu-img create -f qcow2 -o backing_file=backing.qcow2 image.qcow2 6G
$ qemu -drive if=virtio,cache=none,file=image.qcow2
(qemu) block_stream virtio0
(qemu) info block-jobs
The qemu-iotests 030 streaming test still passes.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Zero writes have cluster granularity in QED. Therefore they can only be
used to zero entire clusters.
If the zero write request leaves sectors untouched, zeroing the entire
cluster would obscure the backing file. Instead return -ENOTSUP, which
is handled by block.c:bdrv_co_do_write_zeroes() and falls back to a
regular write.
The qemu-iotests 034 test cases covers this scenario.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The number of blocks of the device is used to compute the device size
in bdrv_getlength()/iscsi_getlength().
For MMC devices, the ReturnedLogicalBlockAddress in the READCAPACITY10
has a special meaning when it is 0.
In this case it does not mean that LBA 0 is the last accessible LBA,
and thus the device has 1 readable block, but instead it means that the
disc is blank and there are no readable blocks.
This change ensures that when the iSCSI LUN is loaded with a blank
DVD-R disk or similar that bdrv_getlength() will return the correct
size of the device as 0 bytes.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
* kwolf/for-anthony:
virtio-blk: hide VIRTIO_BLK_F_CONFIG_WCE from old machine types
Documentation: Warn against qemu-img on active image
vmdk: Read footer for streamOptimized images
vmdk: Fix header structure
Conflicts:
hw/virtio-blk.c
This patch fixes two main issues with block/iscsi.c:
1) iscsi_task_mgmt_abort_task_async calls iscsi_scsi_task_cancel which
was also directly called in iscsi_aio_cancel
2) a race between task completion and task abortion could happen cause
the scsi_free_scsi_task were done before iscsi_schedule_bh has finished.
To fix this, all the freeing of IscsiTasks and releasing of the AIOCBs
is centralized in iscsi_bh_cb, independent of whether the SCSI command
has completed or was cancelled.
3) iscsi_aio_cancel was not synchronously waiting for the end of the
command.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
It is always used with the same callback, remove the argument. And
its return value is never used, assume allocation succeeds.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit 64e69e8092. The commit
returned immediately from iscsi_aio_cancel, risking corruption in case the
following happens:
guest qemu target
=========================================================================
send write 1 -------->
send write 1 -------->
cancel write 1 ------>
cancel write 1 ------>
<------------------ cancellation processed
send write 2 -------->
send write 2 -------->
<---------------- completed write 2
<------------------ completed write 2
<---------------- completed write 1
<---------------- cancellation not done
Here, the guest would see write 2 superseding write 1, when in fact the
outcome could have been the opposite. The right behavior is to return
only after the target says whether the cancellation was done or not, and
it will be implemented by the next three patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The footer takes precedence over the header when it exists. It contains
the real grain directory offset that is missing in the header. Without
this patch, streamOptimized images with a footer cannot be read.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
This patch converts all block layer close calls, that correspond
to qemu_open calls, to qemu_close.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch converts all block layer open calls to qemu_open.
Note that this adds the O_CLOEXEC flag to the changed open paths
when the O_CLOEXEC macro is defined.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* kwolf/for-anthony:
qemu-iotests: skip 039 with ./check -nocache
block: add BLOCK_O_CHECK for qemu-img check
qcow2: mark image clean after repair succeeds
qed: mark image clean after repair succeeds
blockdev: flip default cache mode from writethrough to writeback
virtio-blk: disable write cache if not negotiated
virtio-blk: support VIRTIO_BLK_F_CONFIG_WCE
qemu-iotests: Save some sed processes
ahci: Fix sglist memleak in ahci_dma_rw_buf()
ahci: Fix ahci cdrom read corruptions for reads > 128k
virtio-blk: fix use-after-free while handling scsi commands
* bonzini/scsi-next:
scsi-disk: add support for the UNMAP command
scsi-disk: improve out-of-range LBA detection for WRITE SAME
scsi-disk: more assertions and resets for aiocb
virtio-scsi: do not compare 32-bit QEMU tags against 64-bit virtio-scsi tags
iscsi: Pick default initiator-name based on the name of the VM
iscsi: reorganize code for parse_initiator_name
iscsi: do not leak initiator_name
Image formats with a dirty bit, like qed and qcow2, repair dirty image
files upon open with BDRV_O_RDWR. Performing automatic repair when
qemu-img check runs is not ideal because the bdrv_open() call repairs
the image before the actual bdrv_check() call from qemu-img.c.
Fix this "double repair" since it leads to confusing output from
qemu-img check. Tell the block driver that this image is being opened
just for bdrv_check(). This skips automatic repair and qemu-img.c can
invoke it manually with bdrv_check().
Update the golden output for qemu-iotests 039 to reflect the new
qemu-img check output.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The dirty bit is cleared after image repair succeeds in qcow2_open().
Move this into qcow2_check() so that all callers benefit from this
behavior when fix mode is enabled.
This is necessary so qemu-img check can call .bdrv_check() and mark the
image clean.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The dirty bit is cleared after image repair succeeds in qed_open().
Move this into qed_check() so that all callers benefit from this
behavior when fix=true.
This is necessary so qemu-img check can call .bdrv_check() and mark the
image clean.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch updates the iscsi layer to automatically pick a 'unique'
initiator-name based on the name of the vm in case the user has not set
an explicit iqn-name to use.
Create a new function qemu_get_vm_name() that returns the name of the VM,
if specified.
This way we can thus create default names to use as the initiator name
based on the guest session.
If the VM is not named via the '-name' command line argument, the iscsi
initiator-name used wiull simply be
iqn.2008-11.org.linux-kvm
If a name for the VM was specified with the '-name' option, iscsi will
use a default initiatorname of
iqn.2008-11.org.linux-kvm:<name>
These names are just the default iscsi initiator name that qemu will
generate/use only when the user has not set an explicit initiator name
via the commandlines or config files.
Signed-off-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
The argument of iscsi_create_context is never freed by libiscsi,
which in fact calls strdup on it. Avoid a leak.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Lazy refcounts is a performance optimization for qcow2 that postpones
refcount metadata updates and instead marks the image dirty. In the
case of crash or power failure the image will be left in a dirty state
and repaired next time it is opened.
Reducing metadata I/O is important for cache=writethrough and
cache=directsync because these modes guarantee that data is on disk
after each write (hence we cannot take advantage of caching updates in
RAM). Refcount metadata is not needed for guest->file block address
translation and therefore does not need to be on-disk at the time of
write completion - this is the motivation behind the lazy refcount
optimization.
The lazy refcount optimization must be enabled at image creation time:
qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch adds an incompatible feature bit to mark images that have not
been closed cleanly. When a dirty image file is opened a consistency
check and repair is performed.
Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
vvfat creates a virtual VFAT filesystem with a certain logical
geometry that depends on its options. It sets the "geometry hint" to
this geometry. It is the only block driver to do this.
The geometry hint is about about *physical* geometry, and used only by
certain hard disk device models.
vvfat's hint is normally invisible for device models, because
bdrv_open() puts a raw format on top of vvfat's fat protocol. That
raw format is where drive_init() puts the user's geometry (if any),
and where the device model gets it from.
Nobody complained, because the default physical geometry is the same
as vvfat's logical geometry:
opts LCHS def. PCHS
1024,16,63 same
:32: 1024,16,63 same
:16: 1024,16,63 same
:12: 64,16,63 same
Except when you specify :floppy:
opts LCHS def. PCHS
:floppy: 80, 2,36 5,16,63
:32:floppy: 80, 2,36 5,16,63
:16:floppy: 80, 2,36 5,16,63
:12:floppy: 80, 2,18 2,16,63
Silly thing to do for use with a hard disk.
However, the "raw" format can be suppressed by adding an
redundant-looking "format=vvfat" to "file=fat:FOO". Then, vvfat's
hint clobbers the user's geometry, i.e. -drive options cyls, heads,
secs get silently ignored. Don't do that.
No change without format=vvfat. With it, the user's hard disk
geometry (-drive options cyls, heads, secs) is now obeyed, and the
default hard disk geometry with :floppy: now matches the one without
format=vvfat.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Unless parameter ":floppy:" is given, vvfat creates a virtual image
with DOS MBR defining a single partition which holds the FAT file
system. The size of the virtual image depends on the width of the
FAT: 32 MiB (CHS 64, 16, 63) for 12 bit FAT, 504 MiB (CHS 1024, 16,
63) for 16 and 32 bit FAT, leaving (64*16-1)*63 = 64449 and
(1024*16-1)*64 = 1032129 sectors for the partition.
However, it screws up the end of the partition in the MBR:
FAT width param. start CHS end CHS start LBA size
:32: 0,1,1 1023,14,63 63 1032065
:16: 0,1,1 1023,14,55 63 1032057
:12: 0,1,1 63,14,55 63 64377
The actual FAT file system nevertheless assumes the partition has
1032129 or 64449 sectors. Oops.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Only buffers that map to unallocated blocks need to be zeroed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* mjt/mjt-iov2:
rewrite iov_send_recv() and move it to iov.c
cleanup qemu_co_sendv(), qemu_co_recvv() and friends
export iov_send_recv() and use it in iov_send() and iov_recv()
rename qemu_sendv to iov_send, change proto and move declarations to iov.h
change qemu_iovec_to_buf() to match other to,from_buf functions
consolidate qemu_iovec_copy() and qemu_iovec_concat() and make them consistent
allow qemu_iovec_from_buffer() to specify offset from which to start copying
consolidate qemu_iovec_memset{,_skip}() into single function and use existing iov_memset()
rewrite iov_* functions
change iov_* function prototypes to be more appropriate
virtio-serial-bus: use correct lengths in control_out() message
Conflicts:
tests/Makefile
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* kwolf/for-anthony: (24 commits)
block: Factor bdrv_read_unthrottled() out of guess_disk_lchs()
qtest: Tidy up temporary files properly
fdc: Drop broken code for user-defined floppy geometry
fdc_test: introduce test_sense_interrupt
fdc_test: update media_change test
fdc: fix interrupt handling
fdc: rewrite seek and DSKCHG bit handling
block: introduce bdrv_swap, implement bdrv_append on top of it
block: copy over job and dirty bitmap fields in bdrv_append
raw: hook into blkdebug
blkdebug: optionally tie errors to a specific sector
blkdebug: store list of active rules
blkdebug: pass getlength to underlying file
blkdebug: tiny cleanup
blkdebug: remove sync i/o events
sheepdog: traverse pending_list from the first for each time
sheepdog: split outstanding list into inflight and pending
sheepdog: make sure we don't free aiocb before sending all requests
sheepdog: use coroutine based socket functions in coroutine context
sheepdog: restart I/O when socket becomes ready in do_co_req()
...
This makes blkdebug scripts more powerful, and independent of the
exact sequence of operations performed by streaming.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This prepares for the next patch, where some active rules may actually
not trigger depending on input to readv/writev. Store the active rules
in a SIMPLEQ (so that it can be emptied easily with QSIMPLEQ_INIT), and
fetch the errno/once/immediately arguments from there.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is required when using blkdebug with raw format. Unlike qcow2/QED,
raw asks blkdebug for the length of the file, it doesn't get it from
a header.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These are unused, except (by mistake more or less) in QED.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The pending list can be modified in other coroutine context
sd_co_rw_vector, so we need to traverse the list from the first again
after we send the pending request.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
outstanding_list_head is used for both pending and inflight requests.
This patch splits it and improves readability.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch increments the pending counter before sending requests, and
make sures that aiocb is not freed while sending them.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This removes blocking network I/Os in coroutine context.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Currently, no one reenters the yielded coroutine. This fixes it.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This fixes warnings about dprintf format in debug mode.
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When qcow2_alloc_clusters() error handling code was introduced in commit
5d757b563d, the value of free_byte_offset
was clobbered in the error case. This patch keeps free_byte_offset at 0
so we will try to allocate clusters again next time this function is
called.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>