xemu-project/xemu - xemu - Gitea: Git with a cup of tea

mirror of https://github.com/xemu-project/xemu.git synced 2024-11-24 20:19:44 +00:00

Author	SHA1	Message	Date
Alberto Garcia	6a63419980	commit: Use bdrv_append() in commit_start() This function combines bdrv_set_backing_hd() and bdrv_replace_node() so we can use it to simplify the code a bit in commit_start(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 20190403143748.9790-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Max Reitz	21205c7c3b	block/ssh: Implement .bdrv_dirname() ssh_bdrv_dirname() is basically the generic bdrv_dirname(), except it takes care not to silently chop off any query string (i.e., host_key_check). Signed-off-by: Max Reitz <mreitz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 20190225190828.17726-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Max Reitz	b8c1f90118	block/ssh: Implement .bdrv_refresh_filename() This requires some changes to keep iotests 104 and 207 working. qemu-img info in 104 will now return a filename including the user name and the port, which need to be filtered by adjusting REMOTE_TEST_DIR in common.rc. This additional information has to be marked optional, however (which is simple as REMOTE_TEST_DIR is a regex), because otherwise 197 and 215 would fail: They use it (indirectly) to filter qemu-img create output which contains a backing filename they have passed to it -- which probably does not contain a user name or port number. The problem in 207 is a nice one to have: qemu-img info used to return json:{} filenames, but with this patch it returns nice plain ones. We now need to adjust the filtering to hide the user name (and port number while we are at it). The simplest way to do this is to include both in iotests.remote_filename() so that bdrv_refresh_filename() will not change it, and then iotests.img_info_log() will filter it correctly automatically. Signed-off-by: Max Reitz <mreitz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-id: 20190225190828.17726-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Andrey Shinkevich	444b82369b	qcow2: discard bitmap when removed Bitmap data may take a lot of disk space, so it's better to discard it always. Signed-off-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Message-id: 1551346019-293202-1-git-send-email-andrey.shinkevich@virtuozzo.com Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [mreitz: Use the commit message proposed by Vladimir] Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	54b10010eb	qcow2-refcount: don't mask corruptions under internal errors No reasons for not reporting found corruptions as corruptions in case of some internal errors, especially in case of just failed to fix l2 entry (and in this case, missed corruptions may influence comparing logic, when we calculate difference between corruptions fields of two results) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190227131433.197063-6-vsementsov@virtuozzo.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	cbb51e9f93	qcow2-refcount: check_refcounts_l2: don't count fixed cluster as allocated Do not count a cluster which is fixed to be ZERO as allocated. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-5-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	1ef337b7a0	qcow2-refcount: check_refcounts_l2: reduce ignored overlaps Reduce number of structures ignored in overlap check: when checking active table ignore active tables, when checking inactive table ignore inactive ones. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-4-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	a5fff8d4b4	qcow2-refcount: avoid eating RAM qcow2_inc_refcounts_imrt() (through realloc_refcount_array()) can eat an unpredictable amount of memory on corrupted table entries, which are referencing regions far beyond the end of file. Prevent this, by skipping such regions from further processing. Interesting that iotest 138 checks exactly the behavior which we fix here. So, change the test appropriately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-3-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	7e3e736cbd	qcow2-refcount: fix check_oflag_copied Increase corruptions_fixed only after successful fix. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190227131433.197063-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-05-07 17:14:21 +02:00
Vladimir Sementsov-Ogievskiy	54277a2aab	block/qed: add missed coroutine_fn markers qed_read_table and qed_write_table use coroutine-only interfaces but are not marked coroutine_fn. Happily, they are called only from coroutine context, so we only need to add missed markers. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Alberto Garcia	065abf9f2b	commit: Make base read-only if there is an early failure You can reproduce this by passing an invalid filter-node-name (like "1234") to block-commit. In this case the base image is put in read-write mode but is never reset back to read-only. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	f4326aefcf	block/stream: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	08b6261f34	block/commit: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	607dbdc4e0	block/backup: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	a4072543cc	block/parallels: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	696e8cb292	block/qed: use buffer-based io Move to _co_ versions of io functions qed_read_table() and qed_write_table(), as we use qemu_co_mutex_unlock() anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	4ed3e0c486	block/qcow: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Vladimir Sementsov-Ogievskiy	b00cb15bda	block/qcow2: use buffer-based io Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Alberto Garcia	e1f4a37a49	qcow2: Fix error handling in the compression code This patch fixes a few things in the way error codes are handled in the qcow2 compression code: a) qcow2_co_pwritev_compressed() expects qcow2_co_compress() to only return -1 or -2 on failure, but this is not correct. Since the change from qcow2_compress() to qcow2_co_compress() in commit `ceb029cd6f` the new code can also return -EINVAL (although there does not seem to exist any code path that would cause that error in the current implementation). b) -1 and -2 are ad-hoc error codes defined in qcow2_compress(). This patch replaces them with standard constants from errno.h. c) Both qcow2_compress() and qcow2_co_do_compress() return a negative value on failure, but qcow2_co_pwritev_compressed() stores the value in an unsigned data type. Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	db04524f82	qcow2: Fix qcow2_make_empty() with external data file make_completely_empty() is an optimisated path for bdrv_make_empty() where completely new metadata is created inside the image file instead of going through all clusters and discarding them. For an external data file, however, we actually need to do discard operations on the data file; just overwriting the qcow2 file doesn't get rid of the data. The necessary slow path with an explicit discard operation already exists for other cases. Use it for external data files, too. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	718c0fce2f	qcow2: Fix full preallocation with external data file preallocate_co() already gave the data file the full size without forwarding the requested preallocation mode to the protocol. When bdrv_co_truncate() was called later with the preallocation mode, the file didn't actually grow any more, so the data file stayed unallocated even if full preallocation was requested. Pass the right preallocation mode to preallocate_co() and remove the second bdrv_co_truncate() to fix this. As a side effect, the ugly one-byte write in preallocate_co() is replaced with a truncate call, now leaving the last block unallocated on the protocol level as it should be. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	360bd07471	qcow2: Add errp to preallocate_co() We'll add a bdrv_co_truncate() call in the next patch which can return an Error that we don't want to discard. So add an errp parameter to preallocate_co(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Kevin Wolf	f29fbf7c6b	qcow2: Avoid COW during metadata preallocation Limiting the allocation to INT_MAX bytes isn't particularly clever because it means that the final cluster will be a partial cluster which will be completed through a COW operation. This results in unnecessary data read and write requests which lead to an unwanted non-sparse filesystem block for metadata preallocation. Align the maximum allocation size down to the cluster size to avoid this situation. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-30 15:29:00 +02:00
Eric Blake	de38b5005e	qemu-img: Saner printing of large file sizes Disk sizes close to INT64_MAX cause overflow, for some pretty ridiculous output: $ ./nbdkit -U - memory size=$((2**63 - 512)) --run 'qemu-img info $nbd' image: nbd+unix://?socket=/tmp/nbdkitHSAzNz/socket file format: raw virtual size: -8388607T (9223372036854775296 bytes) disk size: unavailable But there's no reason to have two separate implementations of integer to human-readable abbreviation, where one has overflow and stops at 'T', while the other avoids overflow and goes all the way to 'E'. With this patch, the output now claims 8EiB instead of -8388607T, which really is the correct rounding of largest file size supported by qemu (we could go 511 bytes larger if we used byte-accurate sizing instead of rounding up to the next sector boundary, but that wouldn't change the human-readable result). Quite a few iotests need updates to expected output to match. Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Max Reitz <mreitz@redhat.com>	2019-04-30 15:29:00 +02:00
Stefano Garzarella	0cb98af218	block/vhdx: Use IEC binary prefixes for size constants Using IEC binary prefixes in order to make the code more readable, with the exception of DEFAULT_LOG_SIZE because it's passed to stringify(). Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Stefano Garzarella	e9991e29ea	block/vhdx: Remove redundant IEC binary prefixes definition IEC binary prefixes are already defined in "qemu/units.h", so we can remove redundant definitions in "block/vhdx.h". Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Sam Eiderman	7502be838e	vmdk: Set vmdk parent backing_format to vmdk Commit `b69864e5a` ("vmdk: Support version=3 in VMDK descriptor files") fixed the probe function to correctly guess vmdk descriptors with version=3. This solves the issue where vmdk snapshot with parent vmdk descriptor containing "version=3" would be treated as raw instead vmdk. In the future case where a new vmdk version is introduced, we will again experience this issue, even if the user will provide "-f vmdk" it will only apply to the tip image and not to the underlying "misprobed" parent image. The code in vmdk.c already assumes that the backing file of vmdk must be vmdk (see vmdk_is_cid_valid which returns 0 if backing file is not vmdk). So let's make it official by supplying the backing_format as vmdk. Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Reviewed-By: Liran Alon <liran.alon@oracle.com> Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Signed-off-by: Shmuel Eiderman <shmuel.eiderman@oracle.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <fam@euphon.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Zhengui li	126734c4f7	vpc: unlock Coroutine lock to make IO submit Concurrently Concurrent IO becomes serial IO because of the qemu Coroutine lock, which reduce IO performance severely. So unlock Coroutine lock before bdrv_co_pwritev and bdrv_co_preadv to fix it. Signed-off-by: Zhengui li <lizhengui@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-30 15:29:00 +02:00
Markus Armbruster	e1ce7d747b	block/qapi: Clean up how we print to monitor or stdout bdrv_snapshot_dump(), bdrv_image_info_specific_dump(), bdrv_image_info_dump() and their helpers take an fprintf()-like callback and a FILE * to pass to it. hmp.c passes monitor_printf() cast to fprintf_function and the current monitor cast to FILE *. qemu-img.c and qemu-io-cmds.c pass fprintf and stdout. The type-punning is technically undefined behaviour, but works in practice. Clean up: drop the callback, and call qemu_printf() instead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190417191805.28198-8-armbru@redhat.com>	2019-04-18 22:18:59 +02:00
Markus Armbruster	6b3048cee0	block/ssh: Do not report read/write/flush errors to the user Callbacks ssh_co_readv(), ssh_co_writev(), ssh_co_flush() report errors to the user with error_printf(). They shouldn't, it's their caller's job. Replace by a suitable trace point. While there, drop the unreachable !s->sftp case. Perhaps we should convert this part of the block driver interface to Error, so block drivers can pass more detail to their callers. Not today. Cc: "Richard W.M. Jones" <rjones@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190417190641.26814-3-armbru@redhat.com>	2019-04-17 21:21:49 +02:00
Kevin Wolf	93e32b3e20	qcow2: Fix preallocation bdrv_pwrite to wrong file With an external data file, preallocate_co() must write the final byte to the external data file, not to the qcow2 image file. This is harmless for preallocation of newly created images (only the qcow2 file size is increased to the virtual disk size while it should be much smaller), but with preallocated resize, it could in theory cause visible corruption if the metadata of the image is larger than the data (e.g. lots of bitmaps). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2019-04-16 16:23:24 +02:00
Alberto Garcia	20509c4b8b	block: freeze the backing chain earlier in stream_start() Commit `6585493369` added code to freeze the backing chain from 'top' to 'base' for the duration of the block-stream job. The problem is that the freezing happens too late in stream_start(): during the bdrv_reopen_set_read_only() call earlier in that function another job can jump in and remove the base image. If that happens we have an invalid chain and QEMU crashes. This patch puts the bdrv_freeze_backing_chain() call at the beginning of the function. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-02 12:04:44 +02:00
Vladimir Sementsov-Ogievskiy	696aaaed57	block/file-posix: do not fail on unlock bytes bdrv_replace_child() calls bdrv_check_perm() with error_abort on loosening permissions. However file-locking operations may fail even in this case, for example on NFS. And this leads to Qemu crash. Let's avoid such errors. Note, that we ignore such things anyway on permission update commit and abort. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-02 12:04:44 +02:00
Stefano Garzarella	de23e72bb7	block/gluster: limit the transfer size to 512 MiB Several versions of GlusterFS (3.12? -> 6.0.1) fail when the transfer size is greater or equal to 1024 MiB, so we are limiting the transfer size to 512 MiB to avoid this rare issue. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1691320 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-04-02 12:04:44 +02:00
Eric Blake	75d34eb98c	nbd/client: Trace server noncompliance on structured reads Just as we recently added a trace for a server sending block status that doesn't match the server's advertised minimum block alignment, let's do the same for read chunks. But since qemu 3.1 is such a server (because it advertised 512-byte alignment, but when serving a file that ends in data but is not sector-aligned, NBD_CMD_READ would detect a mid-sector change between data and hole at EOF and the resulting read chunks are unaligned), we don't want to change our behavior of otherwise tolerating unaligned reads. Note that even though we fixed the server for 4.0 to advertise an actual block alignment (which gets rid of the unaligned reads at EOF for posix files), we can still trigger it via other means: $ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file Arguably, that is a bug in the blkdebug block status function, for leaking a block status that is not aligned. It may also be possible to observe issues with a backing layer with smaller alignment than the active layer, although so far I have been unable to write a reliable iotest for that scenario. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190330165349.32256-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-04-01 08:58:04 -05:00
Eric Blake	4841211e0d	block: Add bdrv_get_request_alignment() The next patch needs access to a device's minimum permitted alignment, since NBD wants to advertise this to clients. Add an accessor function, borrowing from blk_get_max_transfer() for accessing a backend's block limits. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190329042750.14704-6-eblake@redhat.com>	2019-04-01 08:46:52 -05:00
Eric Blake	9cf638508c	nbd/client: Support qemu-img convert from unaligned size If an NBD server advertises a size that is not a multiple of a sector, the block layer rounds up that size, even though we set info.size to the exact byte value sent by the server. The block layer then proceeds to let us read or query block status on the hole that it added past EOF, which the NBD server is unlikely to be happy with. Fortunately, qemu as a server never advertizes an unaligned size, so we generally don't run into this problem; but the nbdkit server makes it easy to test: $ printf %1000d 1 > f1 $ ~/nbdkit/nbdkit -fv file f1 & pid=$! $ qemu-img convert -f raw nbd://localhost:10809 f2 $ kill $pid $ qemu-img compare f1 f2 Pre-patch, the server attempts a 1024-byte read, which nbdkit rightfully rejects as going beyond its advertised 1000 byte size; the conversion fails and the output files differ (not even the first sector is copied, because qemu-img does not follow ddrescue's habit of trying smaller reads to get as much information as possible in spite of errors). Post-patch, the client's attempts to read (and query block status, for new enough nbdkit) are properly truncated to the server's length, with sane handling of the hole the block layer forced on us. Although f2 ends up as a larger file (1024 bytes instead of 1000), qemu-img compare shows the two images to have identical contents for display to the guest. I didn't add iotests coverage since I didn't want to add a dependency on nbdkit in iotests. I also did NOT patch write, trim, or write zeroes - these commands continue to fail (usually with ENOSPC, but whatever the server chose), because we really can't write to the end of the file, and because 'qemu-img convert' is the most common case where we care about being tolerant (which is read-only). Perhaps we could truncate the request if the client is writing zeros to the tail, but that seems like more work, especially if the block layer is fixed in 4.1 to track byte-accurate sizing (in which case this patch would be reverted as unnecessary). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-5-eblake@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2019-04-01 08:32:44 -05:00
Eric Blake	a62a85ef5c	nbd/client: Report offsets in bdrv_block_status It is desirable for 'qemu-img map' to have the same output for a file whether it is served over file or nbd protocols. However, ever since we implemented block status for NBD (2.12), the NBD protocol forgot to inform the block layer that as the final layer in the chain, the offset is valid; without an offset, the human-readable form of qemu-img map gives up with the unhelpful: $ nbdkit -U - data data="1" size=512 --run 'qemu-img map $nbd' Offset Length Mapped to File qemu-img: File contains external, encrypted or compressed clusters. The --output=json form always works, because it is reporting the lower-level bdrv_block_status results directly rather than trying to filter out sparse ranges for human consumption - but now it also shows the offset member. With this patch, the human output changes to: Offset Length Mapped to File 0 0x200 0 nbd+unix://?socket=/tmp/nbdkitOxeoLa/socket This change is observable to several iotests. Fixes: `78a33ab5` Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 20:52:29 -05:00
Eric Blake	7da537f70d	nbd/client: Lower min_block for block-status, unaligned size We have a latent bug in our NBD client code, tickled by the brand new nbdkit 1.11.10 block status support: $ nbdkit --filter=log --filter=truncate -U - \ data data="1" size=511 truncate=64K logfile=/dev/stdout \ --run 'qemu-img convert $nbd /var/tmp/out' ... qemu-img: block/io.c:2122: bdrv_co_block_status: Assertion `pnum && QEMU_IS_ALIGNED(pnum, align) && align > offset - aligned_offset' failed. The culprit? Our implementation of .bdrv_co_block_status can return unaligned block status for any server that operates with a lower actual alignment than what we tell the block layer in request_alignment, in violation of the block layer's constraints. To date, we've been unable to trip the bug, because qemu as NBD server always advertises block sizing (at which point it is a server bug if the server sends unaligned status - although qemu 3.1 is such a server and I've sent separate patches for 4.0 both to get the server to obey the spec, and to let the client to tolerate server oddities at EOF). But nbdkit does not (yet) advertise block sizing, and therefore is not in violation of the spec for returning block status at whatever boundaries it wants, and those unaligned results can occur anywhere rather than just at EOF. While we are still wise to avoid sending sub-sector read/write requests to a server of unknown origin, we MUST consider that a server telling us block status without an advertised block size is correct. So, we either have to munge unaligned answers from the server into aligned ones that we hand back to the block layer, or we have to tell the block layer about a smaller alignment. Similarly, if the server advertises an image size that is not sector-aligned, we might as well assume that the server intends to let us access those tail bytes, and therefore supports a minimum block size of 1, regardless of whether the server supports block status (although we still need more patches to fix the problem that with an unaligned image, we can send read or block status requests that exceed EOF to the server). Again, qemu as server cannot trip this problem (because it rounds images to sector alignment), but nbdkit advertised unaligned size even before it gained block status support. Solve both alignment problems at once by using better heuristics on what alignment to report to the block layer when the server did not give us something to work with. Note that very few NBD servers implement block status (to date, only qemu and nbdkit are known to do so); and as the NBD spec mentioned block sizing constraints prior to documenting block status, it can be assumed that any future implementations of block status are aware that they must advertise block size if they want a minimum size other than 1. We've had a long history of struggles with picking the right alignment to use in the block layer, as evidenced by the commit message of `fd8d372d` (v2.12) that introduced the current choice of forced 512-byte alignment. There is no iotest coverage for this fix, because qemu can't provoke it, and I didn't want to make test 241 dependent on nbdkit. Fixes: `fd8d372d` Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Richard W.M. Jones <rjones@redhat.com>	2019-03-30 20:52:19 -05:00
Eric Blake	737d3f5244	nbd-client: Work around server BLOCK_STATUS misalignment at EOF The NBD spec is clear that a server that advertises a minimum block size should reply to NBD_CMD_BLOCK_STATUS with extents aligned accordingly. However, we know that the qemu NBD server implementation has had a corner-case bug where it is not compliant with the spec, present since the introduction of NBD_CMD_BLOCK_STATUS in qemu 2.12 (and unlikely to be patched in time for 4.0). Namely, when qemu is serving a file that is not a multiple of 512 bytes, it rounds the size advertised over NBD up to the next sector boundary (someday, I'd like to fix that to be byte-accurate, but it's a much bigger audit not appropriate for this release); yet if the final sector contains data prior to EOF, lseek(SEEK_HOLE) will point to the implicit hole mid-sector which qemu then reported over NBD. We are well within our rights to hang up on a server that can't follow the spec, but it is more useful to try and keep the connection alive in spite of the problem. Do so by tracing a message about the problem, and then either truncating the request back to an aligned boundary (if it covered more than the final sector) or widening it out to the full boundary with a forced status of data (since truncating would result in 0 bytes, but we have to make progress, and valid since data is a default-safe answer). And in practice, since the problem only happens on a sector that starts with data and ends with a hole, we are going to want to read that full sector anyway (where qemu as the server fills in the tail beyond EOF with appropriate NUL bytes). Easy reproduction: $ printf %1000d 1 > file $ qemu-nbd -f raw -t file & pid=$! $ qemu-img map --output=json -f raw nbd://localhost:10809 qemu-img: Could not read file metadata: Invalid argument $ kill $pid where the patched version instead succeeds with: [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true}] Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190326171317.4036-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	ebd82cd872	nbd: Permit simple error to NBD_CMD_BLOCK_STATUS The NBD spec is clear that when structured replies are active, a simple error reply is acceptable to any command except for NBD_CMD_READ. However, we were mistakenly requiring structured errors for NBD_CMD_BLOCK_STATUS, and hanging up on a server that gave a simple error (since qemu does not behave as such a server, we didn't notice the problem until now). Broken since its introduction in commit `78a33ab5` (v2.12). Noticed while debugging a separate failure reported by nbdkit while working out its initial implementation of BLOCK_STATUS, although it turns out that nbdkit also chose to send structured error replies for BLOCK_STATUS, so I had to manually provoke the situation by hacking qemu's server to send a simple error reply: \| diff --git i/nbd/server.c w/nbd/server.c \| index fd013a2817a..833288d7c45 100644 \| 00--- i/nbd/server.c \| +++ w/nbd/server.c \| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, \| "discard failed", errp); \| \| case NBD_CMD_BLOCK_STATUS: \| + return nbd_co_send_simple_reply(client, request->handle, ENOMEM, \| + NULL, 0, errp); \| if (!request->len) { \| return nbd_send_generic_reply(client, request->handle, -EINVAL, \| "need non-zero length", errp); \| Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20190325190104.30213-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	b29f3a3d2a	nbd: Don't lose server's error to NBD_CMD_BLOCK_STATUS When the server replies with a (structured []) error to NBD_CMD_BLOCK_STATUS, without any extent information sent first, the client code was blindly throwing away the server's error code and instead telling the caller that EIO occurred. This has been broken since its introduction in `78a33ab5` (v2.12, where we should have called: error_setg(&local_err, "Server did not reply with any status extents"); nbd_iter_error(&iter, false, -EIO, &local_err); to declare the situation as a non-fatal error if no earlier error had already been flagged, rather than just blindly slamming iter.err and iter.ret), although it is more noticeable since commit `7f86068d`, which actually tries hard to preserve the server's code thanks to a separate iter.request_ret. [] The spec is clear that the server is also permitted to reply with a simple error, but that's a separate fix. I was able to provoke this scenario with a hack to the server, then seeing whether ENOMEM makes it back to the caller: \| diff --git a/nbd/server.c b/nbd/server.c \| index fd013a2817a..29c7995de02 100644 \| --- a/nbd/server.c \| +++ b/nbd/server.c \| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client, \| "discard failed", errp); \| \| case NBD_CMD_BLOCK_STATUS: \| + return nbd_send_generic_reply(client, request->handle, -ENOMEM, \| + "no status for you today", errp); \| if (!request->len) { \| return nbd_send_generic_reply(client, request->handle, -EINVAL, \| "need non-zero length", errp); \| -- Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190325190104.30213-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Eric Blake	a39286dd61	nbd: Tolerate some server non-compliance in NBD_CMD_BLOCK_STATUS The NBD spec states that NBD_CMD_FLAG_REQ_ONE (which we currently always use) should not reply with an extent larger than our request, and that the server's response should be exactly one extent. Right now, that means that if a server sends more than one extent, we treat the server as broken, fail the block status request, and disconnect, which prevents all further use of the block device. But while good software should be strict in what it sends, it should be tolerant in what it receives. While trying to implement NBD_CMD_BLOCK_STATUS in nbdkit, we temporarily had a non-compliant server sending too many extents in spite of REQ_ONE. Oddly enough, 'qemu-img convert' with qemu 3.1 failed with a somewhat useful message: qemu-img: Protocol error: invalid payload for NBD_REPLY_TYPE_BLOCK_STATUS which then disappeared with commit `d8b4bad8`, on the grounds that an error message flagged only at the time of coroutine teardown is pointless, and instead we should rely on the actual failed API to report an error - in other words, the 3.1 behavior was masking the fact that qemu-img was not reporting an error. That has since been fixed in the previous patch, where qemu-img convert now fails with: qemu-img: error while reading block status of sector 0: Invalid argument But even that is harsh. Since we already partially relaxed things in commit `acfd8f7a` to tolerate a server that exceeds the cap (although that change was made prior to the NBD spec actually putting a cap on the extent length during REQ_ONE - in fact, the NBD spec change was BECAUSE of the qemu behavior prior to that commit), it's not that much harder to argue that we should also tolerate a server that sends too many extents. But at the same time, it's nice to trace when we are being tolerant of server non-compliance, in order to help server writers fix their implementations to be more portable (if they refer to our traces, rather than just stderr). Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190323212639.579-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>	2019-03-30 10:06:08 -05:00
Kevin Wolf	738301e117	file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes We know that the kernel implements a slow fallback code path for BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it. The other operations we call in the context of .bdrv_co_pwrite_zeroes should usually be quick, so no modification should be needed for them. If we ever notice that there are additional problematic cases, we can still make these conditional as well. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	80f5c33ff3	block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers Filter drivers that support .bdrv_co_pwrite_zeroes can safely advertise BDRV_REQ_NO_FALLBACK because they just forward the request flags to their child node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	fe0480d629	block: Add BDRV_REQ_NO_FALLBACK For qemu-img convert, we want an operation that zeroes out the whole image if this can be done efficiently, but that returns an error otherwise so we don't write explicit zeroes and immediately overwrite them with the real data, potentially doubling the amount of data to be written. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Kevin Wolf	48ce986096	block: Remove error messages in bdrv_make_zero() There is only a single caller of bdrv_make_zero(), which is qemu-img convert. If the function fails, we just fall back to a different method of zeroing out blocks on the target image. There is no good reason to print error messages on stderr when the higher level operation will actually succeed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Eric Blake <eblake@redhat.com>	2019-03-26 11:37:51 +01:00
Markus Armbruster	a9779a3ab0	trace-events: Delete unused trace points Tracked down with cleanup-trace-events.pl. Funnies requiring manual post-processing: * block.c and blockdev.c trace points are in block/trace-events. * hw/block/nvme.c uses the preprocessor to hide its trace point use from cleanup-trace-events.pl. * include/hw/xen/xen_common.h trace points are in hw/xen/trace-events. * net/colo-compare and net/filter-rewriter.c use pseudo trace points colo_compare_udp_miscompare and colo_filter_rewriter_debug to guard debug code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-5-armbru@redhat.com Message-Id: <20190314180929.27722-5-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Markus Armbruster	500016e5db	trace-events: Shorten file names in comments We spell out sub/dir/ in sub/dir/trace-events' comments pointing to source files. That's because when trace-events got split up, the comments were moved verbatim. Delete the sub/dir/ part from these comments. Gets rid of several misspellings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-3-armbru@redhat.com Message-Id: <20190314180929.27722-3-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Alberto Garcia	782b9d06bf	block: Make bdrv_{copy_on_read,crypto_luks,replication} static Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-03-19 15:49:29 +01:00

1 2 3 4 5 ...

4270 Commits