qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h. Include it in
sysemu/os-posix.h and remove it from everywhere else.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
commit 243e6f69c1 ("m25p80: Switch to byte-based block access")
replaced blk_read() calls with blk_pread() but return values are
different.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit f9e98e5d7a ("xen/blkif: Avoid double access to
src->nr_segments") didn't go far enough: src->operation is also being
used twice. And nothing was done to prevent the compiler from using the
source side of the copy done by blk_get_request() (granted that's very
unlikely).
Move the barrier()s up, and add another one to blk_get_request().
Note that for completing XSA-155, the barrier() getting added to
blk_get_request() would suffice, and hence the changes to xen_blkif.h
are more like just cleanup. And since, as said, the unpatched code
getting compiled to something vulnerable is very unlikely (and not
observed in practice), this isn't being viewed as a new security issue.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Replace (((n) + (d) - 1) /(d)) by DIV_ROUND_UP(n,d).
This patch is the result of coccinelle script
scripts/coccinelle/round.cocci
CC: qemu-block@nongnu.org
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Block layer is prepared to unspecialize dataplane, an evidence is this
almost complete list of unblocked operations. It has all types except
two (actually three if DATAPLANE itself counts but blockdev.c makes sure
attaching twice is not possible): MIRROR_TARGET and BACKUP_TARGET.
blockdev-mirror refuses to start if target is attached, so the first is
not a problem.
By removing BACKUP_TARGET, blockdev-backup will become permissive to
write to a virtio-blk dataplane disk, but that is not worse than
non-dataplane given the latter is already possible. In either case,
blockdev.c always checks the target and source are on the same
AioContext, or bring them together if possible.
Signed-off-by: Fam Zheng <famz@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 1463969978-24970-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Move it to the actual users. There are still a few includes of
qemu/bswap.h in headers; removing them is left for future work.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Sector-based blk_read() should die; switch to byte-based
blk_pread() instead.
Likewise for blk_aio_readv() and blk_aio_writev().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
This particular device picks its size during onenand_initfn(),
and can be at most 0x80000000 bytes; therefore, shifting an
'int sec' request to get back to a byte offset should never
overflow 32 bits. But adding assertions to document that point
should not hurt.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
This file is doing some complex computations to map various
flash page sizes (256, 512, and 2048) atop generic uses of
512-byte sector operations. Perhaps someone will want to tidy
up the file for fewer gymnastics in managing addresses and
offsets, and less wasteful visits of 256-byte pages, but it
was out of scope for this series, where I just went with the
mechanical conversion.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_write() should die; switch to byte-based
blk_pwrite() instead. Likewise for blk_read().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_aio_readv() and blk_aio_writev() should die; switch
to byte-based blk_aio_preadv() and blk_aio_pwritev() instead.
The trace is modified at the same time, and nb_sectors is now
unused. Fix a comment typo while in the vicinity.
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Sector-based blk_read() should die; convert the one-off
variant blk_read_unthrottled().
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Pflash migration (e.g. q35 + EFI variable storage) fails
with the assert:
bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed.
This avoids the problem by delaying the pflash update until after
the device loads complete.
Tested by:
Migrating Q35/EFI vm.
Changing efi variable content (with efiboot in the guest)
md5sum'ing the variable file before migration and after.
This is a fix that Paolo posted in the message
570244B3.4070105@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Eliminating the reentrancy is actually a nice thing that we can do
with the API that Michael proposed, so let's make it first class.
This also hides the complex assign/set_handler conventions from
callers of virtio_queue_aio_set_host_notifier_handler, which in
fact was always called with assign=true.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In addition to handling IO in vcpu thread and in io thread, dataplane
introduces yet another mode: handling it by AioContext.
This reuses the same handler as previous modes, which triggers races as
these were not designed to be reentrant. Use a separate handler just
for aio, and disable regular handlers when dataplane is active.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We must not call virtio_blk_data_plane_notify if dataplane is
disabled: we would hit a segmentation fault in notify_guest_bh as
s->guest_notifier has not been setup and is NULL.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-12-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-11-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Implements FSR register, it is used for busy waits.
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-10-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Adds fast read and 4bytes commands family.
This work is based on Pawel Lenkow patch from v1.
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-9-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use the setting from the volatile cfg register to correctly
set the number of dummy cycles.
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-8-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch adds both volatile and non volatile configuration registers
and commands to allow modify them. It is needed for proper handling
dummy cycles. Initialization of those registers and flash state
has been included as well.
Some of this registers are used by kernel.
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Acked-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-7-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This patch adds only 4byte address mode (does not cover dummy cycles).
This mode is needed to access more than 16 MiB of flash.
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-6-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Extend address mode allows to switch flash 16 MiB banks,
allowing user to access all flash sectors.
This access mode is used by u-boot.
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-5-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Extend the width of the flags variable to support the already existing
(but unused) WR_1 flag, which is above the range of 8 bits.
This allows support of EEPROM emulation which requires the WR_1 feature.
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-4-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-3-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Message-id: 1458719789-29868-2-git-send-email-marcin.krzeminski@nokia.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Move declarations out of qemu-common.h for functions declared in
utils/ files: e.g. include/qemu/path.h for utils/path.c.
Move inline functions out of qemu-common.h and into new files (e.g.
include/qemu/bcd.h)
Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch replaces get_ticks_per_sec() calls with the macro
NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec()
is then removed. This replacement improves the readability and
understandability of code.
For example,
timer_mod(fdctrl->result_timer,
qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50));
NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns
matches the unit of the expression on the right side of the plus.
Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit 57cb38b included qapi/error.h into qemu/osdep.h to get the
Error typedef. Since then, we've moved to include qemu/osdep.h
everywhere. Its file comment explains: "To avoid getting into
possible circular include dependencies, this file should not include
any other QEMU headers, with the exceptions of config-host.h,
compiler.h, os-posix.h and os-win32.h, all of which are doing a
similar job to this file and are under similar constraints."
qapi/error.h doesn't do a similar job, and it doesn't adhere to
similar constraints: it includes qapi-types.h. That's in excess of
100KiB of crap most .c files don't actually need.
Add the typedef to qemu/typedefs.h, and include that instead of
qapi/error.h. Include qapi/error.h in .c files that need it and don't
get it now. Include qapi-types.h in qom/object.h for uint16List.
Update scripts/clean-includes accordingly. Update it further to match
reality: replace config.h by config-target.h, add sysemu/os-posix.h,
sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h
comment quoted above similarly.
This reduces the number of objects depending on qapi/error.h from "all
of them" to less than a third. Unfortunately, the number depending on
qapi-types.h shrinks only a little. More work is needed for that one.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
[Fix compilation without the spice devel packages. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Before this patch, blk_new() automatically assigned a name to the new
BlockBackend and considered it referenced by the monitor. This patch
removes the implicit monitor_add_blk() call from blk_new() (and
consequently the monitor_remove_blk() call from blk_delete(), too) and
thus blk_new() (and related functions) no longer take a BB name
argument.
In fact, there is only a single point where blk_new()/blk_new_open() is
called and the new BB is monitor-owned, and that is in blockdev_init().
Besides thus relieving us from having to invent names for all of the BBs
we use in qemu-img, this fixes a bug where qemu cannot create a new
image if there already is a monitor-owned BB named "image".
If a BB and its BDS tree are created in a single operation, as of this
patch the BDS tree will be created before the BB is given a name
(whereas it was the other way around before). This results in minor
change to the output of iotest 087, whose reference output is amended
accordingly.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When populating ACPI objects for floppy drives one needs to provide the
maximum values for cylinder, sector, and head number the drive supports.
This patch adds a function that iterates through the array of predefined
floppy drive formats and returns the maximum values of c, h, s, out of
those matching the given floppy drive type.
Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: John Snow <jsnow@redhat.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Kevin O'Connor <kevin@koconnor.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
In disabled mode, virtio-blk dataplane seems to be enabled, but flow
actually goes through the normal virtio path. This patch simplifies a bit
the handling of disabled mode. In disabled mode, virtio_blk_handle_output
might be called even if s->dataplane is not NULL.
This is a bit tricky, because the current check for s->dataplane will
always trigger, causing a continuous stream of calls to
virtio_blk_data_plane_start. Unfortunately, these calls will not
do anything. To fix this, set the "started" flag even in disabled
mode, and skip virtio_blk_data_plane_start if the started flag is true.
The resulting changes also prepare the code for the next patch, were
virtio-blk dataplane will reuse the same virtio_blk_handle_output function
as "regular" virtio-blk.
Because struct VirtIOBlockDataPlane is opaque in virtio-blk.c, we have
to move s->dataplane->started inside struct VirtIOBlock.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Make the API more similar to the regular virtqueue API. This will
help when modifying the code to not use vring.c anymore.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Include osdep.h as the first header in nand.c; this has to be
done manually because coccinelle gets confused by the way that
this C file includes itself.
We fix some odd spacing in #includes while we are in the area.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Coverity noticed that some variables are only used by debug prints, and
called them unused. Always compile the print statements. While we're
here, print to stderr as well.
Bonus: Fix a debug printf I broke in f31937aa8
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[Touched up commit message. --js]
Message-id: 1454971529-14830-1-git-send-email-jsnow@redhat.com
Included here:
Refactoring and bugfix patches in PC/ACPI.
New commands for ipmi.
Virtio optimizations.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJWtj8KAAoJECgfDbjSjVRpBIQIAJSB9xwTcBLXwD0+8z5lqjKC
GTtuVbHU0+Y/eO8O3llN5l+SzaRtPHo18Ele20Oz7IQc0ompANY273K6TOlyILwB
rOhrub71uqpOKbGlxXJflroEAXb78xVK02lohSUvOzCDpwV+6CS4ZaSer7yDCYkA
MODZj7rrEuN0RmBWqxbs1R7Mj2CeQJzlgTUNTBGCLEstoZGFOJq8FjVdG5P1q8vI
fnI9mGJ1JsDnmcUZe/bTFfB4VreqeQ7UuGyNAMMGnvIbr0D1a+CoaMdV7/HZ+KyT
5TIs0siVdhZei60A/Cq2OtSVCbj5QdxPBLhZfwJCp6oU4lh2U5tSvva0mh7MwJ0=
=D/cA
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc and misc cleanups and fixes, virtio optimizations
Included here:
Refactoring and bugfix patches in PC/ACPI.
New commands for ipmi.
Virtio optimizations.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sat 06 Feb 2016 18:44:26 GMT using RSA key ID D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>"
* remotes/mst/tags/for_upstream: (45 commits)
net: set endianness on all backend devices
fix MSI injection on Xen
intel_iommu: large page support
dimm: Correct type of MemoryHotplugState->base
pc: set the OEM fields in the RSDT and the FADT from the SLIC
acpi: add function to extract oem_id and oem_table_id from the user's SLIC
acpi: expose oem_id and oem_table_id in build_rsdt()
acpi: take oem_id in build_header(), optionally
pc: Eliminate PcGuestInfo struct
pc: Move APIC and NUMA data from PcGuestInfo to PCMachineState
pc: Move PcGuestInfo.fw_cfg to PCMachineState
pc: Remove PcGuestInfo.isapc_ram_fw field
pc: Remove RAM size fields from PcGuestInfo
pc: Remove compat fields from PcGuestInfo
acpi: Don't save PcGuestInfo on AcpiBuildState
acpi: Remove guest_info parameters from functions
pc: Simplify xen_load_linux() signature
pc: Simplify pc_memory_init() signature
pc: Eliminate struct PcGuestInfoState
pc: Move PcGuestInfo declaration to top of file
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Move allocation to virtio functions also when loading/saving a
VirtQueueElement. This will also let the load/save functions
keep backwards compatibility when the VirtQueueElement layout
is changed.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The return code of virtqueue_pop/vring_pop is unused except to check for
errors or 0. We can thus easily move allocation inside the functions
and just return a pointer to the VirtQueueElement.
The advantage is that we will be able to allocate only the space that
is needed for the actual size of the s/g list instead of the full
VIRTQUEUE_MAX_SIZE items. Currently VirtQueueElement takes about 48K
of memory, and this kind of allocation puts a lot of stress on malloc.
By cutting the size by two or three orders of magnitude, malloc can
use much more efficient algorithms.
The patch is pretty large, but changes to each device are testable
more or less independently. Splitting it would mostly add churn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
All functions relative to DMA (DMA_*() functions) are stubs on sparc platform.
Disable the DMA in the floppy controller, instead of calling these stubs.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Message-id: 1453843944-26833-14-git-send-email-hpoussin@reactos.org
Signed-off-by: John Snow <jsnow@redhat.com>
Accidentally, I removed a "feature" where empty drives had geometry
values applied to them, which allows seek on empty drives to work
"by accident," as QEMU actually tries to disallow that.
Seeks on empty drives should work, though, but the easiest thing is to
restore the misfeature where empty drives have non-zero geometries
applied.
Document the hack accordingly.
[Maintainer edit]
This fix corrects a regression introduced in d5d47efc, where
pick_geometry was modified such that it would not operate on empty
drives, and as a result if there is no diskette inserted, QEMU
no longer populates it with geometry bounds. As a result, seek fails
when QEMU denies to move the current track, but reports success anyway.
This can confuse the guest, leading to kernel panics in the guest.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1454106932-17236-1-git-send-email-jsnow@redhat.com
Put the code for setting up and removing op blockers into an own
function, respectively. Then, we can invoke those functions whenever a
BDS is removed from an virtio-blk BB or inserted into it.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This reverts the changes that commit
2e1280e8ff applied to hw/block/fdc.c;
also, an additional case of drv->media_inserted use has crept in since,
which is replaced by a call to blk_is_inserted().
That commit changed tests/fdc-test.c, too, because after it, one less
TRAY_MOVED event would be emitted when executing 'change' on an empty
drive. However, now, no TRAY_MOVED events will be emitted at all, and
the tray_open status returned by query-block will always be false,
necessitating (different) changes to tests/fdc-test.c and iotest 118,
which is why this patch is not a pure revert of said commit.
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 1454096953-31773-4-git-send-email-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Background on QEMU boot indices
-------------------------------
Normally, the "bootindex" property is configured for bootable devices
with:
DEVICE_instance_init()
device_add_bootindex_property(..., "bootindex", ...)
object_property_add(..., device_get_bootindex,
device_set_bootindex, ...)
and when the bootindex is set on the QEMU command line, with
-device DEVICE,...,bootindex=N
the setter that was configured above is invoked:
device_set_bootindex()
/* parse boot index */
visit_type_int32()
/* verify unicity */
check_boot_index()
/* store parsed boot index */
...
/* insert device path to boot order */
add_boot_device_path()
In the last step, add_boot_device_path() ensures that an OpenFirmware
device path will show up in the "bootorder" fw_cfg file, at a position
corresponding to the device's boot index. Thus guest firmware (SeaBIOS and
OVMF) can try to boot off the device with the right priority.
NVMe boot index
---------------
In QEMU commit 33739c7129,
nvma: ide: add bootindex to qom property
the following generic setters / getters:
- device_set_bootindex()
- device_get_bootindex()
were open-coded for NVMe, under the names
- nvme_set_bootindex()
- nvme_get_bootindex()
Plus nvme_instance_init() was added to configure the "bootindex" property
manually, designating the open-coded getter & setter, rather than calling
device_add_bootindex_property().
Crucially, nvme_set_bootindex() avoided the final add_boot_device_path()
call. This fact is spelled out in the message of commit 33739c7129, and
it was presumably the entire reason for all of the code duplication.
Now, Vladislav filed an RFE for OVMF
<https://github.com/tianocore/edk2/issues/48>; OVMF should boot off NVMe
devices. It is simple to build edk2's existent NvmExpressDxe driver into
OVMF, but the boot order matching logic in OVMF can only handle NVMe if
the "bootorder" fw_cfg file includes such devices.
Therefore this patch converts the NVMe device model to
device_set_bootindex() all the way.
Device paths
------------
device_set_bootindex() accepts an optional parameter called "suffix". When
present, it is expected to take the form of an OpenFirmware device path
node, and it gets appended as last node to the otherwise auto-generated
OFW path.
For NVMe, the auto-generated part is
/pci@i0cf8/pci8086,5845@6[,1]
^ ^ ^ ^
| | PCI slot and (present when nonzero)
| | function of the NVMe controller, both hex
| "driver name" component, built from PCI vendor & device IDs
PCI root at system bus port, PIO
to which here we append the suffix
/namespace@1,0
^ ^
| big endian (MSB at lowest address) numeric interpretation
| of the 64-bit IEEE Extended Unique Identifier, aka EUI-64,
| hex
32-bit NVMe namespace identifier, aka NSID, hex
resulting in the OFW device path
/pci@i0cf8/pci8086,5845@6[,1]/namespace@1,0
The reason for including the NSID and the EUI-64 is that an NVMe device
can in theory produce several different namespaces (distinguished by
NSID). Additionally, each of those may (optionally) have an EUI-64 value.
For now, QEMU only provides namespace 1.
Furthermore, QEMU doesn't even represent the EUI-64 as a standalone field;
it is embedded (and left unused) inside the "NvmeIdNs.res30" array, at the
last eight bytes. (Which is fine, since EUI-64 can be left zero-filled if
unsupported by the device.)
Based on the above, we set the "unit address" part of the last
("namespace") node to fixed "1,0".
OVMF will then map the above OFW device path to the following UEFI device
path fragment, for boot order processing:
PciRoot(0x0)/Pci(0x6,0x1)/NVMe(0x1,00-00-00-00-00-00-00-00)
^ ^ ^ ^ ^ ^
| | | | | octets of the EUI-64 in address order
| | | | NSID
| | | NVMe namespace messaging device path node
| PCI slot and function
PCI root bridge
Cc: Keith Busch <keith.busch@intel.com> (supporter:nvme)
Cc: Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Cc: qemu-block@nongnu.org (open list:nvme)
Cc: Gonglei <arei.gonglei@huawei.com>
Cc: Vladislav Vovchenko <vladislav.vovchenko@sk.com>
Cc: Feng Tian <feng.tian@intel.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Acked-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Tested-by: Vladislav Vovchenko <vladislav.vovchenko@sk.com>
Message-id: 1453850483-27511-1-git-send-email-lersek@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
In Xen 4.7 we are refactoring parts libxenctrl into a number of
separate libraries which will provide backward and forward API and ABI
compatiblity.
One such library will be libxengnttab which provides access to grant
tables.
In preparation for this switch the compatibility layer in xen_common.h
(which support building with older versions of Xen) to use what will
be the new library API. This means that the gnttab shim will disappear
for versions of Xen which include libxengnttab.
To simplify things for the <= 4.0.0 support we wrap the int fd in a
malloc(sizeof int) such that the handle is always a pointer. This
leads to less typedef headaches and the need for
XC_HANDLER_INITIAL_VALUE etc for these interfaces.
Note that this patch does not add any support for actually using
libxengnttab, it just adjusts the existing shims.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The 2.88 drive is more suitable as a default because
it can still read 1.44 images correctly, but the reverse
is not true.
Since there exist virtio-win drivers that are shipped on
2.88 floppy images, this patch will allow VMs booted without
a floppy disk inserted to later insert a 2.88MB floppy and
have that work.
This patch has been tested with msdos, freedos, fedora,
windows 8 and windows 10 without issue: if problems do
arise for certain guests being unable to cope with 2.88MB
drives as the default, they are in the minority and can use
type=144 as needed (or insert a proper boot medium and omit
type=144/288 or use type=auto) to obtain different drive types.
As icing, the default will remain auto/144 for any pre-2.6
machine types, hopefully minimizing the impact of this change
in legacy hw to basically zero.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-13-git-send-email-jsnow@redhat.com
This one is the crazy one.
fd_revalidate currently uses pick_geometry to tell if the diskette
geometry has changed upon an eject/insert event, but it won't allow us
to insert a 1.44MB diskette into a 2.88MB drive. This is inflexible.
The new algorithm applies a new heuristic to guessing disk geometries
that allows us to switch diskette types as long as the physical size
matches before falling back to the old heuristic.
The old one is roughly:
- If the size (sectors) and type matches, choose it.
- Fall back to the first geometry that matched our type.
The new one is:
- If the size (sectors) and type matches, choose it.
- If the size (sectors) and physical size match, choose it.
- Fall back to the first geometry that matched our type.
Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1453495865-9649-11-git-send-email-jsnow@redhat.com
2.88MB capable drives can accept 1.44MB floppies,
for instance. To rework the pick_geometry function,
we need to know if our current drive can even accept
the type of disks we're considering.
NB: This allows us to distinguish between all of the
"total sectors" collisions between 1.20MB and 1.44MB
diskette types, by using the physical drive size as a
differentiator.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-10-git-send-email-jsnow@redhat.com
This patch adds a new explicit Floppy Drive Type option. The existing
behavior in QEMU is to automatically guess a drive type based on the
media inserted, or if a diskette is not present, arbitrarily assign one.
This behavior can be described as "auto." This patch adds the option
to pick an explicit behavior: 120, 144, 288 or none. The new "auto"
option is intended to mimic current behavior, while the other types
pick one explicitly.
Set the type given by the CLI during fd_init. If the type remains the
default (auto), we'll attempt to scan an inserted diskette if present
to determine a type. If auto is selected but no diskette is present,
we fall back to a predetermined default (currently 1.44MB to match
legacy QEMU behavior.)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-9-git-send-email-jsnow@redhat.com
Currently, QEMU chooses a drive type automatically based on the inserted
media. If there is no disk inserted, it chooses a 1.44MB drive type.
Change this behavior to be configurable, but leave it defaulted to 1.44.
This is not earnestly intended to be used by a user or a management
library, but rather exists so that pre-2.6 board types can configure it
to be a legacy value.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-8-git-send-email-jsnow@redhat.com
Split apart pick_geometry by creating a pick_drive routine that will only
ever called during device bring-up instead of relying on pick_geometry to
be used in both cases.
With this change, the drive field is changed to be 'write once'. It is
not altered after the initialization routines exit.
media_validated does not need to be migrated. The target VM
will just revalidate the media on post_load anyway.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-7-git-send-email-jsnow@redhat.com
pick_geometry is a convoluted function that makes it difficult to tell
at a glance what QEMU's current behavior for choosing a floppy drive
type is when it can't quite identify the diskette.
The code iterates over all entries in the candidate geometry table
("fd_formats") and if our specific drive type matches a row in the table,
then either "match" is set to that entry (an exact match) and the loop
exits, or "first_match" will be non-negative (the first such entry that
shares the same drive type), and the loop continues. If our specific
drive type is NONE, then all drive types in the candidate geometry table
are considered. After iteration, if "match" was not set, we fall back to
"first match".
This means that either "match" was set, or we exited the loop without an
exact match, in which case:
- If drive type is NONE, the default is truly fd_formats[0], a 1.44MB
type, because "first_match" will always get set to the first item.
- If drive type is not NONE, pick_geometry's iteration was fussier and
only looked at rows that matched our drive type. However, since all
possible drive types are represented in the table, we still know that
"first match" was set.
- If drive type is not NONE and the fd_formats table lists no options for
our drive type, we choose fd_formats[1], an incomprehensibly bizarre
choice that can never happen anyway.
Correct this: If first_match is -1, it can ONLY mean we didn't edit our
fd_formats table correctly. Throw an assertion instead.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-6-git-send-email-jsnow@redhat.com
Currently, 'drive' is used both to represent the current diskette
type as well as the current drive type.
This patch adds a 'disk' field that is updated explicitly to match
the type of the disk.
As of this patch, disk and drive are always the same, but forthcoming
patches to change the behavior of pick_geometry will invalidate this
assumption.
disk does not need to be migrated because it is not user-visible state
nor is it currently used for any calculations. It is purely informative,
and will be rebuilt automatically via fd_revalidate on the new host.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-5-git-send-email-jsnow@redhat.com
Change the floppy drive type to a QAPI enum type, to allow us to
specify the floppy drive type from the CLI in a forthcoming patch.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-4-git-send-email-jsnow@redhat.com
Modify this function to operate directly on FDrive objects instead of
unpacking and passing all of those parameters manually. Reduces the
complexity in the caller and reduces the number of args to just one.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-3-git-send-email-jsnow@redhat.com
Code motion: I want to refactor this function to work with FDrive
directly, so shuffle it below that definition.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1453495865-9649-2-git-send-email-jsnow@redhat.com
Move the ssi.h include file into the ssi directory.
While touching the code also fix the typdef lines as
checkpatch complains.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add the sst25wf080 SPI flash device.
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Done with this Coccinelle semantic patch
@@
expression FMT, E1, E2;
expression list ARGS;
@@
- error_setg(E1, FMT, ARGS, error_get_pretty(E2));
+ error_propagate(E1, E2);/*###*/
+ error_prepend(E1, FMT/*@@@*/, ARGS);
followed by manual cleanup, first because I can't figure out how to
make Coccinelle transform strings, and second to get rid of now
superfluous error_propagate().
We now use or propagate the original error whole instead of just its
message obtained with error_get_pretty(). This avoids suppressing its
hint (see commit 50b7b00), but I can't see how the errors touched in
this commit could come with hints. It also improves the message
printed with &error_abort when we screw up (see commit 1e9b65b).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1449764955-10741-3-git-send-email-armbru@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
It's necessary to distinguish source and target before we can add
blockdev-mirror, because we would want a concrete type of operation to
check on target bs before starting.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1450932306-13717-2-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
Request merging must not result in a huge request that exceeds the
maximum number of iovec elements. Use BlockLimits.max_iov instead of
hardcoding IOV_MAX.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
1. avoid possible superflous checking
2. make code more robustness
["make code more robustness" refers to avoiding integer
underflows/overflows.
--Stefan]
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Message-id: 1447207166-12612-1-git-send-email-arei.gonglei@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
src is stored in shared memory and src->nr_segments is dereferenced
twice at the end of the function. If a compiler decides to compile this
into two separate memory accesses then the size limitation could be
bypassed.
Fix it by removing the double access to src->nr_segments.
This is part of XSA-155.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The Xen toolstack uses "vhd" to specify a disk in VHD format, however
the name of the driver in QEMU is "vpc". Replace "vhd" with "vpc", so
that QEMU can find the right driver to use for it.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
The official way of enabling dataplane is through the "iothread"
property that references an iothread object created by "-object
iothread". Since the old "x-data-plane=on" way now even crashes, it's
probably easier to just drop it:
$ qemu-system-x86_64 -drive file=null-co://,id=d0,if=none \
-device virtio-blk-pci,drive=d0,x-data-plane=on
ERROR:/home/fam/work/qemu/qom/object.c:1515:
object_get_canonical_path_component: assertion failed: (obj->parent != NULL)
Aborted
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1449485967-19240-1-git-send-email-famz@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For the "main area on file, oob in memory" case, fix the shifts so that
we erase the correct number of pages.
Signed-off-by: Ricard Wanderlöf <ricardw@axis.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This code has been dead for three years (since commit 7e7b7cba1).
Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
"werror=report" would free the req in virtio_blk_handle_rw_error, we
mustn't write to it in that case.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1448239280-15025-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The shifts of the address mask and value shift beyond 32 bits when there
are 5 address cycles.
Cc: qemu-stable@nongnu.org
Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When a request R is absorbed by request M, it is appended to the
"mr_next" queue led by M, and is completed together with the completion
of M, in virtio_blk_rw_complete.
During DMA restart in virtio_blk_dma_restart_bh, requests in s->rq are
parsed and submitted again, possibly with a stale req->mr_next. It could
be a problem if the request merging in virtio_blk_handle_request hasn't
refreshed every mr_next pointer, in which case, virtio_blk_rw_complete
could walk through unexpected requests following the stale pointers.
Fix this by unsetting the pointer in virtio_blk_rw_complete. It is safe
because this req is either completed and freed right away, or it will be
restarted and parsed from scratch out of the vq later.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Currently both BLKIF_OP_WRITE and BLKIF_OP_FLUSH_DISKCACHE are being
accounted as write operations.
Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 7a2a14e3ac62027aa6267a6c02abc70717be9c0a.1446044837.git.berto@igalia.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When creating snapshot with the dataplane enabled, the snapshot file gets
not the actual state of virtqueue, because the current state is stored in
VirtIOBlockDataPlane. Therefore, before saving snapshot need to sync
the dataplane vring state to the virtqueue. The dataplane will resume its
work at the next notify virtqueue.
When snapshot loads with loadvm we get a message:
VQ 0 size 0x80 Guest index 0x15f5 inconsistent with Host index 0x0:
delta 0x15f5
error while loading state for instance 0x0 of device
'0000:00:08.0/virtio-blk'
Error -1 while loading VM state
to reproduce the error I used the following hmp commands:
savevm snap1
loadvm snap1
qemu parameters:
--enable-kvm -smp 4 -m 1024 -drive file=/var/lib/libvirt/images/centos6.4.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0 -set device.virtio-disk0.x-data-plane=on
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1445859777-2982-1-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Devices that are compliant with virtio-1 do not support scsi
passthrough any more (and it has not been a recommended setup
anyway for quite some time). To avoid having to switch it off
explicitly in newer qemus that turn on virtio-1 by default, let's
switch the default to scsi=false for 2.5.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Message-id: 1444991154-79217-4-git-send-email-cornelia.huck@de.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
They will be excluded by type in the nested event loops in block layer,
so that unwanted events won't be processed there.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All callers pass in false, and the real external ones will switch to
true in coming patches.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
blk_bs() will not necessarily return a non-NULL value any more (unless
blk_is_available() is true or it can be assumed to otherwise, e.g.
because it is called immediately after a successful blk_new_with_bs() or
blk_new_open()).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The tray of an FDD is open iff there is no medium inserted (there are
only two states for an FDD: "medium inserted" or "no medium inserted").
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Simplify memory allocation by sticking with a single API. GSlice
is not that fast anyway (tcmalloc/jemalloc are better).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The raw-posix block driver implements Linux AIO batching so multiple
requests can be submitted with a single io_submit(2) system call.
Batching is currently only used by virtio-scsi and
virtio-blk-data-plane.
Enable batching for regular virtio-blk so the number of io_submit(2)
system calls is reduced for workloads with queue depth > 1.
In 4KB random read performance tests with queue depth 32, the CPU
utilization on the host is reduced by 9.4%. The fio job is as follows:
[global]
bs=4k
ioengine=libaio
iodepth=32
direct=1
sync=0
time_based=1
runtime=30
clocksource=gettimeofday
ramp_time=5
[job1]
rw=randread
filename=/dev/vdb
size=4096M
write_bw_log=fio
write_iops_log=fio
write_lat_log=fio
log_avg_msec=1000
This benchmark was run on an raw image on LVM. The disk was an SSD
drive and -drive cache=none,aio=native was used.
Tested-by: Pradeep Surisetty <psuriset@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Symptom:
$ qemu-system-x86_64 -m 10000000
Unexpected error in ram_block_add() at /work/armbru/qemu/exec.c:1456:
upstream-qemu: cannot set up guest memory 'pc.ram': Cannot allocate memory
Aborted (core dumped)
Root cause: commit ef701d7 screwed up handling of out-of-memory
conditions. Before the commit, we report the error and exit(1), in
one place, ram_block_add(). The commit lifts the error handling up
the call chain some, to three places. Fine. Except it uses
&error_abort in these places, changing the behavior from exit(1) to
abort(), and thus undoing the work of commit 3922825 "exec: Don't
abort when we can't allocate guest memory".
The three places are:
* memory_region_init_ram()
Commit 4994653 (right after commit ef701d7) lifted the error
handling further, through memory_region_init_ram(), multiplying the
incorrect use of &error_abort. Later on, imitation of existing
(bad) code may have created more.
* memory_region_init_ram_ptr()
The &error_abort is still there.
* memory_region_init_rom_device()
Doesn't need fixing, because commit 33e0eb5 (soon after commit
ef701d7) lifted the error handling further, and in the process
changed it from &error_abort to passing it up the call chain.
Correct, because the callers are realize() methods.
Fix the error handling after memory_region_init_ram() with a
Coccinelle semantic patch:
@r@
expression mr, owner, name, size, err;
position p;
@@
memory_region_init_ram(mr, owner, name, size,
(
- &error_abort
+ &error_fatal
|
err@p
)
);
@script:python@
p << r.p;
@@
print "%s:%s:%s" % (p[0].file, p[0].line, p[0].column)
When the last argument is &error_abort, it gets replaced by
&error_fatal. This is the fix.
If the last argument is anything else, its position is reported. This
lets us check the fix is complete. Four positions get reported:
* ram_backend_memory_alloc()
Error is passed up the call chain, ultimately through
user_creatable_complete(). As far as I can tell, it's callers all
handle the error sanely.
* fsl_imx25_realize(), fsl_imx31_realize(), dp8393x_realize()
DeviceClass.realize() methods, errors handled sanely further up the
call chain.
We're good. Test case again behaves:
$ qemu-system-x86_64 -m 10000000
qemu-system-x86_64: cannot set up guest memory 'pc.ram': Cannot allocate memory
[Exit 1 ]
The next commits will repair the rest of commit ef701d7's damage.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1441983105-26376-3-git-send-email-armbru@redhat.com>
Reviewed-by: Peter Crosthwaite <crosthwaite.peter@gmail.com>
* qemu_mutex_lock_iothread "No such process" fix
* cutils: qemu_strto* wrappers
* iohandler.c simplification
* Many other fixes and misc patches.
And some MTTCG work (with Emilio's fixes squashed):
* Signal-free TCG kick
* Removing spinlock in favor of QemuMutex
* User-mode emulation multi-threading fixes/docs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJV8Tk7AAoJEL/70l94x66Ds3QH/3bi0RRR2NtKIXAQrGo5tfuD
NPMu1K5Hy+/26AC6mEVNRh4kh7dPH5E4NnDGbxet1+osvmpjxAjc2JrxEybhHD0j
fkpzqynuBN6cA2Gu5GUNoKzxxTmi2RrEYigWDZqCftRXBeO2Hsr1etxJh9UoZw5H
dgpU3j/n0Q8s08jUJ1o789knZI/ckwL4oXK4u2KhSC7ZTCWhJT7Qr7c0JmiKReaF
JEYAsKkQhICVKRVmC8NxML8U58O8maBjQ62UN6nQpVaQd0Yo/6cstFTZsRrHMHL3
7A2Tyg862cMvp+1DOX3Bk02yXA+nxnzLF8kUe0rYo6llqDBDStzqyn1j9R0qeqA=
=nB06
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Support for jemalloc
* qemu_mutex_lock_iothread "No such process" fix
* cutils: qemu_strto* wrappers
* iohandler.c simplification
* Many other fixes and misc patches.
And some MTTCG work (with Emilio's fixes squashed):
* Signal-free TCG kick
* Removing spinlock in favor of QemuMutex
* User-mode emulation multi-threading fixes/docs
# gpg: Signature made Thu 10 Sep 2015 09:03:07 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
* remotes/bonzini/tags/for-upstream: (44 commits)
cutils: work around platform differences in strto{l,ul,ll,ull}
cpu-exec: fix lock hierarchy for user-mode emulation
exec: make mmap_lock/mmap_unlock globally available
tcg: comment on which functions have to be called with mmap_lock held
tcg: add memory barriers in page_find_alloc accesses
remove unused spinlock.
replace spinlock by QemuMutex.
cpus: remove tcg_halt_cond and tcg_cpu_thread globals
cpus: protect work list with work_mutex
scripts/dump-guest-memory.py: fix after RAMBlock change
configure: Add support for jemalloc
add macro file for coccinelle
configure: factor out adding disas configure
vhost-scsi: fix wrong vhost-scsi firmware path
checkpatch: remove tests that are not relevant outside the kernel
checkpatch: adapt some tests to QEMU
CODING_STYLE: update mixed declaration rules
qmp: Add example usage of strto*l() qemu wrapper
cutils: Add qemu_strtoull() wrapper
cutils: Add qemu_strtoll() wrapper
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
A number of files were including signal.h but not using any
of the functions it provides
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Commit ef546f1275 ("virtio: add
feature checking helpers") introduced a helper __virtio_has_feature.
We don't want to use reserved identifiers, though, so let's
rename __virtio_has_feature to virtio_has_feature and virtio_has_feature
to virtio_vdev_has_feature.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The i8257 DMA controller uses an idle bottom half, which by default
does not cause the main loop to exit. Therefore, the DMA_schedule
function is there to ensure that the CPU relinquishes the iothread
mutex to the iothread.
However, this is not enough since the iothread will call
aio_compute_timeout() and go to sleep again. In the iothread
world, forcing execution of the idle bottom half is much simpler,
and only requires a call to qemu_notify_event(). Do it, removing
the need for the "cpu_request_exit" pseudo-irq. The next patch
will remove it.
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use pow2ceil() to round up to the next power of 2, rather
than an inline calculation.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1437741192-20955-4-git-send-email-peter.maydell@linaro.org
Chapter 6.3 of spec said
"
Transitional devices MUST offer, and if offered by the device
transitional drivers MUST accept the following:
VIRTIO_F_ANY_LAYOUT (27)
"
So this patch only clear VIRTIO_F_LAYOUT for legacy device.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
SCSI passthrough was no longer supported in virtio 1.0, so this patch
fail the get_features() when both 1.0 and scsi is set. And also only
advertise VIRTIO_BLK_F_SCSI for legacy virtio-blk device.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Implement support in Identify and Get/Set Features to properly report
and allow to change the Volatile Write Cache status reported by the
virtual NVMe device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Implement a real flush instead of faking it. This is especially important
as Qemu assume Write back cashing by default and thus requires a working
cache flush operation for data integrity.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
in_num = req->elem.in_num, and req->elem.in_num is
checked in line 489, so the check about in_num variable
is superflous, let's drop it.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1435138164-11728-1-git-send-email-arei.gonglei@huawei.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Each call of the virtio_blk_reset() function calls blk_drain_all(),
which works for all existing BlockDriverStates, while draining only
one is needed.
This patch replaces blk_drain_all() by blk_drain() in
virtio_blk_reset(). virtio_blk_data_plane_stop() should be called
after draining because it restores vblk->complete_request.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Message-id: 1434537440-28236-3-git-send-email-yarygin@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
We create optional sections with this patch. But we already have
optional subsections. Instead of having two mechanism that do the
same, we can just generalize it.
For subsections we just change:
- Add a needed function to VMStateDescription
- Remove VMStateSubsection (after removal of the needed function
it is just a VMStateDescription)
- Adjust the whole tree, moving the needed function to the corresponding
VMStateDescription
Signed-off-by: Juan Quintela <quintela@redhat.com>
* CONFIG_PARALLEL fix from Mirek
* Atomic/optimized dirty bitmap access from myself and Stefan
* BUILD_DIR convenience/bugfix from Peter C
* Memory leak fix from Shannon
* SMM improvements (though still TCG only) from myself and Gerd, acked by mst
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJVceAwAAoJEL/70l94x66Dyz4H/RHS/OUGo6HOwG1FZ4l8RxRl
FY+pwJqinxFyGySmMLVHEeQCsIfxgi8bOmuWblG7sdt245nhMIj2jglyEOCUA3RN
Q9qxQr6QyXBWiwK4bfB7xI1z3/mc8cVvuxjtkLaBMa16A4MXMunWCDcyhsX9/0Vw
VySgTgBbn5AyY5x58TbkB7Tl6hMZgxF0yNwU6IGQvP079dgREAL2tzR1Wk8kPC80
ltLWlrwTAzF2km5m6rmstpMeZ/XIaq3DD2LU03SyUhefMsYowGKK+7Boo4lHpVm9
XAlxflahN7VGtQuno5RpYNNSzGqSJgqu5X5JxCMnbWdPi4sX3bijQdcUhW3/0oo=
=KPIz
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* KVM error improvement from Laurent
* CONFIG_PARALLEL fix from Mirek
* Atomic/optimized dirty bitmap access from myself and Stefan
* BUILD_DIR convenience/bugfix from Peter C
* Memory leak fix from Shannon
* SMM improvements (though still TCG only) from myself and Gerd, acked by mst
# gpg: Signature made Fri Jun 5 18:45:20 2015 BST using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream: (62 commits)
update Linux headers from kvm/next
atomics: add explicit compiler fence in __atomic memory barriers
ich9: implement SMI_LOCK
q35: implement TSEG
q35: add test for SMRAM.D_LCK
q35: implement SMRAM.D_LCK
q35: add config space wmask for SMRAM and ESMRAMC
q35: fix ESMRAMC default
q35: implement high SMRAM
hw/i386: remove smram_update
target-i386: use memory API to implement SMRAM
hw/i386: add a separate region that tracks the SMRAME bit
target-i386: create a separate AddressSpace for each CPU
vl: run "late" notifiers immediately
qom: add object_property_add_const_link
vl: allow full-blown QemuOpts syntax for -global
pflash_cfi01: add secure property
pflash_cfi01: change to new-style MMIO accessors
pflash_cfi01: change big-endian property to BIT type
target-i386: wake up processors that receive an SMI
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
When this property is set, MMIO accesses are only allowed with the
MEMTXATTRS_SECURE attribute. This is used for secure access to UEFI
variables stored in flash.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The RQM bit in MSR should be set whenever the guest is supposed to
access the FIFO, and it should be cleared in all other cases. This is
important so the guest can't continue writing/reading the FIFO beyond
the length that it's suppossed to access (see CVE-2015-3456).
Commit e9077462 fixed the CVE by adding code that avoids the buffer
overflow; however it doesn't correct the wrong behaviour of the floppy
controller which should already have cleared RQM.
Currently, RQM stays set all the time and during all phases while a
command is being processed. This is error-prone because the command has
to explicitly clear the flag if it doesn't need data (and indeed, the
two buggy commands that are the culprits for the CVE just forgot to do
that).
This patch clears RQM immediately as soon as all bytes that are expected
have been received. If the the FIFO is used in the next phase, the flag
has to be set explicitly there.
It also clear RQM after receiving all bytes even if the phase transition
immediately sets it again. While it's technically not necessary at the
moment because the state between clearing and setting RQM is not
observable by the guest, this is more explicit and matches how real
hardware works. It will actually become necessary in qemu once
asynchronous code paths are introduced.
This alone should have been enough to fix the CVE, but now we have two
lines of defense - even better.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1432214378-31891-8-git-send-email-kwolf@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
This commit makes similar improvements as have already been made to the
write function: Instead of relying on a flag in the MSR to distinguish
controller phases, use the explicit phase that we store now. Assertions
of the right MSR flags are added.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1432214378-31891-7-git-send-email-kwolf@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Factor out a few common lines of code, reformat, improve comments.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1432214378-31891-6-git-send-email-kwolf@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Instead of relying on a flag in the MSR to distinguish controller phases,
use the explicit phase that we store now. Assertions of the right MSR
flags are added.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1432214378-31891-5-git-send-email-kwolf@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
The floppy controller spec describes three different controller phases,
which are currently not explicitly modelled in our emulation. Instead,
each phase is represented by a combination of flags in registers.
This patch makes explicit in which phase the controller currently is.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Message-id: 1432214378-31891-4-git-send-email-kwolf@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
What callers really do with this function is to switch from execution
phase (including data transfers) to result phase where the guest can
read out one or more status bytes from the FIFO (the number depends on
the command).
Rename the function accordingly.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1432214378-31891-3-git-send-email-kwolf@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
What all callers of fdctrl_reset_fifo() really want to do is to start
the command phase, where writes to the data port initiate a new command.
The function doesn't only clear the FIFO, but also sets up the state so
that a new command can be received. Rename it to reflect this.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1432214378-31891-2-git-send-email-kwolf@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
Make features 64bit wide everywhere.
On migration a full 64bit guest_features field is sent if one of the
high bits is set, in addition to the lower 32bit guest_features field
which must stay for compatibility reasons. That way we send the lower
32 feature bits twice, but the code is simpler because we don't have
to split and compose the 64bit features into two 32bit fields.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The SCSI emulation in the Linux NVMe driver really wants to know
if a device has a volatile write cache. Given that qemu has moved
away from a model where we report the backing store WCE bit to
one where the WCE bit is supposed to be part of the migratable
guest-visible state we always return 1 here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
During processing of certain commands such as FD_CMD_READ_ID and
FD_CMD_DRIVE_SPECIFICATION_COMMAND the fifo memory access could
get out of bounds leading to memory corruption with values coming
from the guest.
Fix this by making sure that the index is always bounded by the
allocated memory.
This is CVE-2015-3456.
Signed-off-by: Petr Matousek <pmatouse@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Windows seems to send two separate calls to NVMe controller configuration. The
first sends configuration info and the second the enable bit. I couldn't
enable the Windows 8.1 in-box NVMe driver with base Qemu. I made the
following change to store the configuration data and then handle enable and
NVMe driver works on Windows 8.1.
I am not a Windows expert and I'm not entirely sure this is the correct
approach. I'm offering it for anyone who wishes to use NVMe on Windows 8.1
using Qemu.
I have tested this change with Linux and Windows guests with NVMe devices.
Signed-off-by: Daniel Stekloff <dan@wendan.org>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
All of them were reported by codespell.
Most typos are in comments, one is in an error message.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
They were introduced in 6f7e9aec5e and
82407d1a40 and lots of bug fixes were done after that.
This fixes (at least) the detection of the floppy controller on Debian 4.0r9/SPARC,
and SS-5's OBP initialization routine still works.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-id: 1426351846-6497-1-git-send-email-hpoussin@reactos.org
Signed-off-by: John Snow <jsnow@redhat.com>
Delay the call to blk_blockalign() until s->blk has been assigned.
This never caused a crash because blk_blockalign(NULL, size) defaults to
4096 alignment but it's technically incorrect.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1429091024-25098-1-git-send-email-stefanha@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Of the block devices that poked into -drive options via drive_get_next,
m25p80 was the only one who also did not attach itself to the BlockBackend.
Since sd does it, and all other devices go through a "drive" property,
with this change all block backends attached to the guest will have a
non-NULL result for blk_get_attached_dev().
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-id: 1429025387-11077-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
After qemu_iovec_destroy, the QEMUIOVector's size is zeroed and
the zero size ultimately is used to compute virtqueue_push's len
argument. Therefore, reads from virtio-blk devices did not
migrate their results correctly. (Writes were okay).
Save the size in virtio_blk_handle_request, and use it when the request
is completed.
Based on a patch by Wen Congyang.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-id: 1427997044-392-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Drives defined with if!=none are for board initialization to wire up.
Board code calls drive_get() or similar to find them, and creates
devices with their qdev drive properties set accordingly.
Except a few devices go on a fishing expedition for a suitable backend
instead of exposing a drive property for board code to set: they call
driver_get() or drive_get_next() in their realize() or init() method
to implicitly connect to the "next" backend with a certain interface
type.
Picking up backends that way works when the devices are created by
board code. But it's inappropriate for -device or device_add. Not
only is this inconsistent with how the other block device models work
(they connect to a backend explicitly identified by a "drive"
property), it breaks when the "next" backend has been picked up by the
board already.
Example:
$ qemu-system-arm -S -M connex -pflash flash.img -device ssi-sd
Aborted (core dumped)
Mark them with suitable FIXME comments.
Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Cc: "Andreas Färber" <andreas.faerber@web.de>
Cc: Michael Walle <michael@walle.cc>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
The shift operation on nlb gives a 32 bit result if no type cast is
applied. This bug was reported by Coverity.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1426348844-8793-1-git-send-email-sw@weilnetz.de
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Coverity/sparse fix for iscsi driver
- RCU fallout: fix -daemonize and s390x system emulation
- KVM: kvm_stat improvements and new man page
- x86: SYSRET fix for VxWorks
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJU/sUFAAoJEL/70l94x66D1JwIAJ28Lan2DQwi+xHvNxF8zW6n
v7eMc04/fepuon0TYmUZC3qbqc00sccEQZQ+yAAauT9epZ/kdSDudDOzG+3F4MuQ
/X3crXw2/jrhtWedGq49vFCONX4MKoaoudqK8kOFMe1ImQgkOYeAzOoqeFXyHsFh
jINlKTJZB6oKzrZ+SYryY14cO7pvGaIhyqaCC+6GcVihTjm9Yq13lP1lFj7LsVRV
aGfd6xH9RSV/mwzvZwD4i3cUWSUaV/wY0NDhAEzDPCUcxX0/nAj3XF1YeJUF30Qd
ETaCLo/Nxq2R6POK3c/Zm/FRLvjzZ2caD+q1LcwB/bCYdc2lJ1JDxE/hr48ANv0=
=OWXY
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
- scsi: improvements to error reporting and conversion to realize,
Coverity/sparse fix for iscsi driver
- RCU fallout: fix -daemonize and s390x system emulation
- KVM: kvm_stat improvements and new man page
- x86: SYSRET fix for VxWorks
# gpg: Signature made Tue Mar 10 10:18:45 2015 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream:
x86: fix SS selector in SYSRET
scsi: Convert remaining PCI HBAs to realize()
scsi: Improve error reporting for invalid drive property
hw: Propagate errors through qdev_prop_set_drive()
scsi: Clean up duplicated error in legacy if=scsi code
cpus: initialize cpu->memory_dispatch
rcu: handle forks safely
qemu-thread: do not use PTHREAD_MUTEX_ERRORCHECK
kvm_stat: add kvm_stat.1 man page
kvm_stat: add column headers to text UI
iscsi: Fix check for username
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJU/uuVAAoJEH8JsnLIjy/WULwP/jeARjYkFuG3ahSWpeY0JnTK
QCkLF06iSQQUiirXI4H+Tofl8kNVBd/Iinv+LbkF27iWbTiwalmLz7NiyboX8dl+
NJZtCrqp44q7KFbl3g19/jop/zdZ9N5Gxp8BARVUILHQb1y5cXJwrDhBxTmNRDL+
sSZXfomCgKtMP40nGLa0CcNIYKlm8MePJEM2TsMoWv7tYz4CXgBG39EqK6NJluCY
kTTMcbdrLbR0imfKOVPutCgV8rhRXJ0oGVD3Q+D3/LFmPG++hoRnWCcDm6ZZ62Hi
Ra7u87TBfAUUtiT+vFQJnd7hTpN+stQidsCDBLEY3qPTKYhzm648PHvcEwOAv6YW
sjAELF2Rrsbe4vkL3/qgYDusnaPMElrHVEdbKtHofWtg6KctLnYIhusV+qKq1Fpa
cRQEbQIZMVFeWN1G9WuYH8RBYrwJqp+/qq7DcnV62lUAdY4e3iO7E3yMLFDwpxku
PLl7eofU/ZpnAOrrU2QAQvgXZRqy1ie/Unv8jFwefQkK5mXHoCtkAeBlOM8t4kJf
HjkC/hYO7kwPdaz6xK80wpXqYd3vT6jKi7mlJqC5oQQLGJbRigxlMZ16UIAx+IrL
NxhnQChp7IP21KMATFbpvYjcJyGMw3ZuVRaUhQBgqQArIomVHvM5WcN9M6S5dsmj
vClFOIqjlSbtsmChceWr
=hlbC
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches for 2.3
# gpg: Signature made Tue Mar 10 13:03:17 2015 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (73 commits)
MAINTAINERS: Add jcody as blockjobs, block devices maintainer
iotests: add O_DIRECT alignment probing test
block/raw-posix: fix launching with failed disks
MAINTAINERS: Add jsnow as IDE maintainer
sheepdog: Fix misleading error messages in sd_snapshot_create()
Add testcase for scsi-hd devices without drive property
scsi-hd: fix property unset case
block/vdi: Add locking for parallel requests
iotests: Drop vpc from 004's and 104's format list
iotests: Remove 006
iotests: Fix 051's reference output
virtio-blk: Remove the stale FIXME comment
tests: Check QVIRTIO_F_ANY_LAYOUT flag in virtio-blk test
libqos: Solve bug in interrupt checking when using MSIX in virtio-pci.c
sheepdog: fix confused return values
qtest/ahci: add fragmented dma test
qtest/ahci: Add PIO and LBA48 tests
qtest/ahci: Add DMA test variants
libqos/ahci: add ahci command helpers
qtest/ahci: Add a macro bootup routine
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
By default, we have ioeventfd enabled, so the IO request processing is
in IO thread; in the vcpu thread, guest mode is returned to as quickly
as possible, and completion is delivered via irqfd. Therefore this
comment from the initial implementation is barely relevant.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
geometry: hd_geometry_guess function autodetects the drive geometry.
This patch adds a block backend call, that probes the backing device
geometry. If the inner driver method is implemented and succeeds
(currently only for DASDs), the blkconf_geometry will pass-through
the backing device geometry. Otherwise will fallback to old logic.
blocksize: This patch initializes blocksize properties to 0.
In order to set the property a blkconf_blocksizes was introduced.
If user didn't set physical or logical blocksize, it will
retrieve its value from a driver (only succeeds for DASD), otherwise
it will set default 512 value.
The blkconf_blocksizes call was added to all users of BlkConf.
Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424087278-49393-6-git-send-email-tumanova@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Since commit 1dc936aa84 (virtio-blk: Use blk_aio_ioctl) we silently lose
the request if blk_aio_ioctl returns NULL (not implemented).
Fix it by directly returning VIRTIO_BLK_S_UNSUPP as we used to do.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[ kwolf: Fixed build error on win32 ]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Three kinds of callers:
1. On failure, report the error and abort
Passing &error_abort does the job. No functional change.
2. On failure, report the error and exit()
This is qdev_prop_set_drive_nofail(). Error reporting moves from
qdev_prop_set_drive() to its caller. Because hiding away the error
in the monitor right before exit() isn't helpful, replace
qerror_report_err() by error_report_err(). Shouldn't make a
difference, because qdev_prop_set_drive_nofail() should never be
used in QMP context.
3. On failure, report the error and recover
This is usb_msd_init() and scsi_bus_legacy_add_drive(). Error
reporting and freeing the error object moves from
qdev_prop_set_drive() to its callers.
Because usb_msd_init() can't run in QMP context, replace
qerror_report_err() by error_report_err() there.
No functional change.
scsi_bus_legacy_add_drive() calling qerror_report_err() is of
course inappropriate, but this commit merely makes it more obvious.
The next one will clean it up.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Message-Id: <1425925048-15482-3-git-send-email-armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a helper function for checking whether a bit is set in the guest
features for a vdev as well as one that works on a feature bit set.
Convert code that open-coded this: It cleans up the code and makes it
easier to extend the guest feature bits.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add virtio_{add,clear}_feature helper functions for manipulating a
feature bits variable. This has some benefits over open coding:
- add check that the bit is in a sane range
- make it obvious at a glance what is going on
- have a central point to change when we want to extend feature bits
Convert existing code manipulating features to use the new helpers.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop duplicated code. Minor codechanges were required
as geometry is a sub-structure now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
As part of the required changes, this fixes a bug where specifying an
invalid driver would result in the block layer probing the image format;
now it will result in an error, unless "<unset>" is specified as the
driver name. Fixing this would require further work on the xen_disk code
which does not seem worth it (at this point and for this patch).
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-7-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
The vring.c code currently assumes that guest and host endianness match,
which is not true for a number of cases:
- emulating targets with a different endianness than the host
- bi-endian targets, where the correct endianness depends on the virtio
device
- upcoming support for the virtio-1 standard mandates little-endian
accesses even for big-endian targets and hosts
Make sure to use accessors that depend on the virtio device.
Note that dataplane now needs to be built per-target.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1422289602-17874-2-git-send-email-cornelia.huck@de.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:
INT_MAX
INT_MAX >> BDRV_SECTOR_BITS
UINT_MAX >> BDRV_SECTOR_BITS
SIZE_MAX >> BDRV_SECTOR_BITS
This patches introdocues a macro for the maximal allowed sectors
per request and uses it at several places.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
this adds a knob to disable request merging for debugging or benchmarks if dedired.
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
this patch finally introduces multiread support to virtio-blk. While
multiwrite support was there for a long time, read support was missing.
The complete merge logic is moved into virtio-blk.c which has
been the only user of request merging ever since. This is required
to be able to merge chunks of requests and immediately invoke callbacks
for those requests. Secondly, this is required to switch to
direct invocation of coroutines which is planned at a later stage.
The following benchmarks show the performance of running fio with
4 worker threads on a local ram disk. The numbers show the average
of 10 test runs after 1 run as warmup phase.
| 4k | 64k | 4k
MB/s | rd seq | rd rand | rd seq | rd rand | wr seq | wr rand
--------------+--------+---------+--------+---------+--------+--------
master | 1221 | 1187 | 4178 | 4114 | 1745 | 1213
multiread | 1829 | 1189 | 4639 | 4110 | 1894 | 1216
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As it was not obvious (at least for me) where the 32 comes from;
add a constant for it.
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
- New "edu" device (v1->v2: fix 32-bit compilation)
- Disabling HLE and RTM on Haswell & Broadwell
- kvm_stat updates
- Added --enable-modules to Travis, in preparation for switching
the default
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJUxiioAAoJEL/70l94x66D+zQIAKVq9DPm4RNJ2/c2nt6phAVr
6Z5yB+TMf4BKFORwVkionvIOEqOC0pdm3oo93/XH7DTsN7pFg7rdwJl+ADESgvl6
+tpUbrgjZuCuNQNXy/mjx0EJQUmTk8/x+a054hSo6XNvs2ZM9HjaKNX3ojS6pG1J
mhIH4cjGPMCwu2hgm2mho/1zdIs4Qk3xT8Uzfq8i5gES14YX0Fmt93idUn3DRs7m
zHdzHWr0JmXfZweNDdPgsfGO6g+NnwgGUOqeGY4Ucmurkepk9ViCaaJP7lOnuRhT
52isayOrfrZsSLm5xxwtSUjmgbxzlOEit1b8jLzpHb5b9b+LJiCtuJnN7vwHb34=
=jgup
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
- Many fixes from the floor as usual
- New "edu" device (v1->v2: fix 32-bit compilation)
- Disabling HLE and RTM on Haswell & Broadwell
- kvm_stat updates
- Added --enable-modules to Travis, in preparation for switching
the default
# gpg: Signature made Mon 26 Jan 2015 11:44:40 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini/tags/for-upstream:
kvm_stat: Add RESET support for perf event ioctl
target-i386: Disable HLE and RTM on Haswell & Broadwell
sparse: Fix build with sparse on .S files
exec: fix madvise of NULL pointer
.travis.yml: Add "--enable-modules"
apic: do not dereference pointer before it is checked for NULL
kvm_stat: Print errno when syscall to perf_event_open() fails
kvm_stat: Update exit reasons to the latest defintion
kvm_stat: Add aarch64 support
hw: misc, add educational driver
vmstate: accept QEMUTimer in VMSTATE_TIMER*, add VMSTATE_TIMER_PTR*
qemu-timer: introduce timer_deinit
qemu-timer: add timer_init and timer_init_ns/us/ms
target-i386: make xmm_regs 512-bit wide
target-i386: use vmstate_offset_sub_array for AVX registers
tests/multiboot: Add test for modules
multiboot: Fix offset of bootloader name
tests/multiboot: Update reference output
pc: fix KVM features in pc-1.3 and earlier machine types
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use the asynchronous interface of ioctl. This will not make the VM
unresponsive if the ioctl takes a long time.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
In preparation for calling blk_aio_ioctl. Also make the function static
as no other files need it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
According to NVMe specifications Bits 15:08 represent Minor Version number.
Signed-off-by: Anubhav Rakshit <anubhav.rakshit@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
According to the specification, the low 16 bits should contain the number of
I/O submission queues, and the high 16 bits should contain the number of
I/O completion queues.
Signed-off-by: Alex Friedman <alex@e8storage.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET,
block-commit involves two asymmetric devices.
This change is not user-visible (yet), because commit only works with
device names.
But once we enable backing reference in blockdev-add, or specifying
node-name in block-commit command, we don't want the user to start two
commit jobs on the same backing chain, which will corrupt things because
of the final bdrv_swap.
Before we have per category blockers, splitting this type is still
better.
[Resolved virtio-blk dataplane conflict by replacing
BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}.
They are safe since the block job runs in the same AioContext as the
dataplane IOThread.
--Stefan]
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUid7mAAoJEL7lnXSkw9fbX0YH/RXeoX7IN1QVEP8Z2dLq4nzt
igHm4Gd4ENX6pdluHmXAxzKoHR78yNGFegzuv0IJBfav8kGCQWW6zAf4TUk/udtn
AKPk1yauVKNzCvmYldrfbuu4HedZqkftE0tyDZuAK50pZH1hzX7qiAT1C0OlarLQ
Tqy4+ouYiRja2hLq4YJCM9mmYt0sbMDShIcHBYRTdD0cxoPZ+JZEeAQYg+FYNdIo
jioVg8NgmFZW37UWeBTCKG+DcX9NwXwyo/ASdIozM+aQcTBx/nXn7/NOAxXlxUX8
M9AS9iz+LWBfwof3HLbzLLTvTmE66Z78/TluFMmEbpK4ts0ZXRJKDHE/pfynRD0=
=PXlQ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-12-11' into staging
trivial patches for 2014-12-11
# gpg: Signature made Thu 11 Dec 2014 18:13:58 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg: aka "Michael Tokarev <mjt@corpit.ru>"
# gpg: aka "Michael Tokarev <mjt@debian.org>"
* remotes/mjt/tags/pull-trivial-patches-2014-12-11:
Sort include/qemu/typedefs.h
hpet: increase spelling precision
pflash_cfi02.c: associate "cfi.pflash02" to "Storage devices" category
vt82c686: fix coverity warning about out-of-bounds write
virtio: remove useless declaration of virtio_net_init()
qapi-schema: fix typo about change-vnc-password
fw_cfg: remove superfluous blank line
get_maintainer.pl: Remove the --git-chief-penguins option
configure: Replace which(1) with "has"
util: Use g_new() & friends where that makes obvious sense
util: Fuse g_malloc(); memset() into g_new0()
util: Drop superfluous conditionals around g_free()
Drop superfluous conditionals around g_strdup()
Drop superfluous conditionals around qemu_opts_del()
usb: delete redundant brackets in usb_host_handle_control()
virtio-bus: avoid breaking build when open DEBUG switch
acpi-build: Make DPRINTF working for acpi-build
acpi-build: adjust indention 8 -> 4 spaces
target-s390x: fix possible out of bounds read
qmp: fix typo in input-send-event examples
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Initialise our maximum page size capability to 64kB and increase
the page_size variable from 16 to 32 bits.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The transaction QMP command performs operations atomically on a group of
drives. This command needs to acquire AioContext in order to work
safely when virtio-blk dataplane IOThreads are accessing drives.
The transactional nature of the command means that actions are split
into prepare, commit, abort, and clean functions. Acquire the
AioContext in prepare and don't release it until one of the other
functions is called. This prevents the IOThread from running the
AioContext before the transaction has completed.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416566940-4430-4-git-send-email-stefanha@redhat.com
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add dataplane support to the change-backing-file QMP commands. By
acquiring the AioContext we avoid race conditions with the dataplane
thread which may also be accessing the BlockDriverState.
Note that this command operates on both bs and a node in its chain
(image_bs). The bdrv_chain_contains(bs, image_bs) check guarantees that
bs and image_bs are in the same AioContext.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
By acquiring the AioContext we avoid race conditions with the dataplane
thread which may also be accessing the BlockDriverState.
Fix up eject, change, and block_passwd in a single patch because
qmp_eject() and qmp_change_blockdev() both call eject_device(). Also
fix block_passwd while we're tackling a command that takes a block
encryption password.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add dataplane support to the blockdev-snapshot-delete-internal-sync QMP
command. By acquiring the AioContext we avoid race conditions with the
dataplane thread which may also be accessing the BlockDriverState.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Operands don't affect result (CONSTANT_EXPRESSION_RESULT)
((n->bar.aqa >> AQA_ASQS_SHIFT) & AQA_ASQS_MASK) > 4095
is always false regardless of the values of its operands.
This occurs as the logical second operand of '||'.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch fixes two issues with persistent grants and the disk PV backend
(Qdisk):
- Keep track of memory regions where persistent grants have been mapped
since we need to unmap them as a whole. It is not possible to unmap a
single grant if it has been batch-mapped. A new check has also been added
to make sure persistent grants are only used if the whole mapped region
can be persistently mapped in the batch_maps case.
- Unmap persistent grants before switching to the closed state, so the
frontend can also free them.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Now that blockjobs use AioContext they are safe for use with dataplane.
Unblock them!
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-12-git-send-email-stefanha@redhat.com
blockdev_init() always creates a DriveInfo, but only drive_new() fills
it in. qmp_blockdev_add() leaves it blank. This results in a drive
with type = IF_IDE, bus = 0, unit = 0. Screwed up in commit ee13ed1c.
Board initialization code looking for IDE drive (0,0) can pick up one
of these bogus drives. The QMP command has to execute really early to
be visible. Not sure how likely that is in practice.
Fix by creating DriveInfo in drive_new(). Block backends created by
blockdev-add don't get one.
Breaks the test for "has been created by qmp_blockdev_add()" in
blockdev_mark_auto_del() and do_drive_del(), because it changes the
value of dinfo && !dinfo->enable_auto_del from true to false. Simply
test !dinfo instead.
Leaves DriveInfo member enable_auto_del unused. Drop it.
A few places assume a block backend always has a DriveInfo. Fix them
up.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Device models should access their block backends only through the
block-backend.h API. Convert them, and drop direct includes of
inappropriate headers.
Just four uses of BlockDriverState are left:
* The Xen paravirtual block device backend (xen_disk.c) opens images
itself when set up via xenbus, bypassing blockdev.c. I figure it
should go through qmp_blockdev_add() instead.
* Device model "usb-storage" prompts for keys. No other device model
does, and this one probably shouldn't do it, either.
* ide_issue_trim_cb() uses bdrv_aio_discard() instead of
blk_aio_discard() because it fishes its backend out of a BlockAIOCB,
which has only the BlockDriverState.
* PC87312State has an unused BlockDriverState[] member.
The next two commits take care of the latter two.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is consistent with how VirtIOFOOConf variables are named
elsewhere, and makes blk available for BlockBackend variables.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Commit 12c5674 turned it into a pointer to member blk.conf.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there. It's a block layer thing anyway, not just a
block driver thing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The patch is big, but all it really does is replacing
dinfo->bdrv
by
blk_bs(blk_by_legacy_dinfo(dinfo))
The replacement is repetitive, but the conversion of device models to
BlockBackend is imminent, and will shorten it to just
blk_legacy_dinfo(dinfo).
Line wrapping muddies the waters a bit. I also omit tests whether
dinfo->bdrv is null, because it never is.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On BlockBackend destruction, unref its BlockDriverState. Replaces the
callers' unrefs.
This turns the pointer from BlockBackend to BlockDriverState into a
strong reference, managed with bdrv_ref() / bdrv_unref(). The
back-pointer remains weak.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Convenience function blk_new_with_bs() creates a BlockBackend with its
BlockDriverState. Callers have to unref both. The commit after next
will relieve them of the need to unref the BlockDriverState.
Complication: due to the silly way drive_del works, we need a way to
hide a BlockBackend, just like bdrv_make_anon(). To emphasize its
"special" status, give the function a suitably off-putting name:
blk_hide_on_behalf_of_do_drive_del(). Unfortunately, hiding turns the
BlockBackend's name into the empty string. Can't avoid that without
breaking the blk->bs->device_name equals blk->name invariant.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A block device consists of a frontend device model and a backend.
A block backend has a tree of block drivers doing the actual work.
The tree is managed by the block layer.
We currently use a single abstraction BlockDriverState both for tree
nodes and the backend as a whole. Drawbacks:
* Its API includes both stuff that makes sense only at the block
backend level (root of the tree) and stuff that's only for use
within the block layer. This makes the API bigger and more complex
than necessary. Moreover, it's not obvious which interfaces are
meant for device models, and which really aren't.
* Since device models keep a reference to their backend, the backend
object can't just be destroyed. But for media change, we need to
replace the tree. Our solution is to make the BlockDriverState
generic, with actual driver state in a separate object, pointed to
by member opaque. That lets us replace the tree by deinitializing
and reinitializing its root. This special need of the root makes
the data structure awkward everywhere in the tree.
The general plan is to separate the APIs into "block backend", for use
by device models, monitor and whatever other code dealing with block
backends, and "block driver", for use by the block layer and whatever
other code (if any) dealing with trees and tree nodes.
Code dealing with block backends, device models in particular, should
become completely oblivious of BlockDriverState. This should let us
clean up both APIs, and the tree data structures.
This commit is a first step. It creates a minimal "block backend"
API: type BlockBackend and functions to create, destroy and find them.
BlockBackend objects are created and destroyed exactly when root
BlockDriverState objects are created and destroyed. "Root" in the
sense of "in bdrv_states". They're not yet used for anything; that'll
come shortly.
A root BlockDriverState is created with bdrv_new_root(), so where to
create a BlockBackend is obvious. Where these roots get destroyed
isn't always as obvious.
It is obvious in qemu-img.c, qemu-io.c and qemu-nbd.c, and in error
paths of blockdev_init(), blk_connect(). That leaves destruction of
objects successfully created by blockdev_init() and blk_connect().
blockdev_init() is used only by drive_new() and qmp_blockdev_add().
Objects created by the latter are currently indestructible (see commit
48f364d "blockdev: Refuse to drive_del something added with
blockdev-add" and commit 2d246f0 "blockdev: Introduce
DriveInfo.enable_auto_del"). Objects created by the former get
destroyed by drive_del().
Objects created by blk_connect() get destroyed by blk_disconnect().
BlockBackend is reference-counted. Its reference count never exceeds
one so far, but that's going to change.
In drive_del(), the BB's reference count is surely one now. The BDS's
reference count is greater than one when something else is holding a
reference, such as a block job. In this case, the BB is destroyed
right away, but the BDS lives on until all extra references get
dropped.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Creating an anonymous BDS can't fail. Make that obvious.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
On this way, we can assure the new bootindex take effect
during vm rebooting.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
At present, nvma cannot boot. However, it provides already
a bootindex property, so change bootindex to qom for nvma
device, but not call add_boot_device_path.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Add a qom property with the same name 'bootindex',
when we remove it form qdev property, things will
continue to work just fine, and we can use qom features
which are not supported by qdev property.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Remove bootindexA/B form qdev property to qom, things will
continue to work just fine, and we can use qom features
which are not supported by qdev property.
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Dataplane doesn't depend on linux-aio any more, so we don't need the
compiling condition now.
Configure options are kept but just print a message.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1410329871-28885-4-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJUEwy3AAoJEH8JsnLIjy/WqkUQALPsR68w2bB6aiN6zUaJt1X3
VaksCQGgtZdN6itDvn6v6ktayFXXfjRE+U0hK7joXUiokq17YZmKqf+1V4LPJRSW
Tv21gIAHuIyf+8LL/xGS3W9+EEXAaKbp1t6AT/VDWv/mQ4KY5xrvhn2E/+7r0wKr
EBOHrKd4tQualV12MtrZsrWZy3oMQvkimcVIfnjFZ2gJg5dmUBXQ35Kdj9+AxDiX
1hDizBRbozvzSBCnS9PUcJ1OfCxoCRewbHn43LeCYWyB8m3ttpdPpuMaUoSNGrVY
Tw7aYvYjMArr/ChrF8eH2vKJSeHabSPbYqgNsGqpS2n5KYJbzoyv8iQQCSHjtKZe
vagoIRomF/BtOWT8mvUSHGw2vmQm6JZJdHJsXNeyDJ/P8ZSSm0vsZMjqh6vwS7sB
+AURb5BaFWNnThwm80tJl23uJLjohNsdrmuLvAiHX0e03dyyQFDBS1zqb9BTbOsP
SdBPFZy1hA0deYnJlyeLj94iyIosdsMihLkDJrIdNzn6qMF9QCdFs+rgOepwsfml
ZNG1h2V+Wo3LS1SkKpK0mhiTBFLCit8Cq03+n95zBTcPCBMGgoJVC2VZef8XXKDn
v6vuSYikCkEIDEWhsUrIZmDWKv/83AwSW+i+ir3IOVgxOJ51Z/mr5PAQQ+3/Gaat
G5gSIDmW4rGgYDk/coDf
=3He1
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches
# gpg: Signature made Fri 12 Sep 2014 16:09:43 BST using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (22 commits)
qcow2: Add falloc and full preallocation option
raw-posix: Add falloc and full preallocation option
qapi: introduce PreallocMode and new PreallocModes full and falloc.
block: don't convert file size to sector size
block: round up file size to nearest sector
iotests: Send the correct fd in socket_scm_helper
blockdev: Refuse to drive_del something added with blockdev-add
block: extend BLOCK_IO_ERROR with reason string
dataplane: fix virtio_blk_data_plane_create() op blocker error path
qemu-iotests: Run 025 for Archipelago block driver
block/archipelago: Implement bdrv_truncate()
block: Make the block accounting functions operate on BlockAcctStats
block: rename BlockAcctType members to start with BLOCK_ instead of BDRV_
block: Extract the block accounting code
block: Extract the BlockAcctStats structure
IDE: MMIO IDE device control should be little endian
thread-pool: Drop unnecessary includes
xen: Drop redundant bdrv_close() from pci_piix3_xen_ide_unplug()
xen_disk: Plug memory leak on error path
qemu-io: Clean up openfile() after commit 2e40134
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
- Build: fixing block/iscsi.so and ranlib warnings on Mac OS X
- Migration fixes for x86
- The odd KVM patch.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJUEXeWAAoJEBvWZb6bTYby4AwP/0Hh55A7QzkkzZ66y65zM+G5
dsgRcLjufHSRQHoNQqm6LOcicV3Ygc/X644EY6jnZCZxFh/fsWuTPqUDGxLAnxEc
2V0PkLRIScAMOPezzxvRy6/9hkG+UYM3ZOL5D9yxA9pGuBtttw7tkts19Vqf9WZc
NYG5TBDuEGM1c596Zpo7t10m+Oiw+Jyi5luLXsb4lh5ikdFPDrtJaf0AnFvR+ym0
HXlj2K/0vHNowUeLoo+oWnZsW8mLE6OyJhgfo1tJtsH1BR+lQJnBnQ4moq4Sl/Wz
+iht/4gtz34XwLILokFR6yiNrPe+MIryyv+FYxOD5loIdGVDtKMx30UkIE2/D933
6/n5i3GBLi9JapeT9gkKTxk/UVRPzJ1PK07RWevgNZNQyTGKAUGp+p48nSzMYX7V
7GFSy3Q8uqOR8g9n+t+RURxkoMNbhhw7v53Z3PPXPCALCMDzg9RARlW/nkfiExcZ
oThUjE/8xfMTQlN1SO5HTyQXEkYjtknZhfC7/KFvkWYMbCG0KBTf212Md0zlTNkj
+C6r8Gq4ZWVIc07QyKkoCMxB+a9Uhvy4T1PKuSlm6iu94zUgZRhdf/PlOXimhFqH
9GL67Tv15kpj05xCS6jDXjeMZ416/UKw91OcsiT1UUHcq7/rc+GBycd0ngV1UgnQ
di5V12IVt8JwdzFxMeCT
=GIKW
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
- Memory: improve error reporting and avoid crashes on hotplug
- Build: fixing block/iscsi.so and ranlib warnings on Mac OS X
- Migration fixes for x86
- The odd KVM patch.
# gpg: Signature made Thu 11 Sep 2014 11:21:10 BST using RSA key ID 9B4D86F2
# gpg: Good signature from "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: aka "Paolo Bonzini <bonzini@gnu.org>"
* remotes/bonzini/tags/for-upstream: (21 commits)
gdbstub: init mon_chr through qemu_chr_alloc
pckbd: adding new fields to vmstate
mc146818rtc: add missed field to vmstate
piix: do not set irq while loading vmstate
serial: fixing vmstate for save/restore
parallel: adding vmstate for save/restore
fdc: adding vmstate for save/restore
cpu: init vmstate for ticks and clock offset
apic_common: vapic_paddr synchronization fix
vl: use QLIST_FOREACH_SAFE to visit change state handlers
exec: add parameter errp to gethugepagesize
exec: report error when memory < hpagesize
hostmem-ram: don't exit qemu if size of memory-backend-ram is way too big
memory: add parameter errp to memory_region_init_rom_device
memory: add parameter errp to memory_region_init_ram
exec: add parameter errp to qemu_ram_alloc and qemu_ram_alloc_from_ptr
rules.mak: Fix DSO build by pulling in archive symbols
util: Don't link host-utils.o if it's empty
util: Move general qemu_getauxval to util/getauxval.c
trace: Only link generated-tracers.o with "simple" backend
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 3718d8ab65 ("block: Replace in_use
with operation blocker") broke the error path because it consumed
local_err instead of propagating it.
The caller has no way to know that the function failed. This caused
virtio-blk to start "successfully" even though there was a fatal
dataplane error.
Steps to reproduce:
$ qemu-system-x86_64 -enable-kvm -object iothread,id=iothread0 \
-drive if=none,id=drive0,file=a.img \
(qemu) drive_mirror drive0 /tmp/foo.img
(qemu) device_add virtio-blk-pci,iothread=iothread0,drive=drive0
Expected result:
Since the mirror block job is using drive0 it is not possible to start
virtio-blk data-plane.
device_add fails and the PCI adapter is not added.
Actual result:
device_add completes and the PCI adapter is added.
Cc: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
VMState added by this patch preserves correct
loading of the FDC device state.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Acked-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is the next step for decoupling block accounting functions from
BlockDriverState.
In a future commit the BlockAcctStats structure will be moved from
BlockDriverState to the device models structures.
Note that bdrv_get_stats was introduced so device models can retrieve the
BlockAcctStats structure of a BlockDriverState without being aware of it's
layout.
This function should go away when BlockAcctStats will be embedded in the device
models structures.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: John Snow <jsnow@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The middle term goal is to move the BlockAcctStats structure in the device models.
(Capturing I/O accounting statistics in the device models is good for billing)
This patch make a small step in this direction by removing a reference to BDRV.
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Keith Busch <keith.busch@intel.com>
CC: Anthony Liguori <aliguori@amazon.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Markus Armbruster <armbru@redhat.com>
CC: Alexander Graf <agraf@suse.de>i
Signed-off-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The Error object was leaked after failed bdrv_new(). While there,
streamline control flow a bit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Add parameter errp to memory_region_init_rom_device and update all call
sites to propagate the error.
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
[Propagate the error out of realize. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add parameter errp to memory_region_init_ram and update all call sites
to pass in &error_abort.
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A drive that backs a pflash device is special:
- it is very small,
- its entire contents are kept in a RAMBlock at all times, covering the
guest-phys address range that provides the guest's view of the emulated
flash chip.
The pflash device model keeps the drive (the host-side file) and the
guest-visible flash contents in sync. When migrating the guest, the
guest-visible flash contents (the RAMBlock) is migrated by default, but on
the target host, the drive (the host-side file) remains in full sync with
the RAMBlock only if:
- the source and target hosts share the storage underlying the pflash
drive,
- or the migration requests full or incremental block migration too, which
then covers all drives.
Due to the special nature of pflash drives, the following scenario makes
sense as well:
- no full nor incremental block migration, covering all drives, alongside
the base migration (justified eg. by shared storage for "normal" (big)
drives),
- non-shared storage for pflash drives.
In this case, currently only those portions of the flash drive are updated
on the target disk that the guest reprograms while running on the target
host.
In order to restore accord, dump the entire flash contents to the bdrv in
a post_load() callback.
- The read-only check follows the other call-sites of pflash_update();
- both "pfl->ro" and pflash_update() reflect / consider the case when
"pfl->bs" is NULL;
- the total size of the flash device is calculated as in
pflash_cfi01_realize().
When using shared storage, or requesting full or incremental block
migration along with the normal migration, the patch should incur a
harmless rewrite from the target side.
It is assumed that, on the target host, RAM is loaded ahead of the call to
pflash_post_load().
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>