The return value of .save_live_pending() is the number of bytes
remaining. This is just an estimate because we do not know how many
blocks will be dirtied by the running guest.
Currently our return value for .save_live_pending() is wrong because it
includes dirty blocks but not in-flight bdrv_aio_readv() requests or
unsent blocks. Crucially, it also doesn't include the bulk phase where
the entire device is transferred - therefore we risk completing block
migration before all blocks have been transferred!
The return value of .save_live_iterate() is the number of bytes
transferred this iteration. Currently we return whether there are bytes
remaining, which is incorrect.
Move the bytes remaining calculation into .save_live_pending() and
really return the number of bytes transferred this iteration in
.save_live_iterate().
Also fix the %ld format specifier which was used for a uint64_t
argument. PRIu64 must be use to avoid warnings on 32-bit hosts.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-id: 1360661835-28663-3-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The .save_live_iterate() function returns 0 to continue iterating or 1
to stop iterating.
Since 16310a3cca it only ever returns 0,
leading to an infinite loop.
Return 1 if we have finished sending dirty blocks.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1360534366-26723-4-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit 43be3a25c9 changed the
blk_mig_save_dirty_block() return code handling. The function's doc
comment says:
/* return value:
* 0: too much data for max_downtime
* 1: few enough data for max_downtime
*/
Because of the 1 return value, callers must check for ret < 0 instead of
just:
if (ret) { ... }
We do not want to bail when 1 is returned, only on error.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1360534366-26723-3-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Show the actual flags value and include "block migration" in the error
message so it's clear where the error is coming from.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1360534366-26723-2-git-send-email-stefanha@redhat.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Code just now does (simplified for clarity)
if (qemu_savevm_state_iterate(s->file) == 1) {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
Problem here is that qemu_savevm_state_iterate() returns 1 when it
knows that remaining memory to sent takes less than max downtime.
But this means that we could end spending 2x max_downtime, one
downtime in qemu_savevm_iterate, and the other in
qemu_savevm_state_complete.
Changed code to:
pending_size = qemu_savevm_state_pending(s->file, max_size);
DPRINTF("pending size %lu max %lu\n", pending_size, max_size);
if (pending_size >= max_size) {
ret = qemu_savevm_state_iterate(s->file);
} else {
vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
qemu_savevm_state_complete(s->file);
}
So what we do is: at current network speed, we calculate the maximum
number of bytes we can sent: max_size.
Then we ask every save_live section how much they have pending. If
they are less than max_size, we move to complete phase, otherwise we
do an iterate one.
This makes things much simpler, because now individual sections don't
have to caluclate the bandwidth (it was implossible to do right from
there).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Make consistent the result of blk_mig_save_dirty_block() and
mig_save_device_dirty()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
This means we don't need to pass through qemu_file to get the errors.
Adjust all callers.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
When cancelling block migration, all in-flight requests of the block
migration must be completed before the data can be freed. This was
visible as failing assertions and segfaults.
Reported-by: Peter Lieven <pl@dlhnet.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We split it into 2 functions, foo_live_iterate, and foo_live_complete.
At this point, we only remove the bits that are for the other stage,
functionally this is equivalent to previous code.
Signed-off-by: Juan Quintela <quintela@redhat.com>
This patch splits stage 1 to its own function for both save_live
users, ram and block. It is just a copy of the function, removing the
parts of the other stages. Optimizations would came later.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Enable the creation of a method to tell migration if that section is
active and should be migrate. We use it for blk-migration, that is
normally not active. We don't create the method for RAM, as setups
without RAM are very strange O:-)
Signed-off-by: Juan Quintela <quintela@redhat.com>
Notice that the live migration users never unregister, so no problem
about freeing the ops structure.
Signed-off-by: Juan Quintela <quintela@redhat.com>
The Monitor object is passed back and forth within the migration/savevm
code so that it can print errors and progress to the user.
However, that approach assumes a HMP monitor, being completely invalid
in QMP.
This commit drops almost every single usage of the Monitor object, all
monitor_printf() calls have been converted into DPRINTF() ones.
There are a few remaining Monitor objects, those are going to be dropped
by the next commit.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
All files under GPLv2 will get GPLv2+ changes starting tomorrow.
event_notifier.c and exec-obsolete.h were only ever touched by Red Hat
employees and can be relicensed now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Initially done with the following semantic patch:
@ rule1 @
expression E;
statement S;
@@
E =
(
bdrv_aio_readv
| bdrv_aio_writev
| bdrv_aio_flush
| bdrv_aio_discard
| bdrv_aio_ioctl
)
(...);
(
- if (E == NULL) { ... }
|
- if (E)
{ <... S ...> }
)
which however missed the occurrence in block/blkverify.c
(as it should have done), and left behind some unused
variables.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Many places in QEMU call qemu_aio_flush() to complete all pending
asynchronous I/O. Most of these places actually want to drain all block
requests but there is no block layer API to do so.
This patch introduces the bdrv_drain_all() API to wait for requests
across all BlockDriverStates to complete. As a bonus we perform checks
after qemu_aio_wait() to ensure that requests really have finished.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
These errors were detected by codespell:
remaing -> remaining
soley -> solely
virutal -> virtual
seperate -> separate
libcacard.txt still needs some more patches.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Make *save_live() return negative values when there is one error, and
updates all callers to check for the error.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Now the function returned errno, so it is better the new name.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
error_report() prepends location, and appends a newline. The message
constructed from the arguments should not contain a newline. Fix the
obvious offenders.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
block_mig_state.total_time is currently the sum of the read request
latencies. This is not very accurate because block migration uses aio and
so several requests can be submitted at once. Bandwidth should be computed
with wall-clock time, not by adding the latencies. In this case,
"total_time" has a higher value than it should, and so the computed
bandwidth is lower than it is in reality. This means that migration can
take longer than it needs to.
However, we don't want to use pure wall-clock time here. We are computing
bandwidth in the asynchronous phase, where the migration repeatedly wakes
up and sends some aio requests. The computed bandwidth will be used for
synchronous transfer.
Signed-off-by: Avishay Traeger <avishay@il.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
block_mig_state.reads is an int, and multiplying by BLOCK_SIZE yielded a
negative number, resulting in a negative bandwidth (running on a 32-bit
machine). Change order to avoid.
Signed-off-by: Avishay Traeger <avishay@il.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Set block device in use during block migration, disallow drive_del and
bdrv_truncate for in use devices.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
So that ejection of attached device by guest does not free data
in use by block migration instance.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
b02bea3a85 added a check on the return
value of bdrv_write and aborts migration when it fails. However, if the
size of the block device to migrate is not a multiple of BLOCK_SIZE
(currently 1 MB), the last bdrv_write will fail with -EIO.
Fixed by calling bdrv_write with the correct size of the last block.
Signed-off-by: Pierre Riteau <Pierre.Riteau@irisa.fr>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When block migration is requested and no read-write block device is
present, a divide by zero exception is triggered because
total_sector_sum equals zero.
Signed-off-by: Pierre Riteau <Pierre.Riteau@irisa.fr>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
An old version of this patch was applied to master, so this contains the
differences between v1 and v2.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Block migration can submit multiple AIO reads for the same sector/chunk, but
completion of such reads can happen out of order:
migration guest
- get_dirty(N)
- aio_read(N)
- clear_dirty(N)
write(N)
set_dirty(N)
- get_dirty(N)
- aio_read(N)
If the first aio_read completes after the second, stale data will be
migrated to the destination.
Fix by not allowing multiple AIOs inflight for the same sector.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Currently block_load() doesn't check return value of bdrv_write(), and
even the destination weren't prepared to execute block migration, it
proceeds and guest boots on the target. This patch fix this issue.
Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When there is no block driver associate with BlockDriverState bdrv_getlength
returns -ENOMEDIUM that cause block migration to fail
Signed-off-by: Shahar Havivi <shaharh@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
When available, we'd like to be able to access the DeviceState
when registering a savevm. For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
init_blk_migration_it() skips drives with type hint BDRV_TYPE_CDROM.
The intention is to skip read-only drives. However, BDRV_TYPE_CDROM
is only a hint. It is currently sufficent for read-only. But it's
not necessary, and it may not remain sufficient.
Use bdrv_is_read_only() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The bdrv_first linked list of BlockDriverStates is currently extern so
that block migration can iterate the list. However, since there is
already a bdrv_iterate() function there is no need to expose bdrv_first.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Move to stage3 only when remaining work can be done below max downtime.
Use qemu_get_clock_ns for measuring read performance.
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Start transfer dirty blocks during the iterative stage. That will
reduce the time that the guest will be suspended
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>