In case we load the vmstate during incoming migration, we start from a
clean, default machine state as we went through system reset before. But
if we load from a snapshot, the machine can be in any state. That can
cause troubles if loading an older image which does not carry all state
information the executing QEMU requires. Hardly any device takes care of
this scenario.
However, fixing this is trivial. We just need to issue a system reset
during loadvm as well.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
This patch removes all references to signal.h when qemu-common.h is included
as they become redundant.
Signed-off-by: Alexandre Raymond <cerbere@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
helpfull -> helpful
usefull -> useful
cotrol -> control
and a grammar fix.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
commit 82fa39b751
only contains half of the fix. It forgots the save state fix for
UINT8 indexes.
Anthony, please apply, without this migration using hpet is broken.
(only current user).
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* 'for-anthony' of git://github.com/bonzini/qemu:
remove qemu_get_clock
add a generic scaling mechanism for timers
change all other clock references to use nanosecond resolution accessors
change all rt_clock references to use millisecond resolution accessors
add more helper functions with explicit milli/nanosecond resolution
This was done with:
sed -i 's/qemu_get_clock\>/qemu_get_clock_ns/' \
$(git grep -l 'qemu_get_clock\>' )
sed -i 's/qemu_new_timer\>/qemu_new_timer_ns/' \
$(git grep -l 'qemu_new_timer\>' )
after checking that get_clock and new_timer never occur twice
on the same line. There were no missed occurrences; however, even
if there had been, they would have been caught by the compiler.
There was exactly one false positive in qemu_run_timers:
- current_time = qemu_get_clock (clock);
+ current_time = qemu_get_clock_ns (clock);
which is of course not in this patch.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was done with:
sed -i '/get_clock\>.*rt_clock/s/get_clock\>/get_clock_ms/' \
$(git grep -l 'get_clock\>.*rt_clock' )
sed -i '/new_timer\>.*rt_clock/s/new_timer\>/new_timer_ms/' \
$(git grep -l 'new_timer\>.*rt_clock' )
after checking that get_clock and new_timer never occur twice
on the same line. There were no missed occurrences; however, even
if there had been, they would have been caught by the compiler.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Define and use dedicated constants for vm_stop reasons, they actually
have nothing to do with the EXCP_* defines used so far. At this chance,
specify more detailed reasons so that VM state change handlers can
evaluate them.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Although it's rare to happen in live migration, when the head of a
byte stream contains 0x05 which is the marker of subsection, the
loader gets corrupted because vmstate_subsection_load() continues even
the device doesn't require it. This patch adds a checker whether
subsection is needed, and skips following routines if not needed.
Signed-off-by: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Before commit 622b520f, index=12 meant bus=1,unit=5.
Since the commit, it means bus=0,unit=12. The drive is created, but
not the guest device. That's because the controllers we use with
if=scsi drives (lsi53c895a and esp) support only 7 units, and
scsi_bus_legacy_handle_cmdline() ignores drives with unit numbers
exceeding that limit.
Changing the mapping of index to bus, unit is a regression. Breaking
-drive invocations that used to work just makes it worse.
Revert the part of commit 622b520f that causes this, and clean up
some.
Note that the fix only affects if=scsi. You can still put more than 7
units on a SCSI bus with -device & friends.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The no_migrate save state flag is currently only checked in the
last phase of migration. This means that we potentially waste
a lot of time and bandwidth with the live state handlers before
we ever check the no_migrate flags. The error message printed
when we catch a non-migratable device doesn't get printed for
a detached migration. And, no_migrate does nothing to prevent
an incoming migration to a target that includes a non-migratable
device. This attempts to fix all of these.
One notable difference in behavior is that an outgoing migration
now checks for non-migratable devices before ever connecting to
the target system. This means the target will remain listening
rather than exit from failure.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There's no need to flush requests after vmstop
as vmstop does it for us automatically now.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
I'd like to disable bandwidth limit or make it very high,
Use int64_t all over to make values >= 4g work.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Jason Wang <jasowang@redhat.com>
Compiling with GCC 4.6.0 20100925 produced warnings like:
/src/qemu/net/tap-win32.c: In function 'tap_win32_open':
/src/qemu/net/tap-win32.c:582:12: error: variable 'hThread' set but not used [-Werror=unused-but-set-variable]
Fix by removing the unused variables.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Fix this warning:
CC savevm.o
/src/qemu/savevm.c: In function `do_savevm':
/src/qemu/savevm.c:1900: warning: passing arg 1 of `localtime_r' from incompatible pointer type
It looks like on OpenBSD the type of tv_sec in struct timeval is still
'long' instead of time_t as in most other OS. Fix by adding a cast.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
When savevm is run without a name, the name stays blank and the snapshot is
saved anyway.
The new behavior is when savevm is run without parameters a name will be
created automaticaly, so the snapshot is accessible to the user without needing
the id when loadvm is run.
(qemu) savevm
(qemu) info snapshots
ID TAG VM SIZE DATE VM CLOCK
1 vm-20100728134640 978K 2010-07-28 13:46:40 00:00:08.603
We use a name with the format 'vm-YYYYMMDDHHMMSS'.
This is a first step to hide the internal id, because I don't see a reason to
expose this kind of internals to the user.
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The output generated by 'info snapshots' shows only snapshots that exist on the
block device that saves the VM state. This output can cause an user to
erroneously try to load an snapshot that is not available on all block devices.
$ qemu-img snapshot -l xxtest.qcow2
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
1 1.5M 2010-07-26 16:51:52 00:00:08.599
2 1.5M 2010-07-26 16:51:53 00:00:09.719
3 1.5M 2010-07-26 17:26:49 00:00:13.245
4 1.5M 2010-07-26 19:01:00 00:00:46.763
$ qemu-img snapshot -l xxtest2.qcow2
Snapshot list:
ID TAG VM SIZE DATE VM CLOCK
3 0 2010-07-26 17:26:49 00:00:13.245
4 0 2010-07-26 19:01:00 00:00:46.763
Current output:
$ qemu -hda xxtest.qcow2 -hdb xxtest2.qcow2 -monitor stdio -vnc :0
QEMU 0.12.4 monitor - type 'help' for more information
(qemu) info snapshots
Snapshot devices: ide0-hd0
Snapshot list (from ide0-hd0):
ID TAG VM SIZE DATE VM CLOCK
1 1.5M 2010-07-26 16:51:52 00:00:08.599
2 1.5M 2010-07-26 16:51:53 00:00:09.719
3 1.5M 2010-07-26 17:26:49 00:00:13.245
4 1.5M 2010-07-26 19:01:00 00:00:46.763
Snapshots 1 and 2 do not exist on xxtest2.qcow, but they are displayed anyway.
This patch sumarizes the output to only show fully available snapshots.
New output:
(qemu) info snapshots
ID TAG VM SIZE DATE VM CLOCK
3 1.5M 2010-07-26 17:26:49 00:00:13.245
4 1.5M 2010-07-26 19:01:00 00:00:46.763
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
A non-migratable device should be removed before migration and re-added after.
Signed-off-by: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch improves the resilience of the load_vmstate() function, doing
further and better ordered tests.
In load_vmstate(), if there is any error on bdrv_snapshot_goto(), except if the
error is on VM state device, load_vmstate() will return zero and the VM will be
started with major corruption chances.
The current process:
- test if there is any writable device without snapshot support
- if exists return -error
- get the device that saves the VM state, possible return -error but unlikely
because it was tested earlier
- flush I/O
- run bdrv_snapshot_goto() on devices
- if fails, give an warning and goes to the next (not good!)
- if fails on the VM state device, return zero (not good!)
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero
New behavior:
- get the device that saves the VM state
- if fails return -error
- check if the requested snapshot exists on the device that saves the VM state
and the state is not zero
- if fails return -error
- test if there is any writable device without snapshot support
- if exists return -error
- test if the devices with snapshot support have the requested snapshot
- if anyone fails, return -error
- flush I/O
- run snapshot_goto() on devices
- if anyone fails, return -error
- open the file with the VM state
- if fails return -error
- load the VM state
- if fails return -error
- return zero
do_loadvm must not call vm_start if any error has occurred in load_vmstate.
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Forgot to check for and free these.
Found-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
This commit adds subsections for each device section.
Subsections is the way to handle information that don't need to be sent
to de destination of a migration because its values are not needed. It is
the way to handle optional information. Notice that only the source can
decide if the information is optional or not. The destination needs to
understand all subsections that it receives to have a sucessful load.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
For callers that pass a device we can traverse up the qdev tree and
make use of the BusInfo.get_dev_path information for creating unique
savevm id strings. This avoids needing to rely on the instance number,
which can cause problems with device initialization order and hotplug.
For compatibility, we also store away the old id string and instance
so we can accept migrations from VMs as we add new get_dev_path
implementations.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When available, we'd like to be able to access the DeviceState
when registering a savevm. For buses with a get_dev_path()
function, this will allow us to create more unique savevm
id strings.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
savevm.c keeps a pointer to the snapshot block device. If you manage
to get that device deleted, the pointer dangles, and the next snapshot
operation will crash & burn. Unplugging a guest device that uses it
does the trick:
$ MALLOC_PERTURB_=234 qemu-system-x86_64 [...]
QEMU 0.12.50 monitor - type 'help' for more information
(qemu) info snapshots
No available block device supports snapshots
(qemu) drive_add auto if=none,file=tmp.qcow2
OK
(qemu) device_add usb-storage,id=foo,drive=none1
(qemu) info snapshots
Snapshot devices: none1
Snapshot list (from none1):
ID TAG VM SIZE DATE VM CLOCK
(qemu) device_del foo
(qemu) info snapshots
Snapshot devices:
Segmentation fault (core dumped)
Move management of that pointer to block.c, and zap it when the device
it points becomes unusable.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We find snapshots by iterating over the list of drives defined with
drive_init(). This misses host block devices defined by other means.
Such means don't exist now, but will be introduced later in this
series.
Iterate over all host block devices instead, with bdrv_next().
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Both bdrv_can_snapshot() and bdrv_has_snapshot() does not work as advertized.
First issue: Their names implies different porpouses, but they do the same thing
and have exactly the same code. Maybe copied and pasted and forgotten?
bdrv_has_snapshot() is called in various places for actually checking if there
is snapshots or not.
Second issue: the way bdrv_can_snapshot() verifies if a block driver supports or
not snapshots does not catch all cases. E.g.: a raw image.
So when do_savevm() is called, first thing it does is to set a global
BlockDriverState to save the VM memory state calling get_bs_snapshots().
static BlockDriverState *get_bs_snapshots(void)
{
BlockDriverState *bs;
DriveInfo *dinfo;
if (bs_snapshots)
return bs_snapshots;
QTAILQ_FOREACH(dinfo, &drives, next) {
bs = dinfo->bdrv;
if (bdrv_can_snapshot(bs))
goto ok;
}
return NULL;
ok:
bs_snapshots = bs;
return bs;
}
bdrv_can_snapshot() may return a BlockDriverState that does not support
snapshots and do_savevm() goes on.
Later on in do_savevm(), we find:
QTAILQ_FOREACH(dinfo, &drives, next) {
bs1 = dinfo->bdrv;
if (bdrv_has_snapshot(bs1)) {
/* Write VM state size only to the image that contains the state */
sn->vm_state_size = (bs == bs1 ? vm_state_size : 0);
ret = bdrv_snapshot_create(bs1, sn);
if (ret < 0) {
monitor_printf(mon, "Error while creating snapshot on '%s'\n",
bdrv_get_device_name(bs1));
}
}
}
bdrv_has_snapshot(bs1) is not checking if the device does support or has
snapshots as explained above. Only in bdrv_snapshot_create() the device is
actually checked for snapshot support.
So, in cases where the first device supports snapshots, and the second does not,
the snapshot on the first will happen anyways. I believe this is not a good
behavior. It should be an all or nothing process.
This patch addresses these issues by making bdrv_can_snapshot() actually do
what it must do and enforces better tests to avoid errors in the middle of
do_savevm(). bdrv_has_snapshot() is removed and replaced by bdrv_can_snapshot()
where appropriate.
bdrv_can_snapshot() was moved from savevm.c to block.c. It makes more sense to me.
The loadvm_state() function was updated too to enforce that when loading a VM at
least all writable devices must support snapshots too.
Signed-off-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Anything that moves hundreds of lines out of vl.c can't be all bad.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This patch makes sure that if the exec: process exits with a non-zero return
status, we treat the migration as failed.
This fixes https://bugs.launchpad.net/qemu/+bug/391879
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Some legacy users (mostly PC devices) of vmstate_register manage
instance IDs on their own, and that unfortunately in a way that is
incompatible with automatically generated ones. This so far prevents
switching those users to vmstates that are registered by qdev.
To establish a migration path, this patch introduces the concept of
alias IDs. They can be passed to an extended vmstate registration
service, and qdev provides a set service to be used during device init.
find_se will consider the alias in addition to the default ID. We can
then start generating the default ID automatically and writing it on
vmsave, thus converting that format without breaking support for upward
migration.
The user is required specify the highest vmstate version for which the
alias is required. Once this version falls behind the minimum required
for a specific vmstate, an assertion triggers to motivate cleaning up
the obsolete alias.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The packet(s) sent out after migration are supposed to be RAPR type of
packets. If they are supposed to go anywhere useful, the RAPR ethernet
identifier needs to be fix.
Also see http://www.iana.org/assignments/ethernet-numbers for 0x8035 for
RARP.
Signed-off-by: Stefan Berger <stefanb@us.ibm.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
error_report() terminates the message with a newline. Strip it it
from its arguments.
This fixes a few error messages lacking a newline:
net_handle_fd_param()'s "No file descriptor named %s found", and
tap_open()'s "vnet_hdr=1 requested, but no kernel support for
IFF_VNET_HDR available" (all three versions).
There's one place that passes arguments without newlines
intentionally: load_vmstate(). Fix it up.
This grand cleanup drops all reset and vmsave/load related
synchronization points in favor of four(!) generic hooks:
- cpu_synchronize_all_states in qemu_savevm_state_complete
(initial sync from kernel before vmsave)
- cpu_synchronize_all_post_init in qemu_loadvm_state
(writeback after vmload)
- cpu_synchronize_all_post_init in main after machine init
- cpu_synchronize_all_post_reset in qemu_system_reset
(writeback after system reset)
These writeback points + the existing one of VCPU exec after
cpu_synchronize_state map on three levels of writeback:
- KVM_PUT_RUNTIME_STATE (during runtime, other VCPUs continue to run)
- KVM_PUT_RESET_STATE (on synchronous system reset, all VCPUs stopped)
- KVM_PUT_FULL_STATE (on init or vmload, all VCPUs stopped as well)
This level is passed to the arch-specific VCPU state writing function
that will decide which concrete substates need to be written. That way,
no writer of load, save or reset functions that interact with in-kernel
KVM states will ever have to worry about synchronization again. That
also means that a lot of reasons for races, segfaults and deadlocks are
eliminated.
cpu_synchronize_state remains untouched, just as Anthony suggested. We
continue to need it before reading or writing of VCPU states that are
also tracked by in-kernel KVM subsystems.
Consequently, this patch removes many cpu_synchronize_state calls that
are now redundant, just like remaining explicit register syncs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
savevm without id or tag segfaults in:
(gdb) bt
#0 0x00007f600a83bf8a in __strcmp_sse42 () from /lib64/libc.so.6
#1 0x00000000004745b6 in bdrv_snapshot_find (bs=<value optimized out>,
sn_info=0x7fff996be280, name=0x0) at savevm.c:1631
#2 0x0000000000475c80 in del_existing_snapshots (name=<value optimized out>,
mon=<value optimized out>) at savevm.c:1654
#3 do_savevm (name=<value optimized out>, mon=<value optimized out>)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
CC savevm.o
cc1: warnings being treated as errors
savevm.c: In function 'file_put_buffer':
savevm.c:342: error: ignoring return value of 'fwrite', declared with attribute warn_unused_result
make: *** [savevm.o] Error 1
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
The effect of this patch with current block migration is that its stage
2, ie. the first full walk-through of the block devices will be
performed completely before RAM migration starts. This ensures that
continuously changing RAM pages are not re-synchronized all the time
while block migration is not completed.
Future versions of block migration which will respect the specified
downtime will generate a different pattern: After RAM migration has
started as well, block migration may also continue to inject dirty
blocks into the RAM stream once it detects that the number of pending
blocks would extend the downtime unacceptably.
Note that all this relies on the current registration order: block
before RAM migration.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
In order to allow proper progress reporting to the monitor that
initiated the migration, forward the monitor reference through the
migration layer down to SaveLiveStateHandler.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Introduce qemu_savevm_state_cancel and inject a stage -1 to cancel a
live migration. This gives the involved subsystems a chance to clean up
dynamically allocated resources. Namely, the block migration layer can
now free its device descriptors and pending blocks.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
When the size that we want to transmit is in another field, but in an
unit different that bytes
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Support for buffer that are pointed by a pointer (i.e. not embedded)
where the size that we want to use is a field in the state.
We also need a new place to store where to start in the middle of the
buffer, as now it is a pointer, not the offset of the 1st field.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>