xemu-project/xemu - xemu - Gitea: Git with a cup of tea

mirror of https://github.com/xemu-project/xemu.git synced 2024-11-27 13:30:52 +00:00

Author	SHA1	Message	Date
Li Zhijian	911965ace9	migration/rdma: advise prefetch write for ODP region The responder mr registering with ODP will sent RNR NAK back to the requester in the face of the page fault. --------- ibv_poll_cq wc.status=13 RNR retry counter exceeded! ibv_poll_cq wrid=WRITE RDMA! --------- ibv_advise_mr(3) helps to make pages present before the actual IO is conducted so that the responder does page fault as little as possible. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Li Zhijian	e2daccb0d0	migration/rdma: Try to register On-Demand Paging memory region Previously, for the fsdax mem-backend-file, it will register failed with Operation not supported. In this case, we can try to register it with On-Demand Paging[1] like what rpma_mr_reg() does on rpma[2]. [1]: https://community.mellanox.com/s/article/understanding-on-demand-paging--odp-x [2]: http://pmem.io/rpma/manpages/v0.9.0/rpma_mr_reg.3 CC: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Li Zhijian	5ad15e8614	migration: allow enabling mutilfd for specific protocol only To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:52 +0800 (5 weeks, 4 days, 17 hours ago) And change the default to true so that in '-incoming defer' case, user is able to change multifd capability. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Li Zhijian	b7acd65707	migration: allow multifd for socket protocol only To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:51 +0800 (5 weeks, 4 days, 17 hours ago) multifd with unsupported protocol will cause a segment fault. (gdb) bt #0 0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190 #1 0x0000563b4a797a03 in qio_channel_socket_connect_sync (ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at ../io/channel-socket.c:145 #2 0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168 #3 0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124 #4 0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541 #5 0x00007f7fe9b5b3f9 in ?? () #6 0x0000000000000000 in ?? () It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and multifd setup() though there are so many other places using migrate_use_multifd(). Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
David Hildenbrand	1230a25f6f	migration/ram: Don't passs RAMState to migration_clear_memory_region_dirty_bitmap_*() The parameter is unused, let's drop it. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Lukas Straub	e9ab82b858	multifd: Unconditionally unregister yank function To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 4 Aug 2021 21:26:32 +0200 (5 weeks, 11 hours, 52 minutes ago) [[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-08-04T21:26:32+0200 using RSA]] Unconditionally unregister yank function in multifd_load_cleanup(). If it is not unregistered here, it will leak and cause a crash in yank_unregister_instance(). Now if the ioc is still in use afterwards, it will only lead to qemu not being able to recover from a hang related to that ioc. After checking the code, i am pretty sure that ref is always 1 when arriving here. So all this currently does is remove the unneeded check. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Lukas Straub	20171ea895	multifd: Implement yank for multifd send side To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 1 Sep 2021 17:58:57 +0200 (1 week, 15 hours, 17 minutes ago) [[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-09-01T17:58:57+0200 using RSA]] When introducing yank functionality in the migration code I forgot to cover the multifd send side. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Emanuele Giuseppe Esposito	68b88468f6	migration: add missing qemu_mutex_lock_iothread in migration_completion qemu_savevm_state_complete_postcopy assumes the iothread lock (BQL) to be held, but instead it isn't. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20211005080751.3797161-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-05 13:10:29 +02:00
Emanuele Giuseppe Esposito	3c158eba1e	migration: block-dirty-bitmap: add missing qemu_mutex_lock_iothread init_dirty_bitmap_migration assumes the iothread lock (BQL) to be held, but instead it isn't. Instead of adding the lock to qemu_savevm_state_setup(), follow the same pattern as the other ->save_setup callbacks and lock+unlock inside dirty_bitmap_save_setup(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211005080751.3797161-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-05 13:10:29 +02:00
Markus Armbruster	7d6f6933aa	migration: Handle migration_incoming_setup() errors consistently Commit `b673eab4e2` "multifd: Make multifd_load_setup() get an Error parameter" changed migration_incoming_setup() to take an Error ** argument, and adjusted the callers accordingly. It neglected to change adjust multifd_load_setup(): it still exit()s on error. Clean that up. The error now gets propagated up two call chains: via migration_fd_process_incoming() to rdma_accept_incoming_migration(), and via migration_ioc_process_incoming() to migration_channel_process_incoming(). Both chain ends report the error with error_report_err(), but otherwise ignore it. Behavioral change: we no longer exit() on this error. This is consistent with how we handle other errors here, e.g. from multifd_recv_new_channel() via migration_ioc_process_incoming() to migration_channel_process_incoming(). Whether it's consistently right or consistently wrong I can't tell. Also clean up the return value from the unusual 0 on success, 1 on error to the more common true on success, false on error. Cc: Juan Quintela <quintela@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210720125408.387910-11-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2021-08-26 17:15:28 +02:00
Markus Armbruster	f9734d5d40	error: Use error_fatal to simplify obvious fatal errors (again) We did this with scripts/coccinelle/use-error_fatal.cocci before, in commit `50beeb6809` and `007b06578a`. This commit cleans up rarer variations that don't seem worth matching with Coccinelle. Cc: Thomas Huth <thuth@redhat.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc-André Lureau <marcandre.lureau@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210720125408.387910-2-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2021-08-26 17:15:28 +02:00
Wei Wang	3143577d6a	migration: clear the memory region dirty bitmap when skipping free pages When skipping free pages to send, their corresponding dirty bits in the memory region dirty bitmap need to be cleared. Otherwise the skipped pages will be sent in the next round after the migration thread syncs dirty bits from the memory region dirty bitmap. Cc: David Hildenbrand <david@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20210722083055.23352-1-wei.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:50:13 +01:00
Peter Xu	39675ffffb	migration: Move the yank unregister of channel_close out It's efficient, but hackish to call yank unregister calls in channel_close(), especially it'll be hard to debug when qemu crashed with some yank function leaked. Remove that hack, but instead explicitly unregister yank functions at the places where needed, they are: (on src) - migrate_fd_cleanup - postcopy_pause (on dst) - migration_incoming_state_destroy - postcopy_pause_incoming Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-6-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:45:03 +01:00
Peter Xu	c6ad5be7ae	migration: Teach QEMUFile to be QIOChannel-aware migration uses QIOChannel typed qemufiles. In follow up patches, we'll need the capability to identify this fact, so that we can get the backing QIOChannel from a QEMUFile. We can also define types for QEMUFile but so far since we only need to be able to identify QIOChannel, introduce a boolean which is simpler. Introduce another helper qemu_file_get_ioc() to return the ioc backend of a qemufile if has_ioc is set. No functional change. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-5-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:59 +01:00
Peter Xu	18711405b5	migration: Introduce migration_ioc_[un]register_yank() There're plenty of places in migration/* that checks against either socket or tls typed ioc for yank operations. Provide two helpers to hide all these information. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-4-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:54 +01:00
Peter Xu	43044ac0ee	migration: Make from_dst_file accesses thread-safe Accessing from_dst_file is potentially racy in current code base like below: if (s->from_dst_file) do_something(s->from_dst_file); Because from_dst_file can be reset right after the check in another thread (rp_thread). One example is migrate_fd_cancel(). Use the same qemu_file_lock to protect it too, just like to_dst_file. When it's safe to access without lock, comment it. There's one special reference in migration_thread() that can be replaced by the newly introduced rp_thread_created flag. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20210722175841.938739-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> with Peter's fixup	2021-07-26 12:44:46 +01:00
Peter Xu	53021ea165	migration: Fix missing join() of rp_thread It's possible that the migration thread skip the join() of the rp_thread in below race and crash on src right at finishing migration: migration_thread rp_thread ---------------- --------- migration_completion() (before rp_thread quits) from_dst_file=NULL [thread got scheduled out] s->rp_state.from_dst_file==NULL (skip join() of rp_thread) migrate_fd_cleanup() qemu_fclose(s->to_dst_file) yank_unregister_instance() assert(yank_find_entry()) <------- crash It could mostly happen with postcopy, but that shouldn't be required, e.g., I think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set. It's suspected that above race could be the root cause of a recent (but rare) migration-test break reported by either Dave or PMM: https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/ The issue is: from_dst_file is reset in the rp_thread, so if the thread reset it to NULL fast enough then the migration thread will assume there's no rp_thread at all. This could potentially cause more severe issue (e.g. crash) after the yank code. Fix it by using a boolean to keep "whether we've created rp_thread". Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-2-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:26 +01:00
Peter Xu	63268c4970	migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Taking the mutex every time for each dirty bit to clear is too slow, especially we'll take/release even if the dirty bit is cleared. So far it's only used to sync with special cases with qemu_guest_free_page_hint() against migration thread, nothing really that serious yet. Let's move the lock to be upper. There're two callers of migration_bitmap_clear_dirty(). For migration, move it into ram_save_iterate(). With the help of MAX_WAIT logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking the lock once there at the entry. It also means any call sites to qemu_guest_free_page_hint() can be delayed; but it should be very rare, only during migration, and I don't see a problem with it. For COLO, move it up to colo_flush_ram_cache(). I think COLO forgot to take that lock even when calling ramblock_sync_dirty_bitmap(), where another example is migration_bitmap_sync() who took it right. So let the mutex cover both the ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls. It's even possible to drop the lock so we use atomic operations upon rb->bmap and the variable migration_dirty_pages. I didn't do it just to still be safe, also not predictable whether the frequent atomic ops could bring overhead too e.g. on huge vms when it happens very often. When that really comes, we can keep a local counter and periodically call atomic ops. Keep it simple for now. Cc: Wei Wang <wei.w.wang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210630200805.280905-1-peterx@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	ca7bd0821b	migration: Clear error at entry of migrate_fd_connect() For each "migrate" command, remember to clear the s->error before going on. For one reason, when there's a new error it'll be still remembered; see migrate_set_error() who only sets the error if error==NULL. Meanwhile if a failed migration completes (e.g., postcopy recovered and finished), we shouldn't dump an error when calling migrate_fd_cleanup() at last. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	ca30f24d12	migration: Don't do migrate cleanup if during postcopy resume Below process could crash qemu with postcopy recovery: 1. (hmp) migrate -d .. 2. (hmp) migrate_start_postcopy 3. [network down, postcopy paused] 4. (hmp) migrate -r $WRONG_PORT when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared 5. (hmp) migrate -r $RIGHT_PORT [qemu crash on assert(cleanup_bh)] The thing is we shouldn't cleanup if it's postcopy resume; the error is set mostly because the channel is wrong, so we return directly waiting for the user to retry. migrate_fd_cleanup() should only be called when migration is cancelled or completed. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	2e3e3da3c2	migration: Release return path early for paused postcopy When postcopy pause triggered, we rely on the migration thread to cleanup the to_dst_file handle, and the return path thread to cleanup the from_dst_file handle (which is stored in the local variable "rp"). Within the process, from_dst_file cleanup (qemu_fclose) is postponed until it's setup again due to a postcopy recovery. It used to work before yank was born; after yank is introduced we rely on the refcount of IOC to correctly unregister yank function in channel_close(). If without the early and on-time release of from_dst_file handle the yank function will be leftover during paused postcopy. Without this patch, below steps (quoted from Xiaohui) could trigger qemu src crash: 1.Boot vm on src host 2.Boot vm on dst host 3.Enable postcopy on src&dst host 4.Load stressapptest in vm and set postcopy speed to 50M 5.Start migration from src to dst host, change into postcopy mode when migration is active. 6.When postcopy is active, down the network card(do migration via this network) on dst host. 7.Wait untill postcopy is paused on src&dst host. 8.Before up network card, recover migration on dst host, will get error like following. 9.Ignore the error of step 8, go on recovering migration on src host: After step 9, qemu on src host will core dump after some seconds: qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. 1.sh: line 38: 44662 Aborted (core dumped) Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-2-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Laurent Vivier	a51dcef08b	migration: failover: emit a warning when the card is not fully unplugged When the migration fails or is canceled we wait the end of the unplug operation to be able to plug it back. But if the unplug operation is never finished we stop to wait and QEMU emits a warning to inform the user. Based-on: 20210629155007.629086-1-lvivier@redhat.com Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210701131458.112036-1-lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Li Zhijian	224f364a49	migration/rdma: prevent from double free the same mr backtrace: '0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 478 void *addr = mr->addr; (gdb) bt #0 0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 #1 0x0000555555891fcc in rdma_delete_block (block=<optimized out>, rdma=0x7fff38176010) at ../migration/rdma.c:691 #2 qemu_rdma_cleanup (rdma=0x7fff38176010) at ../migration/rdma.c:2365 #3 0x00005555558925b0 in qio_channel_rdma_close_rcu (rcu=0x555556b8b6c0) at ../migration/rdma.c:3073 #4 0x0000555555d652a3 in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:281 #5 0x0000555555d5edf9 in qemu_thread_start (args=0x7fffe88bb4d0) at ../util/qemu-thread-posix.c:541 #6 0x00007ffff54c73f9 in start_thread () at /lib64/libpthread.so.0 #7 0x00007ffff53f3b03 in clone () at /lib64/libc.so.6 ' Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210708144521.1959614-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Olaf Hering	179a808045	migration: fix typo in mig_throttle_guest_down comment Fixes commit `3d0684b2ad` ("ram: Update all functions comments") Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210708162159.18045-1-olaf@aepfle.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-07-09 18:42:46 +02:00
Li Zhijian	eb1960aac1	misc: Remove redundant new line in perror() Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210706094433.1766952-1-lizhijian@cn.fujitsu.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-07-09 18:42:46 +02:00
Li Zhijian	e5f607913c	migration/rdma: Use error_report to suppress errno message Since the prior calls are successful, in this case a errno doesn't indicate a real error which would just make us confused. before: (qemu) migrate -d rdma:192.168.22.23:8888 source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No space left on device Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210628071959.23455-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Laurent Vivier	944bc52842	migration: failover: continue to wait card unplug on error If the user cancels the migration in the unplug-wait state, QEMU will try to plug back the card and this fails because the card is partially unplugged. To avoid the problem, continue to wait the card unplug, but to allow the migration to be canceled if the card never finishes to unplug use a timeout. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629155007.629086-3-lvivier@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Laurent Vivier	fde93d99d9	migration: move wait-unplug loop to its own function The loop is used in migration_thread() and bg_migration_thread(), so we can move it to its own function and call it from these both places. Moreover, in migration_thread() we have a wrong state transition from SETUP to ACTIVE while state could be WAIT_UNPLUG. This is correctly managed in bg_migration_thread() so use this code instead. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210629155007.629086-2-lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Peter Xu	b7f9afd48e	migration: Allow reset of postcopy_recover_triggered when failed It's possible qemu_start_incoming_migration() failed at any point, when it happens we should reset postcopy_recover_triggered to false so that the user can still retry with a saner incoming port. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210629181356.217312-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Peter Xu	cc48c587d2	migration: Move yank outside qemu_start_incoming_migration() Starting from commit `b5eea99ec2`, qmp_migrate_recover() calls unregister before calling qemu_start_incoming_migration(). I believe it wanted to mitigate the next call to yank_register_instance(), but I think that's wrong. Firstly, if during recover, we should keep the yank instance there, not "quickly removing and adding it back". Meanwhile, calling qmp_migrate_recover() twice with `b5eea99ec2` will directly crash the dest qemu (right now it can't; but it'll start to work right after the next patch) because the 1st call of qmp_migrate_recover() will unregister permanently when the channel failed to establish, then the 2nd call of qmp_migrate_recover() crashes at yank_unregister_instance(). This patch fixes it by moving yank ops out of qemu_start_incoming_migration() into qmp_migrate_incoming. For qmp_migrate_recover(), drop the unregister of yank instance too since we keep it there during the recovery phase. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629181356.217312-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Feng Lin	c00d434ac6	migration: fix the memory overwriting risk in add_to_iovec When testing migration, a Segmentation fault qemu core is generated. 0 error_free (err=0x1) 1 0x00007f8b862df647 in qemu_fclose (f=f@entry=0x55e06c247640) 2 0x00007f8b8516d59a in migrate_fd_cleanup (s=s@entry=0x55e06c0e1ef0) 3 0x00007f8b8516d66c in migrate_fd_cleanup_bh (opaque=0x55e06c0e1ef0) 4 0x00007f8b8626a47f in aio_bh_poll (ctx=ctx@entry=0x55e06b5a16d0) 5 0x00007f8b8626e71f in aio_dispatch (ctx=0x55e06b5a16d0) 6 0x00007f8b8626a33d in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) 7 0x00007f8b866bdba4 in g_main_context_dispatch () 8 0x00007f8b8626cde9 in glib_pollfds_poll () 9 0x00007f8b8626ce62 in os_host_main_loop_wait (timeout=<optimized out>) 10 0x00007f8b8626cffd in main_loop_wait (nonblocking=nonblocking@entry=0) 11 0x00007f8b862ef01f in main_loop () Using gdb print the struct QEMUFile f = { ..., iovcnt = 65, last_error = 21984, last_error_obj = 0x1, shutdown = true } Well iovcnt is overflow, because the max size of MAX_IOV_SIZE is 64. struct QEMUFile { ...; struct iovec iov[MAX_IOV_SIZE]; unsigned int iovcnt; int last_error; Error *last_error_obj; bool shutdown; }; iovcnt and last_error is overwrited by add_to_iovec(). Right now, add_to_iovec() increase iovcnt before check the limit. And it seems that add_to_iovec() assumes that iovcnt will set to zero in qemu_fflush(). But qemu_fflush() will directly return when f->shutdown is true. The situation may occur when libvirtd restart during migration, after f->shutdown is set, before calling qemu_file_set_error() in qemu_file_shutdown(). So the safiest way is checking the iovcnt before increasing it. Signed-off-by: Feng Lin <linfeng23@huawei.com> Message-Id: <20210625062138.1899-1-linfeng23@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fix typo in 'writeable' which is actually misnamed 'writable'	2021-07-05 10:51:26 +01:00
Philippe Mathieu-Daudé	5590f65fac	migration/tls: Use qcrypto_tls_creds_check_endpoint() Avoid accessing QCryptoTLSCreds internals by using the qcrypto_tls_creds_check_endpoint() helper. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-29 18:30:20 +01:00
David Hildenbrand	8dbe22c686	memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-15 20:27:38 +02:00
Daniel P. Berrangé	85cd1cc668	migration: use GDateTime for formatting timestamp in snapshot names The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	626ff6515d	migration: add trace point when vm_stop_force_state fails This is a critical failure scenario for migration that is hard to diagnose from existing probes. Most likely it is caused by an error from bdrv_flush(), but we're not logging the errno anywhere, hence this new probe. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Rao, Lei	3ba024457f	Remove migrate_set_block_enabled in checkpoint We can detect disk migration in migrate_prepare, if disk migration is enabled in COLO mode, we can directly report an error.and there is no need to disable block migration at every checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Jason Wang <jasowang@redhat.com>	2021-06-11 10:30:13 +08:00
Peter Xu	a4a571d978	hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds These two commands are missing when adding the QMP sister commands. Add them, so developers can play with them easier. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <4cc0039fc3ad6145136770cf3b0f056c09a2910b.1623027729.git.huangy81@chinatelecom.cn> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 20:18:26 +01:00
Hyman Huang(黄勇)	7afa08cd8f	migration/dirtyrate: make sample page count configurable introduce optional sample-pages argument in calc-dirty-rate, making sample page count per GB configurable so that more accurate dirtyrate can be calculated. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <3103453a3b2796f929269c99a6ad81a9a7f1f405.1623027729.git.huangy81@chinatelecom.cn> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Wrapped a couple of long lines	2021-06-08 20:18:25 +01:00
Dr. David Alan Gilbert	a59136f3b1	migration/socket: Close the listener at the end Delay closing the listener until the cleanup hook at the end; mptcp needs the listener to stay open while the other paths come in. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 19:36:19 +01:00
Dr. David Alan Gilbert	1df6ddb43b	migration: Add cleanup hook for inwards migration Add a cleanup hook for incoming migration that gets called at the end as a way for a transport to allow cleanup. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-4-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 19:36:18 +01:00
Li Zhijian	6b8c2eb5c6	migration/rdma: Fix cm event use after free Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210602023506.3821293-1-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 18:55:43 +01:00
Leonardo Bras	7de2e85653	yank: Unregister function when using TLS migration After yank feature was introduced in migration, whenever migration is started using TLS, the following error happens in both source and destination hosts: (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. This happens because of a missing yank_unregister_function() when using qio-channel-tls. Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform yank_unregister_function() in channel_close() and multifd_load_cleanup(). Also, inside migration_channel_connect() and migration_channel_process_incoming() move yank_register_function() so it only runs once on a TLS migration. Fixes: `b5eea99ec2` ("migration: Add yank feature", 2021-01-13) Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326 Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Peter Xu <peterx@redhat.com> -- Changes since v2: - Dropped all references to ioc->master - yank_register_function() and yank_unregister_function() now only run once in a TLS migration. Changes since v1: - Cast p->c to QIOChannelTLS into multifd_load_cleanup() Message-Id: <20210601054030.1153249-1-leobras.c@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 18:50:03 +01:00
Stefano Garzarella	d0fb9657a3	docs: fix references to docs/devel/tracing.rst Commit `e50caf4a5c` ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Peter Maydell	c5847f5e4e	Virtiofs, migration and hmp pull 2021-05-26 Fixes for a loadvm regression from Kevin, some virtiofsd cleanups from Vivek and Mahmoud, and some RDMA migration fixups from Li. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmCuiMIACgkQBRYzHrxb /edPvxAAimpeeMuCBKPswmuxyAqAoes6FKWNohzCyTCWXEmhvZGFl6SEgMxW0YWh DRLfyGUp9CODqNS2oVbj3I7/R5rDvpaqWFJ8GiBH99pc5B+pxYtXjGhjR8JWZ691 ZvoQsV/nN9dC7woIG5rcus09aPg1X8n8lD2GcV007Gacj3FGGHY6+DOmpfYmrdoG QY2OTQ2lMW3l1ACOhtdpWzVKgldYwTDLZsGWuFy+z5b64ijXyImpWa39Hu0HoWdS vds5D9GGMfI8kgOlIgsCoC26kre40952VqhOS5ie3axSZt48kufLMUaspiHMiA/B mwmrqMrNXoqC7pAWnv8Ur4NjEBlTkPtL4+kIMX0zeqJsRG24Fiee0e42CvNjFixF 1F419HHKFQPYOvCb/SPLv18gmcyfHdTEbB+yjYIO48ccrPfu/LtTTKG85D1lFw7C cy3AWKz6EzYuzs0wtia+0K+5UwOhm2h9Z77kyX5MIfyrb+KWAfupFW6l9NnfmBtp +sJBSkIrph6oNmZADx2eSwev+pdlcy0uTOULOiPTF4Iu826LQxoaV9pFcvA+/HXZ EbKNrYxDNNYCugLybtULIiFJ8f1i90/UtQlrqAr21deC+gJYliDZScXnzi6lPuv7 afn4QgMmFLus4ehB0TKcA7QbWL3yyrivXO4TSQNzcZaeXAvHOw0= =dIM2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20210526a' into staging Virtiofs, migration and hmp pull 2021-05-26 Fixes for a loadvm regression from Kevin, some virtiofsd cleanups from Vivek and Mahmoud, and some RDMA migration fixups from Li. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Wed 26 May 2021 18:43:30 BST # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20210526a: migration/rdma: source: poll cm_event from return path migration/rdma: destination: create the return patch after the first accept migration/rdma: Fix rdma_addrinfo res leaks migration/rdma: cleanup rdma in rdma_start_incoming_migration error path migration/rdma: Fix cm_event used before being initialized tools/virtiofsd/fuse_opt.c: Replaced a malloc with GLib's g_try_malloc tools/virtiofsd/buffer.c: replaced a calloc call with GLib's g_try_new0 virtiofsd: Set req->reply_sent right after sending reply virtiofsd: Check EOF before short read virtiofsd: Simplify skip byte logic virtiofsd: get rid of in_sg_left variable virtiofsd: Use iov_discard_front() to skip bytes virtiofsd: Get rid of unreachable code in read virtiofsd: Check for EINTR in preadv() and retry hmp: Fix loadvm to resume the VM on success instead of failure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-27 14:57:01 +01:00
Li Zhijian	e49e49dd73	migration/rdma: source: poll cm_event from return path source side always blocks if postcopy is only enabled at source side. users are not able to cancel this migration in this case. Let source side have chance to cancel this migration Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-4-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Typo fix	2021-05-26 18:39:32 +01:00
Li Zhijian	44bcfd45e9	migration/rdma: destination: create the return patch after the first accept destination side: $ build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.1.10:8888 (qemu) migrate_set_capability postcopy-ram on (qemu) dest_init RDMA Device opened: kernel name rocep1s0f0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/rocep1s0f0, transport: (2) Ethernet Segmentation fault (core dumped) (gdb) bt #0 qemu_rdma_accept (rdma=0x0) at ../migration/rdma.c:3272 #1 rdma_accept_incoming_migration (opaque=0x0) at ../migration/rdma.c:3986 #2 0x0000563c9e51f02a in aio_dispatch_handler (ctx=ctx@entry=0x563ca0606010, node=0x563ca12b2150) at ../util/aio-posix.c:329 #3 0x0000563c9e51f752 in aio_dispatch_handlers (ctx=0x563ca0606010) at ../util/aio-posix.c:372 #4 aio_dispatch (ctx=0x563ca0606010) at ../util/aio-posix.c:382 #5 0x0000563c9e4f4d9e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 #6 0x00007fe96ef3fa9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #7 0x0000563c9e4ffeb8 in glib_pollfds_poll () at ../util/main-loop.c:231 #8 os_host_main_loop_wait (timeout=12188789) at ../util/main-loop.c:254 #9 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530 #10 0x0000563c9e3c7211 in qemu_main_loop () at ../softmmu/runstate.c:725 #11 0x0000563c9dfd46fe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50 The rdma return path will not be created when qemu incoming is starting since migrate_copy() is false at that moment, then a NULL return path rdma was referenced if the user enabled postcopy later. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-3-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	f53b450ada	migration/rdma: Fix rdma_addrinfo res leaks rdma_freeaddrinfo() is the reverse operation of rdma_getaddrinfo() Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210525080552.28259-2-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	4e812d2338	migration/rdma: cleanup rdma in rdma_start_incoming_migration error path the error path after calling qemu_rdma_dest_init() should do rdma cleanup Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210520081148.17001-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	efb208dc9c	migration/rdma: Fix cm_event used before being initialized A segmentation fault was triggered when i try to abort a postcopy + rdma migration. since rdma_ack_cm_event releases a uninitialized cm_event in these case. like below: 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); 2497 if (ret) { 2498 perror("rdma_get_cm_event after rdma_connect"); 2499 ERROR(errp, "connecting to destination!"); 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault 2501 goto err_rdma_source_connect; 2502 } Refer to the rdma_get_cm_event() code, cm_event will be updated/changed only if rdma_get_cm_event() returns 0. So it's okey to remove the ack in error patch. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210519064740.10828-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Paolo Bonzini	b02629550d	replication: move include out of root directory The replication.h file is included from migration/colo.c and tests/unit/test-replication.c, so it should be in include/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:46 +02:00
Peter Maydell	9b1e81d1c2	* Replace YAML anchors by extends in the gitlab-CI yaml files * Many small qtest fixes (e.g. to fix issues discovered by Coverity) * Poison more config switches in common code * Fix the failing Travis-CI and Cirrus-CI tasks -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmCeXFMRHHRodXRoQHJl ZGhhdC5jb20ACgkQLtnXdP5wLbUV7g//VRxN74v2pO6zSsSNavYTkMa3jZE1Io3l w9lsOr3DM0HIMhyEMwSJsiygj0s8TNanUtFBHfDfo1k1gmZYd5FiM1sB7ZB/sB/S 9Q9wjmeV2NMG7pcr9zLmiJaEM2LGfGbso44/m0c+gpSyxzg2TeH7sF+38AUZI9/R J6/gOzIs4WWdSV4f8kp+YPPQCyOtxZsfxDuk1z1fKsBgXE5iEBUqYyyZUm4gYJkN 4ALmqc/NVeLszt08qkPHxwXQc488nCJP31psxx6MQ7toKfCAamT07Pp3oi70cakC y+HVfsIHc7SxaZyj8sbJCU3LJHwDd8N82ydJUrv216qDffb38fNb+m6y9mcYy2MH C25uGe9mucTuTP1WIttC6nCKg/MCgi4PWIzqEhkeKj3TxpTUJJP+BUIfSubV38Gc T+XUCNkWWW8sTeRiyE3m9pEZ+gz1MFubaIr/Owephch1SjRYn4zUwJFHHi3I4PY4 7XjEo8y2M9tKckW3pCfYi+aIDzC1DcRtzvXUGtdemX5xVjTAGlKkFLWl/brdCn2U y+JrwrL8cQ3Fy7bXzmK6m9mmhcIW6PcSOBE7RnSESyKzqM5VBSBgnjMqnHcP3FrV FYzihTCDcFKy3ELUe3Kbc1TgG/07stQs9xInGXN7ZFsIwadfHgoNVQ6JpOqq1oQa fvmLPBhq+a4= =QZuG -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-05-14' into staging * Replace YAML anchors by extends in the gitlab-CI yaml files * Many small qtest fixes (e.g. to fix issues discovered by Coverity) * Poison more config switches in common code * Fix the failing Travis-CI and Cirrus-CI tasks # gpg: Signature made Fri 14 May 2021 12:17:39 BST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/thuth-gitlab/tags/pull-request-2021-05-14: cirrus.yml: Fix the MSYS2 task pc-bios/s390-ccw: Fix inline assembly for older versions of Clang tests/qtest/migration-test: Use g_autofree to avoid leaks on error paths configure: Poison all current target-specific #defines migration: Move populate_vfio_info() into a separate file include/sysemu: Poison all accelerator CONFIG switches in common code tests: Avoid side effects inside g_assert() arguments tests/qtest/rtc-test: Remove pointless NULL check tests/qtest/tpm-util.c: Free memory with correct free function tests/migration-test: Fix "true" vs true tests/qtest/npcm7xx_pwm-test.c: Avoid g_assert_true() for non-test assertions tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic util/compatfd.c: Replaced a malloc call with g_malloc. libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm docs/devel/qgraph: add troubleshooting information libqos/qgraph: fix "UNAVAILBLE" typo gitlab-ci: Replace YAML anchors by extends (native_test_job) gitlab-ci: Replace YAML anchors by extends (native_build_job) gitlab-ci: Replace YAML anchors by extends (container_job) tests/docker/dockerfiles: Add ccache to containers where it was missing Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-14 19:33:23 +01:00
Thomas Huth	43bd0bf30f	migration: Move populate_vfio_info() into a separate file The CONFIG_VFIO switch only works in target specific code. Since migration/migration.c is common code, the #ifdef does not have the intended behavior here. Move the related code to a separate file now which gets compiled via specific_ss instead. Fixes: `3710586caa` ("qapi: Add VFIO devices migration stats in Migration stats") Message-Id: <20210414112004.943383-3-thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-05-14 12:31:51 +02:00
David Hildenbrand	542147f4e5	migration/ram: Use offset_in_ramblock() in range checks We never read or write beyond the used_length of memory blocks when migrating. Make this clearer by using offset_in_ramblock() consistently. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-11-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	c1668bde5c	migration/multifd: Print used_length of memory block We actually want to print the used_length, against which we check. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-10-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	898ba906cc	migration/ram: Handle RAM block resizes during postcopy Resizing while migrating is dangerous and does not work as expected. The whole migration code works with the usable_length of a ram block and does not expect this value to change at random points in time. In the case of postcopy, relying on used_length is racy as soon as the guest is running. Also, when used_length changes we might leave the uffd handler registered for some memory regions, reject valid pages when migrating and fail when sending the recv bitmap to the source. Resizing can be trigger after (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Let's remember the original used_length in a separate variable and use it in relevant postcopy code. Make sure to update it when we resize during precopy, when synchronizing the RAM block sizes with the source. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-9-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	6a23f6399a	migration/ram: Simplify host page handling in ram_load_postcopy() Add two new helper functions. This will come in come handy once we want to handle ram block resizes while postcopy is active. Note that ram_block_from_stream() will already print proper errors. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-8-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Added brackets in host_page_from_ram_block_offset to cause uintptr_t to cast the sum, to fix armhf-cross build	2021-05-13 18:21:13 +01:00
David Hildenbrand	cc61c703b6	migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when synchronizing the RAM block state with the migration source), the resized part would not get discarded. Let's perform that when being notified about a resize while postcopy has been advised, but is not listening yet. With precopy, the process is as following: 1. VM created - RAM blocks are created 2. Incomming migration started - Postcopy is advised - All pages in RAM blocks are discarded 3. Precopy starts - RAM blocks are resized to match the size on the migration source. - RAM pages from precopy stream are loaded - Uffd handler is registered, postcopy starts listening 4. Guest started, postcopy running - Pagefaults get resolved, pages get placed Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-7-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
David Hildenbrand	c7c0e72408	migration/ram: Handle RAM block resizes during precopy Resizing while migrating is dangerous and does not work as expected. The whole migration code works on the usable_length of ram blocks and does not expect this to change at random points in time. In the case of precopy, the ram block size must not change on the source, after syncing the RAM block list in ram_save_setup(), so as long as the guest is still running on the source. Resizing can be trigger after (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Use the ram block notifier to get notified about resizes. Let's simply cancel migration and indicate the reason. We'll continue running on the source. No harm done. Update the documentation. Postcopy will be handled separately. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-5-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Manual merge	2021-05-13 18:21:13 +01:00
Markus Armbruster	372043f389	migration: Drop redundant query-migrate result @blocked Result @blocked is redundant. Unfortunately, we realized this too close to the release to risk dropping it, so we deprecated it instead, in commit `e11ce6c06`. Since it was deprecated from the start, we can delete it without the customary grace period. Do so. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210429140424.2802929-1-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Kunkun Jiang	ba1b7c812c	migration/ram: Optimize ram_save_host_page() Starting from pss->page, ram_save_host_page() will check every page and send the dirty pages up to the end of the current host page or the boundary of used_length of the block. If the host page size is a huge page, the step "check" will take a lot of time. It will improve performance to use migration_bitmap_find_dirty(). Tested on Kunpeng 920; VM parameters: 1U 4G (page size 1G) The time of ram_save_host_page() in the last round of ram saving: before optimize: 9250us after optimize: 34us Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20210316125716.1243-3-jiangkunkun@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Kunkun Jiang	23feba906e	migration/ram: Reduce unnecessary rate limiting When the host page is a huge page and something is sent in the current iteration, migration_rate_limit() should be executed. If not, it can be omitted. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210316125716.1243-2-jiangkunkun@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
David Hildenbrand	1a37352277	migrate/ram: remove "ram_bulk_stage" and "fpo_enabled" The bulk stage is kind of weird: migration_bitmap_find_dirty() will indicate a dirty page, however, ram_save_host_page() will never save it, as migration_bitmap_clear_dirty() detects that it is not dirty. We already fill the bitmap in ram_list_init_bitmaps() with ones, marking everything dirty - it didn't used to be that way, which is why we needed an explicit first bulk stage. Let's simplify: make the bitmap the single source of thuth. Explicitly handle the "xbzrle_enabled after first round" case. Regarding XBZRLE (implicitly handled via "ram_bulk_stage = false" right now), there is now a slight change in behavior: - Colo: When starting, it will be disabled (was implicitly enabled) until the first round actually finishes. - Free page hinting: When starting, XBZRLE will be disabled (was implicitly enabled) until the first round actually finished. - Snapshots: When starting, XBZRLE will be disabled. We essentially only do a single run, so I guess it will never actually get disabled. Postcopy seems to indirectly disable it in ram_save_page(), so there shouldn't be really any change. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210216105039.40680-1-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00
Thomas Huth	2068cabd3f	Do not include cpu.h if it's not really necessary Stop including cpu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-4-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:51 +02:00
Thomas Huth	4c386f8064	Do not include sysemu/sysemu.h if it's not really necessary Stop including sysemu/sysemu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-2-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-05-02 17:24:50 +02:00
Andrey Gruzdev	82ea3e3b99	migration: Rename 'bs' to 'block' in background snapshot code Rename 'bs' to commonly used 'block' in migration/ram.c background snapshot code. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-5-andrey.gruzdev@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-04-07 18:37:56 +01:00
Andrey Gruzdev	eeccb99c9d	migration: Pre-fault memory before starting background snasphot This commit solves the issue with userfault_fd WP feature that background snapshot is based on. For any never poluated or discarded memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating PTE for that page, thereby loosing WP setting for it. So we need to pre-fault pages for each RAM block to be protected before making a userfault_fd wr-protect ioctl(). Fixes: `278e2f551a` (migration: support UFFD write fault processing in ram_save_iterate()) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Bodged ifdef __linux__ on ram_write_tracking_prepare, should really go in a stub	2021-04-07 18:37:28 +01:00
Andrey Gruzdev	1a8e44a89f	migration: Inhibit virtio-balloon for the duration of background snapshot The same thing as for incoming postcopy - we cannot deal with concurrent RAM discards when using background snapshot feature in outgoing migration. Fixes: `8518278a6a` (migration: implementation of background snapshot thread) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-04-06 18:56:01 +01:00
Andrey Gruzdev	ecb23efea0	migration: Fix missing qemu_fflush() on buffer file in bg_migration_thread Added missing qemu_fflush() on buffer file holding precopy device state. Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs. Typical configurations often require >200KB for device state and VMDESC. Fixes: `8518278a6a` (migration: implementation of background snapshot thread) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Message-Id: <20210401092226.102804-2-andrey.gruzdev@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-04-06 18:56:01 +01:00
Peter Maydell	415fa2fe91	For 6.0 misc patches under my radar. V2: - "tests: Add tests for yank with the chardev-change case" updated - drop the readthedoc theme patch -----BEGIN PGP SIGNATURE----- iQJQBAABCAA6FiEEh6m9kz+HxgbSdvYt2ujhCXWWnOUFAmBltIwcHG1hcmNhbmRy ZS5sdXJlYXVAcmVkaGF0LmNvbQAKCRDa6OEJdZac5QzCEACOWNUlT5mTylY52sB4 RXxt+vFFby6aj/M5/tXv7T4EHShWkycV5kEnGjUKiWQXfHfHRfOur6PXkbTMG5zY UBEuVAMWW50O2VQZR51W+kohZxsxNnimK2gnCTNGjDWOiofTFAcDf7Ycfxbg1TYU fsO3m/dl9cy1fBgCsm64+61T60DC5W0JRsxoRCR1qr4vbJtXjoYe9i21GMWOr548 EVZo3XQDe5WYeTRyTpf1lHU0dLPrJqZuKmF6M3IQWXG7+ns7iMA0v/STmwsBwqSr W6vygj2vPKAi1b1X1z/t/IGXP7mOtTZMUZWxhdOcxqEgYyP4rZji02U33CCd0fCi wbD8VOmwvtqPeEHXu/b/dhpacgHis1w8jyJspAcW0MIpFJ+1mn+xtWnmMUlA2cOS Vmgirinycsim9TKA+jS3vTwT+/wwzqtWUY267m09tVhJwxvGOXQH1i+mlRRLoNcs 2vf5iWanRbZgFJme8UYtqYB96pWIJjMa1FkMexJgK3VXgMA+Rjkr4MqIyuPoquyp /3PgoUU1LUmGh8F+mi8m88tpdgad6iM+UWXeRALsP7UFvP1Psjz8f6Fhh8uBeE7E wsdBsdTwwZ3zgLD4DxjpcZdLM+G7PT0nbeodnPWRuwebsYt3FymoCdmkS1CEn9ZT kbQxdeJhTa7QoacZUmQSAoXO6g== =UwXe -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/marcandre/tags/for-6.0-pull-request' into staging For 6.0 misc patches under my radar. V2: - "tests: Add tests for yank with the chardev-change case" updated - drop the readthedoc theme patch # gpg: Signature made Thu 01 Apr 2021 12:54:52 BST # gpg: using RSA key 87A9BD933F87C606D276F62DDAE8E10975969CE5 # gpg: issuer "marcandre.lureau@redhat.com" # gpg: Good signature from "Marc-André Lureau <marcandre.lureau@redhat.com>" [full] # gpg: aka "Marc-André Lureau <marcandre.lureau@gmail.com>" [full] # Primary key fingerprint: 87A9 BD93 3F87 C606 D276 F62D DAE8 E109 7596 9CE5 * remotes/marcandre/tags/for-6.0-pull-request: tests: Add tests for yank with the chardev-change case chardev: Fix yank with the chardev-change case chardev/char.c: Always pass id to chardev_new chardev/char.c: Move object_property_try_add_child out of chardev_new yank: Always link full yank code yank: Remove dependency on qiochannel docs: simplify each section title dbus-vmstate: Increase the size of input stream buffer used during load util: fix use-after-free in module_load_one Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-04-01 17:08:48 +01:00
Lukas Straub	1a92d6d500	yank: Remove dependency on qiochannel Remove dependency on qiochannel by removing yank_generic_iochannel and letting migration and chardev use their own yank function for iochannel. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20ff143fc2db23e27cd41d38043e481376c9cec1.1616521341.git.lukasstraub2@web.de>	2021-04-01 15:27:44 +04:00
Jessica Clarke	76f67bac79	meson: Propagate gnutls dependency to migration Commit `3eacf70bb5` neglected to fix this for softmmu configs, which pull in migration's use of gnutls. This fixes the following compilation failure on Arm-based Macs: In file included from migration/multifd.c:23: In file included from migration/tls.h:25: In file included from include/io/channel-tls.h:26: In file included from include/crypto/tlssession.h:24: include/crypto/tlscreds.h:28:10: fatal error: 'gnutls/gnutls.h' file not found #include <gnutls/gnutls.h> ^~~~~~~~~~~~~~~~~ 1 error generated. (as well as for channel.c and tls.c) Signed-off-by: Jessica Clarke <jrtc27@jrtc27.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20210320171221.37437-1-jrtc27@jrtc27.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-01 09:40:45 +02:00
Vladimir Sementsov-Ogievskiy	4290b4834c	migration/block-dirty-bitmap: make incoming disabled bitmaps busy Incoming enabled bitmaps are busy, because we do bdrv_dirty_bitmap_create_successor() for them. But disabled bitmaps being migrated are not marked busy, and user can remove them during the incoming migration. Then we may crash in cancel_incoming_locked() when try to remove the bitmap that was already removed by user, like this: #0 qemu_mutex_lock_impl (mutex=0x5593d88c50d1, file=0x559680554b20 "../block/dirty-bitmap.c", line=64) at ../util/qemu-thread-posix.c:77 #1 bdrv_dirty_bitmaps_lock (bs=0x5593d88c0ee9) at ../block/dirty-bitmap.c:64 #2 bdrv_release_dirty_bitmap (bitmap=0x5596810e9570) at ../block/dirty-bitmap.c:362 #3 cancel_incoming_locked (s=0x559680be8208 <dbm_state+40>) at ../migration/block-dirty-bitmap.c:918 #4 dirty_bitmap_load (f=0x559681d02b10, opaque=0x559680be81e0 <dbm_state>, version_id=1) at ../migration/block-dirty-bitmap.c:1194 #5 vmstate_load (f=0x559681d02b10, se=0x559680fb5810) at ../migration/savevm.c:908 #6 qemu_loadvm_section_part_end (f=0x559681d02b10, mis=0x559680fb4a30) at ../migration/savevm.c:2473 #7 qemu_loadvm_state_main (f=0x559681d02b10, mis=0x559680fb4a30) at ../migration/savevm.c:2626 #8 postcopy_ram_listen_thread (opaque=0x0) at ../migration/savevm.c:1871 #9 qemu_thread_start (args=0x5596817ccd10) at ../util/qemu-thread-posix.c:521 #10 start_thread () at /lib64/libpthread.so.0 #11 clone () at /lib64/libc.so.6 Note bs pointer taken from bitmap: it's definitely bad aligned. That's because we are in use after free, bitmap is already freed. So, let's make disabled bitmaps (being migrated) busy during incoming migration. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210322094906.5079-2-vsementsov@virtuozzo.com>	2021-03-24 13:41:19 +00:00
Daniel P. Berrangé	cbde7be900	migrate: remove QMP/HMP commands for speed, downtime and cache size The generic 'migrate_set_parameters' command handle all types of param. Only the QMP commands were documented in the deprecations page, but the rationale for deprecating applies equally to HMP, and the replacements exist. Furthermore the HMP commands are just shims to the QMP commands, so removing the latter breaks the former unless they get re-implemented. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-03-18 09:22:55 +00:00
Mahmoud Mandour	373969507a	migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARD Replaced various qemu_mutex_lock calls and their respective qemu_mutex_unlock calls with QEMU_LOCK_GUARD macro. This simplifies the code by eliminating the respective qemu_mutex_unlock calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210311031538.5325-7-ma.mandourr@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Hao Wang	fca676429c	migration/tls: add error handling in multifd_tls_handshake_thread If any error happens during multifd send thread creating (e.g. channel broke because new domain is destroyed by the dst), multifd_tls_handshake_thread may exit silently, leaving main migration thread hanging (ram_save_setup -> multifd_send_sync_main -> qemu_sem_wait(&p->sem_sync)). Fix that by adding error handling in multifd_tls_handshake_thread. Signed-off-by: Hao Wang <wanghao232@huawei.com> Message-Id: <20210209104237.2250941-3-wanghao232@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Hao Wang	a339149afa	migration/tls: fix inverted semantics in multifd_channel_connect Function multifd_channel_connect() return "true" to indicate failure, which is rather confusing. Fix that. Signed-off-by: Hao Wang <wanghao232@huawei.com> Message-Id: <20210209104237.2250941-2-wanghao232@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Peter Krempa	6e9f21a2aa	migration: dirty-bitmap: Allow control of bitmap persistence Bitmap's source persistence is transported over the migration stream and the destination mirrors it. In some cases the destination might want to persist bitmaps which are not persistent on the source (e.g. the result of merging bitmaps from a number of layers on the source when migrating into a squashed image) but currently it would need to create another set of persistent bitmaps and merge them. This patch adds a 'transform' property to the alias map which allows overriding the persistence of migrated bitmaps both on the source and destination sides. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <b20afb675917b86f6359ac3591166ac6d4233573.1613150869.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks, drop dead conditional] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 15:24:36 -06:00
Peter Krempa	0d1e450c7b	migration: dirty-bitmap: Use struct for alias map inner members Currently the alias mapping hash stores just strings of the target objects internally. In further patches we'll be adding another member which will need to be stored in the map so pass a copy of the whole BitmapMigrationBitmapAlias QAPI struct into the map. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <fc5f27e1fe16cb75e08a248c2d938de3997b9bfb.1613150869.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: adjust long lines] Signed-off-by: Eric Blake <eblake@redhat.com>	2021-02-12 14:50:55 -06:00
Stefan Reiter	e846b74650	migration: only check page size match if RAM postcopy is enabled Postcopy may also be advised for dirty-bitmap migration only, in which case the remote page size will not be available and we'll instead read bogus data, blocking migration with a mismatch error if the VM uses hugepages. Fixes: `58110f0acb` ("migration: split common postcopy out of ram postcopy") Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20210204163522.13291-1-s.reiter@proxmox.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:52 +00:00
Daniel P. Berrangé	0f0d83a456	migration: introduce snapshot-{save, load, delete} QMP commands savevm, loadvm and delvm are some of the few HMP commands that have never been converted to use QMP. The reasons for the lack of conversion are that they blocked execution of the event thread, and the semantics around choice of disks were ill-defined. Despite this downside, however, libvirt and applications using libvirt have used these commands for as long as QMP has existed, via the "human-monitor-command" passthrough command. IOW, while it is clearly desirable to be able to fix the problems, they are not a blocker to all real world usage. Meanwhile there is a need for other features which involve adding new parameters to the commands. This is possible with HMP passthrough, but it provides no reliable way for apps to introspect features, so using QAPI modelling is highly desirable. This patch thus introduces new snapshot-{load,save,delete} commands to QMP that are intended to replace the old HMP counterparts. The new commands are given different names, because they will be using the new QEMU job framework and thus will have diverging behaviour from the HMP originals. It would thus be misleading to keep the same name. While this design uses the generic job framework, the current impl is still blocking. The intention that the blocking problem is fixed later. None the less applications using these new commands should assume that they are asynchronous and thus wait for the job status change event to indicate completion. In addition to using the job framework, the new commands require the caller to be explicit about all the block device nodes used in the snapshot operations, with no built-in default heuristics in use. Note that the existing "query-named-block-nodes" can be used to query what snapshots currently exist for block nodes. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-13-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: removed tests for now, the output ordering isn't deterministic	2021-02-08 11:19:52 +00:00
Daniel P. Berrangé	bef7e9e2c7	migration: introduce a delete_snapshot wrapper Make snapshot deletion consistent with the snapshot save and load commands by using a wrapper around the blockdev layer. The main difference is that we get upfront validation of the passed in device list (if any). Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-10-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	f1a9fcdd01	migration: wire up support for snapshot device selection Modify load_snapshot/save_snapshot to accept the device list and vmstate node name parameters previously added to the block layer. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-9-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	f781f84189	migration: control whether snapshots are ovewritten The traditional HMP "savevm" command will overwrite an existing snapshot if it already exists with the requested name. This new flag allows this to be controlled allowing for safer behaviour with a future QMP command. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-8-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	3d3e9b1f66	block: rename and alter bdrv_all_find_snapshot semantics Currently bdrv_all_find_snapshot() will return 0 if it finds a snapshot, -1 if an error occurs, or if it fails to find a snapshot. New callers to be added want to distinguish between the error scenario and failing to find a snapshot. Rename it to bdrv_all_has_snapshot and make it return -1 on error, 0 if no snapshot is found and 1 if snapshot is found. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-7-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	c22d644ca7	block: allow specifying name of block device for vmstate storage Currently the vmstate will be stored in the first block device that supports snapshots. Historically this would have usually been the root device, but with UEFI it might be the variable store. There needs to be a way to override the choice of block device to store the state in. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-6-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	cf3a74c94f	block: add ability to specify list of blockdevs during snapshot When running snapshot operations, there are various rules for which blockdevs are included/excluded. While this provides reasonable default behaviour, there are scenarios that are not well handled by the default logic. Some of the conditions do not have a single correct answer. Thus there needs to be a way for the mgmt app to provide an explicit list of blockdevs to perform snapshots across. This can be achieved by passing a list of node names that should be used. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210204124834.774401-5-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	f61fe11aa6	migration: stop returning errno from load_snapshot() None of the callers care about the errno value since there is a full Error object populated. This gives consistency with save_snapshot() which already just returns a boolean value. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> [PMD: Return false/true instead of -1/0, document function] Acked-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-4-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Philippe Mathieu-Daudé	7ea14df230	migration: Make save_snapshot() return bool, not 0/-1 Just for consistency, following the example documented since commit `e3fe3988d7` ("error: Document Error API usage rules"), return a boolean value indicating an error is set or not. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-3-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Daniel P. Berrangé	e26f98e209	block: push error reporting into bdrv_all__snapshot functions The bdrv_all__snapshot functions return a BlockDriverState pointer for the invalid backend, which the callers then use to report an error message. In some cases multiple callers are reporting the same error message, but with slightly different text. In the future there will be more error scenarios for some of these methods, which will benefit from fine grained error message reporting. So it is helpful to push error reporting down a level. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> [PMD: Initialize variables] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210204124834.774401-2-berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Dr. David Alan Gilbert	3af8554bd0	migration: Add blocker information Modify query-migrate so that it has a flag indicating if outbound migration is blocked, and if it is a list of reasons. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202135522.127380-2-dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Markus Armbruster	54270c450a	migration: Fix a few absurdly defective error messages migrate_params_check() has a number of error messages of the form Parameter 'NAME' expects is invalid, it should be ... Fix them to something like Parameter 'NAME' expects a ... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-5-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Markus Armbruster	7bfc47936e	migration: Fix cache_init()'s "Failed to allocate" error messages cache_init() attempts to handle allocation failure. The two error messages are garbage, as untested error messages commonly are: Parameter 'cache size' expects Failed to allocate cache Parameter 'cache size' expects Failed to allocate page cache Fix them to just Failed to allocate cache Failed to allocate page cache Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-4-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Markus Armbruster	8b9407a09f	migration: Clean up signed vs. unsigned XBZRLE cache-size `73af8dd8d7` "migration: Make xbzrle_cache_size a migration parameter" (v2.11.0) made the new parameter unsigned (QAPI type 'size', uint64_t in C). It neglected to update existing code, which continues to use int64_t. migrate_xbzrle_cache_size() returns the new parameter. Adjust its return type. QMP query-migrate-cache-size returns migrate_xbzrle_cache_size(). Adjust its return type. migrate-set-parameters passes the new parameter to xbzrle_cache_resize(). Adjust its parameter type. xbzrle_cache_resize() passes it on to cache_init(). Adjust its parameter type. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-3-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Andrey Gruzdev	8518278a6a	migration: implementation of background snapshot thread Introducing implementation of 'background' snapshot thread which in overall follows the logic of precopy migration while internally utilizes completely different mechanism to 'freeze' vmstate at the start of snapshot creation. This mechanism is based on userfault_fd with wr-protection support and is Linux-specific. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-5-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Andrey Gruzdev	278e2f551a	migration: support UFFD write fault processing in ram_save_iterate() In this particular implementation the same single migration thread is responsible for both normal linear dirty page migration and procesing UFFD page fault events. Processing write faults includes reading UFFD file descriptor, finding respective RAM block and saving faulting page to the migration stream. After page has been saved, write protection can be removed. Since asynchronous version of qemu_put_buffer() is expected to be used to save pages, we also have to flush migraion stream prior to un-protecting saved memory range. Write protection is being removed for any previously protected memory chunk that has hit the migration stream. That's valid for pages from linear page scan along with write fault pages. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> fixup pagefault.address cast for 32bit	2021-02-08 11:19:51 +00:00
Andrey Gruzdev	6e8c25b4c6	migration: introduce 'background-snapshot' migration capability Add new capability to 'qapi/migration.json' schema. Update migrate_caps_check() to validate enabled capability set against introduced one. Perform checks for required kernel features and compatibility with guest memory backends. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Wainer dos Santos Moschetta	1dfafcbd39	migration/qemu-file: Fix maybe uninitialized on qemu_get_buffer_in_place() Fixed error when compiling migration/qemu-file.c with -Werror=maybe-uninitialized as shown here: ../migration/qemu-file.c: In function 'qemu_get_buffer_in_place': ../migration/qemu-file.c:604:18: error: 'src' may be used uninitialized in this function [-Werror=maybe-uninitialized] 604 \| *buf = src; \| ~~~~~^~~~~ cc1: all warnings being treated as errors Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Message-Id: <20210128130625.569900-1-wainersm@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Jinhao Gao	39f633d429	savevm: Fix memory leak of vmstate_configuration When VM migrate VMState of configuration, the fields(name and capabilities) of configuration having a flag of VMS_ALLOC need to allocate memory. If the src doesn't free memory of capabilities in SaveState after save VMState of configuration, or the dst doesn't free memory of name and capabilities in post load of configuration, it may result in memory leak of name and capabilities. We free memory in configuration_post_save and configuration_post_load func, which prevents memory leak. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Jinhao Gao <gaojinhao@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20201231061020.828-3-gaojinhao@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-02-08 11:19:51 +00:00
Eric Blake	95b3a8c8a8	qapi: More complex uses of QAPI_LIST_APPEND These cases require a bit more thought to review; in each case, the code was appending to a list, but not with a FOOList **tail variable. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210113221013.390592-6-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Flawed change to qmp_guest_network_get_interfaces() dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-28 08:08:45 +01:00
Lukas Straub	b5eea99ec2	migration: Add yank feature Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <484c6a14cc2506bebedd5a237259b91363ff8f88.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-13 10:21:17 +01:00
Peter Maydell	729cc68373	Remove superfluous timer_del() calls This commit is the result of running the timer-del-timer-free.cocci script on the whole source tree. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20201215154107.3255-4-peter.maydell@linaro.org	2021-01-08 15:13:38 +00:00
Paolo Bonzini	b1def33d19	zstd: convert to meson Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-01-06 10:21:20 +01:00
Peter Maydell	41192db338	Machine queue, 2020-12-23 Cleanup: * qdev code cleanup (Eduardo Habkost) Bug fix: * hostmem: Free host_nodes list right after visited (Keqian Zhu) -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEWjIv1avE09usz9GqKAeTb5hNxaYFAl/jteYUHGVoYWJrb3N0 QHJlZGhhdC5jb20ACgkQKAeTb5hNxaZUHw//c40nRlYdGSV5j6w3ZCSlmZRFxZTU UiLK51Z3hI9Q9kyLcoIQitEYlQTIbgp0qlIJ6evDd/HvQvZ+P4P0Lzm1UGOZhD0h nJk5+bBkP/mzMh0P9oiN20DSLk6a3Wvdiu/bQR8gm/WdLvTM1Zjek1ns5tL06ZvA MziG6gIypgScu2FeNxD0zC8sDO16oVrzKq7mjZcQe6XYFRsJmYjZw84v+uu/Bdf7 MBxolkA8vYwwBJNdVsAf7I0Gw3BeArgPUOwbWyt8/tuGIOZxYjdKIj55S7j2fuju 524sg8Di+YzxmLZaNAGksEBMj9uY39nwdHGhNElMtWCM9oOPumlps9eyLtpTagfM wmiVrMGWVlXV6c4kZo8R2NSF8hcDr02S7eyrUpITrh09p4nT6fBGG2ufEYiCyNao o9ZqMf7NUO5J60zM5EOfdGxpaN2O0M5pXCCN48NtmqvO0wIAfTc9l/OkCrrfVbEO Q/X1jqbj6ZcilSIl9OeLAPi7Xjx26jMeeLPUQtoZnkqDvpk/Vz6Ka1RgGG86QA5z 2W/KCAoVrg6dO4f9vY3x84rf0Ta5kJtp2LezPgG8d++4bMSf2jN00wYvAQuCyqqW zbm8f57YST3vm8XMHPlmtnlKfiLI4wbVUmrDYu3rNI+JgdvhdXseGoErt15ejAcL B5IH2SK4AwMpSsk= =bnjc -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ehabkost-gl/tags/machine-next-pull-request' into staging Machine queue, 2020-12-23 Cleanup: * qdev code cleanup (Eduardo Habkost) Bug fix: * hostmem: Free host_nodes list right after visited (Keqian Zhu) # gpg: Signature made Wed 23 Dec 2020 21:25:58 GMT # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost-gl/tags/machine-next-pull-request: bugfix: hostmem: Free host_nodes list right after visited qdev: Avoid unnecessary DeviceState* variable at set_prop_arraylen() qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr() qdev: Move qdev_prop_tpm declaration to tpm_prop.h qdev: Make qdev_class_add_property() more flexible qdev: Make PropertyInfo.create return ObjectProperty* qdev: Move dev->realized check to qdev_property_set() qdev: Wrap getters and setters in separate helpers qdev: Add name argument to PropertyInfo.create method qdev: Add name parameter to qdev_class_add_property() qdev: Avoid using prop->name unnecessarily qdev: Get just property name at error_set_from_qdev_prop_error() sparc: Use DEFINE_PROP for nwindows property qdev: Reuse DEFINE_PROP in all DEFINE_PROP_* macros qdev: Move softmmu properties to qdev-properties-system.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-01 22:57:15 +00:00
Peter Maydell	1f7c02797f	QAPI patches patches for 2020-12-19 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl/dynUSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZT3igP/3bWwsKR5vKVsDUTmMfrhcgaFvQiaYoG F29Bond8Xy0Zd0gl7OWh/5jKL0vGlrEVPrKfYLUjMnfkeRec/pOkIB2oOmIxpnPs 9zi4kh2hQ3dEoRBuvSnnZzedetYPTuCpWMIjlztkgfxgcimqm8TPNVSxRaSApjC3 Y8108wGwBWVf2C0rhKO9E2xA51uo6khy05i1psUtqUlC+PuDQ/OwzQHM2dnWdDB6 kUwBDK17nhL6WwsYqCyKLSiDModReYfDiY8GS5MDLo74dzwXiatEefCR7+sbM4xq eX/SBoqoeS1jLPNuCryNeGNKvNA2KAbEJTnbQA2NxBXHgZ9/1SxVZFxuPp4nDMSQ N7BDuDI8YtJE479RjT/ZzRG65xadGBSe/HXkXM9mZwh1zitop8SVZ9fArFBHvNzw Y5zAv3fQd54+87psffg4dYFK0wGmqTabLEEuVzM8KIVqcAdYA2yC2b2EHy+vsxuq GMkr0WaA6Sq2gthXmzdTjmUPuHdan/NIhuV6d66SbPNH2oH31piptFxuznyFWSKV isciFFdUrkg5QrF8DSt2nmdwMFf8QGbszqP8QIGMzhJCCS9GXIiGG8f149++q8X8 HO1lFAdLQJdrDwCYmfx36tOvi2rS/rcoTGgvg66UX3xKko1ruoxR1ZWcS54obJN6 vEQDZ+PxubDg =vGLy -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into staging QAPI patches patches for 2020-12-19 # gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits) qobject: Make QString immutable block: Use GString instead of QString to build filenames keyval: Use GString to accumulate value strings json: Use GString instead of QString to accumulate strings migration: Replace migration's JSON writer by the general one qobject: Factor JSON writer out of qobject_to_json() qobject: Factor quoted_str() out of to_json() qobject: Drop qstring_get_try_str() qobject: Drop qobject_get_try_str() Revert "qobject: let object_property_get_str() use new API" block: Avoid qobject_get_try_str() qmp: Fix tracing of non-string command IDs qobject: Move internals to qobject-internal.h hw/rdma: Replace QList by GQueue Revert "qstring: add qstring_free()" qobject: Change qobject_to_json()'s value to GString qobject: Use GString instead of QString to accumulate JSON qobject: Make qobject_to_json_pretty() take a pretty argument monitor: Use GString instead of QString for output buffer hmp: Simplify how qmp_human_monitor_command() gets output ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-01-01 14:33:03 +00:00
Markus Armbruster	3ddba9a9e9	migration: Replace migration's JSON writer by the general one Commit `8118f0950f` "migration: Append JSON description of migration stream" needs a JSON writer. The existing qobject_to_json() wasn't a good fit, because it requires building a QObject to convert. Instead, migration got its very own JSON writer, in commit `190c882ce2` "QJSON: Add JSON writer". It tacitly limits numbers to int64_t, and strings contents to characters that don't need escaping, unlike qobject_to_json(). The previous commit factored the JSON writer out of qobject_to_json(). Replace migration's JSON writer by it. Cc: Juan Quintela <quintela@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201211171152.146877-17-armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-19 10:39:16 +01:00
Eric Blake	54aa3de72e	qapi: Use QAPI_LIST_PREPEND() where possible Anywhere we create a list of just one item or by prepending items (typically because order doesn't matter), we can use QAPI_LIST_PREPEND(). But places where we must keep the list in order by appending remain open-coded until later patches. Note that as a side effect, this also performs a cleanup of two minor issues in qga/commands-posix.c: the old code was performing new = g_malloc0(sizeof(*ret)); which 1) is confusing because you have to verify whether 'new' and 'ret' are variables with the same type, and 2) would conflict with C++ compilation (not an actual problem for this file, but makes copy-and-paste harder). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [Straightforward conflicts due to commit `a8aa94b5f8` "qga: update schema for guest-get-disks 'dependents' field" and commit `a10b453a52` "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c" resolved. Commit message tweaked.] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-12-19 10:20:14 +01:00
Eric Blake	eaedde5255	migration: Refactor migrate_cap_add Instead of taking a list parameter and returning a new head at a distance, just return the new item for the caller to insert into a list via QAPI_LIST_PREPEND. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-4-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-12-19 10:15:08 +01:00
Eduardo Habkost	ce35e2295e	qdev: Move softmmu properties to qdev-properties-system.h Move the property types and property macros implemented in qdev-properties-system.c to a new qdev-properties-system.h header. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-12-18 15:20:17 -05:00
Tuguoyi	36d0fe6516	migration: Don't allow migration if vm is in POSTMIGRATE The following steps will cause qemu assertion failure: - pause vm by executing 'virsh suspend' - create external snapshot of memory and disk using 'virsh snapshot-create-as' - doing the above operation again will cause qemu crash The backtrace looks like: at /build/qemu-5.0/migration/savevm.c:1401 at /build/qemu-5.0/migration/savevm.c:1453 When the first migration completes, bs->open_flags will set BDRV_O_INACTIVE flag by bdrv_inactivate_all(), and during the second migration the bdrv_inactivate_recurse assert that the bs->open_flags is already BDRV_O_INACTIVE enabled which cause crash. As Vladimir suggested, this patch makes migrate_prepare check the state of vm and return error if it is in RUN_STATE_POSTMIGRATE state. Signed-off-by: Tuguoyi <tu.guoyi@h3c.com> Message-Id: <6b704294ad2e405781c38fb38d68c744@h3c.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reported-by: Li Zhang <li.zhang@cloud.ionos.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-18 10:08:25 +00:00
Tuguoyi	2a909dc430	savevm: Delete snapshots just created in case of error bdrv_all_create_snapshot() can fails with some snapshots created, so it's better to delete those snapshots before returns to the caller Signed-off-by: Tuguoyi <tu.guoyi@h3c.com> Message-Id: <1607410416-13563-3-git-send-email-tu.guoyi@h3c.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-18 10:08:24 +00:00
Tuguoyi	80ef0586d3	savevm: Remove dead code in save_snapshot() The snapshot in each bs is deleted at the beginning, so there is no need to find the snapshot again. Signed-off-by: Tuguoyi <tu.guoyi@h3c.com> Message-Id: <1607410416-13563-2-git-send-email-tu.guoyi@h3c.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-12-18 10:08:24 +00:00
Paolo Bonzini	e69d50d621	migration, vl: start migration via qmp_migrate_incoming Make qemu_start_incoming_migration local to migration/migration.c. By using the runstate instead of a separate flag, vl need not do anything to setup deferred incoming migration. qmp_migrate_incoming also does not need the deferred_incoming flag anymore, because "-incoming PROTOCOL" will clear the "once" flag before the main loop starts. Therefore, later invocations of the migrate-incoming command will fail with the existing "The incoming migration has already been started" error message. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-10 12:15:14 -05:00
Paolo Bonzini	e0d17dfd22	vl: move various initialization routines out of qemu_init Some very simple initialization routines can be nested in existing subsystem-level functions, do that to simplify qemu_init. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-12-10 12:15:11 -05:00
Chetan Pant	ef19b50d93	migration: Fix Lesser GPL version number There is no "version 2" of the "Lesser" General Public License. It is either "GPL version 2.0" or "Lesser GPL version 2.1". This patch replaces all occurrences of "Lesser GPL version 2" with "Lesser GPL version 2.1" in comment section. Signed-off-by: Chetan Pant <chetan4windows@gmail.com> Message-Id: <20201023123130.19656-1-chetan4windows@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2020-11-15 16:43:28 +01:00
Longpeng(Mike)	6ba11211bd	migration: handle CANCELLING state in migration_completion() The following sequence may cause the VM abort during migration: 1. RUN_STATE_RUNNING,MIGRATION_STATUS_ACTIVE 2. before call migration_completion(), we send migrate_cancel QMP command, the state machine is changed to: RUN_STATE_RUNNING,MIGRATION_STATUS_CANCELLING 3. call migration_completion(), and the state machine is switch to: RUN_STATE_RUNNING,MIGRATION_STATUS_COMPLETED 4. call migration_iteration_finish(), because the migration status is COMPLETED, so it will try to set the runstate to POSTMIGRATE, but RUNNING-->POSTMIGRATE is an invalid transition, so abort(). The migration_completion() should not change the migration state to COMPLETED if it is already changed to CANCELLING. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Message-Id: <20201105091726.148-1-longpeng2@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 15:52:20 +00:00
Chuan Zheng	9e8424088c	multifd/tls: fix memoryleak of the QIOChannelSocket object when cancelling migration When creating new tls client, the tioc->master will be referenced which results in socket leaking after multifd_save_cleanup if we cancel migration. Fix it by do object_unref() after tls client creation. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1605104763-118687-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 15:52:20 +00:00
Chuan Zheng	a18ed79b19	migration/dirtyrate: simplify includes in dirtyrate.c Remove redundant blank line which is left by Commit `662770af7c`, also take this opportunity to remove redundant includes in dirtyrate.c. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1604030281-112946-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 15:52:14 +00:00
Chen Qun	a24292830b	migration: fix uninitialized variable warning in migrate_send_rp_req_pages() After the WITH_QEMU_LOCK_GUARD macro is added, the compiler cannot identify that the statements in the macro must be executed. As a result, some variables assignment statements in the macro may be considered as unexecuted by the compiler. When the -Wmaybe-uninitialized capability is enabled on GCC9,the compiler showed warning: migration/migration.c: In function ‘migrate_send_rp_req_pages’: migration/migration.c:384:8: warning: ‘received’ may be used uninitialized in this function [-Wmaybe-uninitialized] 384 \| if (received) { \| ^ Add a default value for 'received' to prevented the warning. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201111142203.2359370-6-kuhn.chenqun@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 14:49:16 +00:00
Chuan Zheng	a1af605bd5	migration/multifd: fix hangup with TLS-Multifd due to blocking handshake The qemu main loop could hang up forever when we enable TLS+Multifd. The Src multifd_send_0 invokes tls handshake, it sends hello to sever and wait response. However, the Dst main qemu loop has been waiting recvmsg() for multifd_recv_1. Both of Src and Dst main qemu loop are blocking and waiting for reponse which results in hanging up forever. Src: (multifd_send_0) Dst: (multifd_recv_1) multifd_channel_connect migration_channel_process_incoming multifd_tls_channel_connect migration_tls_channel_process_incoming multifd_tls_channel_connect qio_channel_tls_handshake_task qio_channel_tls_handshake gnutls_handshake qio_channel_tls_handshake_task ... qcrypto_tls_session_handshake ... gnutls_handshake ... ... ... recvmsg (Blocking I/O waiting for response) recvmsg (Blocking I/O waiting for response) Fix this by offloadinig handshake work to a background thread. Reported-by: Yan Jin <jinyan12@huawei.com> Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1604643893-8223-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 14:35:29 +00:00
Philippe Mathieu-Daudé	af3bbbe984	migration/ram: Fix hexadecimal format string specifier The '%u' conversion specifier is for decimal notation. When prefixing a format with '0x', we want the hexadecimal specifier ('%x'). Inspired-by: Dov Murik <dovmurik@linux.vnet.ibm.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201103112558.2554390-5-philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 14:02:41 +00:00
Rao, Lei	b70cb3b485	Reduce the time of checkpoint for COLO we should set ram_bulk_stage to false after ram_state_init, otherwise the bitmap will be unused in migration_bitmap_find_dirty. all pages in ram cache will be flushed to the ram of secondary guest for each checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Derek Su <dereksu@qnap.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2020-11-11 16:52:23 +08:00
Peter Xu	5e77343113	migration: Postpone the kick of the fault thread after recover The new migrate_send_rp_req_pages_pending() call should greatly improve destination responsiveness because it will resync faulted address after postcopy recovery. However it is also the 1st place to initiate the page request from the main thread. One thing is overlooked on that migrate_send_rp_message_req_pages() is not designed to be thread-safe. So if we wake the fault thread before syncing all the faulted pages in the main thread, it means they can race. Postpone the wake up operation after the sync of faulted addresses. Fixes: `0c26781c09` ("migration: Sync requested pages after postcopy recovery") Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201102153010.11979-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-02 18:25:48 +00:00
Peter Xu	cc5ab87200	migration: Unify reset of last_rb on destination node when recover When postcopy recover happens, we need to reset last_rb after each return of postcopy_pause_fault_thread() because that means we just got the postcopy migration continued. Unify this reset to the place right before we want to kick the fault thread again, when we get the command MIG_CMD_POSTCOPY_RESUME from source. This is actually more than that - because the main thread on destination will now be able to call migrate_send_rp_req_pages_pending() too, so the fault thread is not the only user of last_rb now. Move the reset earlier will allow the first call to migrate_send_rp_req_pages_pending() to use the reset value even if called from the main thread. (NOTE: this is not a real fix to `0c26781c09` mentioned below, however it is just a mark that when picking up `0c26781c09` we'd better have this one too; the real fix will come later) Fixes: `0c26781c09` ("migration: Sync requested pages after postcopy recovery") Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201102153010.11979-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-02 18:25:39 +00:00
Kirti Wankhede	3710586caa	qapi: Add VFIO devices migration stats in Migration stats Added amount of bytes transferred to the VM at destination by all VFIO devices Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2020-11-01 12:30:51 -07:00
Peter Maydell	d55450df99	migration pull: 2020-10-26 Another go at Peter's postcopy fixes Cleanups from Bihong Yu and Peter Maydell. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAl+W9n8ACgkQBRYzHrxb /ef2uRAAqWTFLXuBF8+evEd1mMq2SM3ZYTuc7QKTY3MzAH6J/OMvJbZ112itqWOb iZ5NuuWH4PvzOhlR/PNNf1Yv3hTfv36HinG+OCh6s+6aqVx9yHOAfdBgmJIdYAeg Sk1jx43dvCyN2FwPs31ir3L6mwsrtfkRsS+2FeyrvRoEl4WE9mOoypCft3vdd9Dw zZea0Pw7vIs454D4n1vpJiQtq6B4eSAlQKpTLfQbglpTm4MgqLERzGvpT6hbQXJR eQyTOqRe08viIOZ+oN0B/+RVO6T9jc4Y1bEl2NSak1v4Tf7NNfDkFpLAjFm07V/1 tIhL/NOOsHdzfHQtrZpzKQgwaceb1N5qo0PfxD6/tRf9HlXY54iw6yY75+5c5Y89 UK8VSIYKnM2yXeVDLShxixIr3A1Z+zA41XydDwaLZczjeV7+nwrAXAjO8a+j6Dox zj4IyN2g5elEOmarC8qkvbDZ+TVvA2tookhWVwoz+D8ChYkcRDKP9eoYomfRwg+e NKRFuLBkyVPb0eEhyOV6HqJbMfTLpHneTM94v6HGz8tiK8IlMZfTTnC2Mr5gTXuS /cgOVhsY7+l+pKpxpGJmU3aUCYRk1CuK6MhXgjYEFMh5Siba8s0ZPZVaEm/BUyO1 rD+tVup87xMiJq3xnmLX+opblYE9G+b67hH1KuPc5vZXiSwuTkQ= =OL0Q -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20201026a' into staging migration pull: 2020-10-26 Another go at Peter's postcopy fixes Cleanups from Bihong Yu and Peter Maydell. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Mon 26 Oct 2020 16:17:03 GMT # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20201026a: migration-test: Only hide error if !QTEST_LOG migration/postcopy: Release fd before going into 'postcopy-pause' migration: Sync requested pages after postcopy recovery migration: Maintain postcopy faulted addresses migration: Introduce migrate_send_rp_message_req_pages() migration: Pass incoming state into qemu_ufd_copy_ioctl() migration: using trace_ to replace DPRINTF migration: Delete redundant spaces migration: Open brace '{' following function declarations go on the next line migration: Do not initialise statics and globals to 0 or NULL migration: Add braces {} for if statement migration: Open brace '{' following struct go on the same line migration: Add spaces around operator migration: Don't use '#' flag of printf format migration: Do not use C99 // comments migration: Drop unused VMSTATE_FLOAT64 support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-27 10:25:42 +00:00
Peter Maydell	091e3e3dbc	bitmaps patches for 2020-10-26 - fix infloop on large bitmap granularity - silence compiler warning -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAl+WuYYACgkQp6FrSiUn Q2owKwgAhXf5jJEoUzSMFIIczxST9eaokL4Jweb65bp2y4Ii014/kzAnwyU+7B1Y DMniSrpySi1F74zng2eOKijgNDgQ1INm1v5P/qeoMa4UpPwL3eaNQl3FwMUCJMdN WVo3jl1wxzqcRDVUWDWiAMD0C03nzRKI8i2uw4USUDXiW12bLC0YQPzOCofZagld xC1HAnqRMCnK10lj4aoY5jOHQL9R05TLtsEcQCdI73q3c87p+sxI1eOlqJcZw1KC vgb6uMOXQrkTPKfVsUK5bJ+aJh97LHhWb/o/Jj65f5IMbBQzaebRby3m38Pgph2W +AVuxe2FBAo7o+heucbOrrFwAUcofw== =ArcV -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-bitmaps-2020-10-26' into staging bitmaps patches for 2020-10-26 - fix infloop on large bitmap granularity - silence compiler warning # gpg: Signature made Mon 26 Oct 2020 11:56:54 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-bitmaps-2020-10-26: migration/block-dirty-bitmap: fix uninitialized variable warning migration/block-dirty-bitmap: fix larger granularity bitmaps Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-26 22:36:35 +00:00
Peter Xu	d246ea5039	migration/postcopy: Release fd before going into 'postcopy-pause' Logically below race could trigger with the old code: test program migration thread ------------ ---------------- wait_until('postcopy-pause') postcopy_pause() set_state('postcopy-pause') do_postcopy_recover() arm s->to_dst_file with new fd release s->to_dst_file [1] Here [1] could have released the just-installed recoverying channel. Then the migration could hang without really resuming. Instead, it should be very safe to release the fd before setting the state into 'postcopy-pause', because there's no reason for any other thread to touch it during 'postcopy-active'. Dave reported a very rare postcopy recovery hang that the migration-test program waited for the migration to complete in migrate_postcopy_complete(). We do suspect it's the same thing that we're gonna fix here. Hard to tell. However since we've noticed this, fix this irrelevant of the hang report. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-6-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	0c26781c09	migration: Sync requested pages after postcopy recovery We synchronize the requested pages right after a postcopy recovery happens. This helps to synchronize the prioritized pages on source so that the faulted threads can be served faster. Reported-by: Xiaohui Li <xiaohli@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-5-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	8f8bfffcf1	migration: Maintain postcopy faulted addresses Maintain a list of faulted addresses on the destination host for which we're waiting on. This is implemented using a GTree rather than a real list to make sure even there're plenty of vCPUs/threads that are faulting, the lookup will still be fast with O(log(N)) (because we'll do that after placing each page). It should bring a slight overhead, but ideally that shouldn't be a big problem simply because in most cases the requested page list will be short. Actually we did similar things for postcopy blocktime measurements. This patch didn't use that simply because: (1) blocktime measurement is towards vcpu threads only, but here we need to record all faulted addresses, including main thread and external thread (like, DPDK via vhost-user). (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we don't want to add that extra dependency on the kernel version since not necessary. E.g., we don't need to know which thread faulted on which page, we also don't care about multiple threads faulting on the same page. But we only care about what addresses are faulted so waiting for a page copying from src. (3) blocktime measurement is not enabled by default. However we need this by default especially for postcopy recover. Another thing to mention is that this patch introduced a new mutex to serialize the receivedmap and the page_requested tree, however that serialization does not cover other procedures like UFFDIO_COPY. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	7a267fc49b	migration: Introduce migrate_send_rp_message_req_pages() This is another layer wrapper for sending a page request to the source VM. The new migrate_send_rp_message_req_pages() will be used elsewhere in coming patches. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Xu	eef621c4e6	migration: Pass incoming state into qemu_ufd_copy_ioctl() It'll be used in follow up patches to access more fields out of it. Meanwhile fetch the userfaultfd inside the function. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	fe80c0241d	migration: using trace_ to replace DPRINTF Signed-off-by: Bihong Yu <yubihong@huawei.com> Message-Id: <1603179176-5360-1-git-send-email-yubihong@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	0bcae62333	migration: Delete redundant spaces Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-9-git-send-email-yubihong@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	cbfc71b52b	migration: Open brace '{' following function declarations go on the next line Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-8-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	49324e939c	migration: Do not initialise statics and globals to 0 or NULL Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-7-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	f4c51a6bfd	migration: Add braces {} for if statement Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-6-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	f16aee44b4	migration: Open brace '{' following struct go on the same line Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-5-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	395cb45009	migration: Add spaces around operator Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-4-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	29fccade10	migration: Don't use '#' flag of printf format Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1603163448-27122-3-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Bihong Yu	01371c5821	migration: Do not use C99 // comments Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1603163448-27122-2-git-send-email-yubihong@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Peter Maydell	9fe7ef8b66	migration: Drop unused VMSTATE_FLOAT64 support Commit `ef96e3ae96` in January 2019 removed the last user of the VMSTATE_FLOAT64* macros. These were used by targets which defined their floating point register file as an array of 'float64'. We used to try to maintain a stricter distinction between 'float64' (a type for holding an integer representing an IEEE float) and 'uint64_t', including having a debug option for 'float64' being a struct and supposedly mandatory macros for converting between float64 and uint64_t. We no longer think that's a usefully strong distinction to draw and we allow ourselves to freely assume that float64 really is just a 64-bit integer type, so for new targets we would simply recommend use of the uint64_t type for a floating point register file. The float64 type remains as a useful way of documenting in the type signature of helper functions and the like that they expect to receive an IEEE float from the TCG generated code rather than an arbitrary integer. Since the VMSTATE_FLOAT64* macros have no remaining users and we don't recommend new code uses them, delete them. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20201022120830.5938-1-peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-26 16:15:04 +00:00
Chen Qun	a024890a64	migration/block-dirty-bitmap: fix uninitialized variable warning A default value is provided for the variable 'bitmap_name' to avoid a compiler warning. The compiler showed the warning: migration/block-dirty-bitmap.c:1090:13: warning: ‘bitmap_name’ may be used uninitialized in this function [-Wmaybe-uninitialized] g_strlcpy(s->bitmap_name, bitmap_name, sizeof(s->bitmap_name)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Message-Id: <20201014114430.1898684-1-kuhn.chenqun@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: commit message grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-26 06:56:24 -05:00
Stefan Reiter	ed7b70c27b	migration/block-dirty-bitmap: fix larger granularity bitmaps sectors_per_chunk is a 64 bit integer, but the calculation is done in 32 bits, leading to an overflow for coarse bitmap granularities. If that results in the value 0, it leads to a hang where no progress is made but send_bitmap_bits is constantly called with nr_sectors being 0. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> Message-Id: <20201021144456.1072-1-s.reiter@proxmox.com> Fixes: `b35ebdf07` migration: add postcopy migration of dirty bitmaps Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: Use correct type for 8ULL, use () to avoid overflow] Signed-off-by: Eric Blake <eblake@redhat.com>	2020-10-26 06:55:37 -05:00
Paolo Bonzini	9f2931bc65	machine: remove deprecated -machine enforce-config-section option Deprecated since 3.1 and complicates the initialization sequence, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-26 07:08:39 -04:00
Philippe Mathieu-Daudé	28af9ba260	qapi: Restrict Xen migration commands to migration.json Restricting xen-set-global-dirty-log and xen-load-devices-state commands migration.json pulls slightly less QAPI-generated code into user-mode and tools. Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201012121536.3381997-6-philmd@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2020-10-21 05:00:44 +02:00
Peter Maydell	96292515c0	Trivial Patches Pull request 20201013 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl+FlGcSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748eJgQALXBL9j942qnCFfwntG/lm1ZABAjHxMb 6oRt7J6+iKImOuZrZ8m87ET90sO+dPl3u+L9mq7BfaM2vewarwSSIYFrfyTO2Acf sFzKCfBC/mJJBzI0AqvV3caHGCxFhhzCxPO25JPC2yxgyHxTvG3k0krs5C6Wv9BF nHQPE9PZHguLpvJSH8Wmr70rdCbAOHwaIxkVIN9au/1nVktvPk9vPSjZVFpoMQRA gwIRb+Lo0Chqb9DY2Ino/0AFAMV8CbfopLZt8r8pg3mGFdh2U/KEkmtxDTYdVdbr d2LDAhiNP7C9SNRF7VyFcW21YpWOjWG+vzYnPl5KKces1fAbrseTD8fcNrqf5JZc ont2DdpmGZ+reE3ekyoT2YBk4tz3wGtCDoN19QAwFIvVYRyZW52HLg5zCNb2hq4T 1/J5BQvhPxpbY7hFN6QkQa6i2e6EB2kqtrL/H3pjrw8CLAhQ2ZviCZvyLRpv26w/ OzY5+u2GMdo27+EBqxkbpgZO86GWAhPPzqq4Rnd+wMcttTIHbyqALHIdAaqKz29T 4fVF/nULBJcZL8srz+QdU7xlW9ETbZ+fyTWEo0ZZZnntNlD4ZzQB7HVt8d/kNoC6 wR0I+gt/QAvlPIIP8pWa3MXKYcFNNXGWH2p0WK+jNUDVk7NnNkYkgBQigLPZw973 uOHb/cR9lyrN =qJdy -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging Trivial Patches Pull request 20201013 # gpg: Signature made Tue 13 Oct 2020 12:49:59 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.2-pull-request: meson.build: drop duplicate 'sparc64' entry mingw: fix error __USE_MINGW_ANSI_STDIO redefined target/sparc/int32_helper: Remove duplicated 'Tag Overflow' entry goldfish_rtc: change MemoryRegionOps endianness to DEVICE_NATIVE_ENDIAN hw/char/serial: remove duplicate .class_init in serial_mm_info block/blkdebug: fix memory leak hw/pci: Fix typo in PCI hot-plug error message softmmu/memory: Log invalid memory accesses hw/acpi/piix4: Rename piix4_pm_add_propeties() to piix4_pm_add_properties() vmdk: fix maybe uninitialized warnings tests/test-char: Use a proper fallthrough comment hw/block/nvme: Simplify timestamp sum target/i386/cpu: Update comment that mentions Texinfo qemu-img-cmds.hx: Update comment that mentions Texinfo Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-10-13 14:06:22 +01:00
Marc-André Lureau	662770af7c	mingw: fix error __USE_MINGW_ANSI_STDIO redefined Always put osdep.h first, and remove redundant stdlib.h include. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20201008165953.884599-1-marcandre.lureau@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-10-13 13:33:46 +02:00
Philippe Mathieu-Daudé	7e6edef3f8	migration: Move the creation of the library to the main meson.build Be consistent creating all the libraries in the main meson.build file. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201006125602.2311423-6-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-10-12 11:50:20 -04:00
Chuan Zheng	b1a859cfb0	migration/dirtyrate: present dirty rate only when querying the rate has completed Make dirty_rate field optional, present dirty rate only when querying the rate has completed. The qmp results is shown as follow: @unstarted: {"return":{"status":"unstarted","start-time":0,"calc-time":0},"id":"libvirt-12"} @measuring: {"return":{"status":"measuring","start-time":102931,"calc-time":1},"id":"libvirt-85"} @measured: {"return":{"status":"measured","dirty-rate":4,"start-time":150146,"calc-time":1},"id":"libvirt-15"} Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Message-Id: <1601350938-128320-3-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-12 12:39:38 +01:00
Chuan Zheng	aa84b506f7	migration/dirtyrate: record start_time and calc_time while at the measuring state Querying could include both the start-time and the calc-time while at the measuring state, allowing a caller to determine when they should expect to come back looking for a result. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1601350938-128320-2-git-send-email-zhengchuan@huawei.com> Reviewed-by: David Edmondson <david.edmondson@oracle.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-10-12 12:39:38 +01:00

1 2 3 4 5 ...

1546 Commits