This reverts commit c995b495b9.
From Jan Kiszka:
Ouch, indeed. Moreover, CPU_SAVE_VERSION was not updated (likely the
reason for the breakage). Thanks for debugging this!
Anthony (or whoever), please revert this unneeded commit in qemu.git.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
There are no generic bits remaining in the handling of KVM_EXIT_DEBUG.
So push its logic completely into arch hands, i.e. only x86 so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Required for next patch which will access guest debug services from
kvm_arch_handle_exit. No functional changes.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make the return code of kvm_arch_handle_exit directly usable for
kvm_cpu_exec. This is straightforward for x86 and ppc, just s390
would require more work. Avoid this for now by pushing the return code
translation logic into s390's kvm_arch_handle_exit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Conforming to the Intel spec, set the power-on value of PAT also on
reset, but save it across INIT.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When an external interrupt is pending but IF is cleared, we must not
leave the halt state prematurely.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In Linux kernel HWPoison processing implementation, the virtual
address in processes mapping the error physical memory page is marked
as HWPoison. So that, the further accessing to the virtual
address will kill corresponding processes with SIGBUS.
If the error physical memory page is used by a KVM guest, the SIGBUS
will be sent to QEMU, and QEMU will simulate a MCE to report that
memory error to the guest OS. If the guest OS can not recover from
the error (for example, the page is accessed by kernel code), guest OS
will reboot the system. But because the underlying host virtual
address backing the guest physical memory is still poisoned, if the
guest system accesses the corresponding guest physical memory even
after rebooting, the SIGBUS will still be sent to QEMU and MCE will be
simulated. That is, guest system can not recover via rebooting.
In fact, across rebooting, the contents of guest physical memory page
need not to be kept. We can allocate a new host physical page to
back the corresponding guest physical address.
This patch fixes this issue in QEMU-KVM via calling qemu_ram_remap()
to clear the corresponding page table entry, so that make it possible
to allocate a new page to recover the issue.
[ Jan: rebasing and tiny cleanups]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There is no reason to continue if the kernel claims to support MCE but
then fails to process our request.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
There is nothing to abstract here. Fold kvm_setup_mce into its caller
and fix up the error reporting (return code of kvm_vcpu_ioctl holds the
error value).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This switches KVM's MCE injection path to cpu_x86_inject_mce, both for
SIGBUS and monitor initiated events. This means we prepare the MCA MSRs
in the VCPUState also for KVM.
We have to drop the MSRs writeback restrictions for this purpose which
is now safe as every uncoordinated MSR injection is removed with this
patch.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We will use the current TCG-only MCE injection path for KVM as well, and
then this read-modify-write of the target VCPU state has to be performed
synchronously in the corresponding thread.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The current way of injecting MCE events without updating of and
synchronizing with the CPUState is broken and causes spurious
corruptions of the MCE-related parts of the CPUState.
As a first step towards a fix, enhance the state writeback code with
support for injecting events that are pending in the CPUState. A pending
exception will then be signaled via cpu_interrupt(CPU_INTERRUPT_MCE).
And, just like for TCG, we need to leave the halt state when
CPU_INTERRUPT_MCE is pending (left broken for the to-be-removed old KVM
code).
This will also allow to unify TCG and KVM injection code.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We will broaden the scope of this function on x86 beyond irqchip events.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Pure function suffling to avoid multiple #ifdef KVM_CAP_MCE sections,
no functional changes. While at it, annotate some #ifdef sections.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Allow to tell cpu_x86_inject_mce that it should ignore Action Optional
MCE events when the target VCPU is still processing another one. This
will be used by KVM soon.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
As this service is used by the human monitor, make sure that errors get
reported to the right channel, and also raise the verbosity.
This requires to move Monitor typedef in qemu-common.h to resolve the
include dependency.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
MCEs can be injected asynchronously, so they can also terminate the halt
state.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Huang Ying <ying.huang@intel.com>
CC: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
CC: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
All implementations are now the same, and there is only one caller,
so inline the function there.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We have qemu_cpu_self and qemu_thread_self. The latter is retrieving the
current thread, the former is checking for equality (using CPUState). We
also have qemu_thread_equal which is only used like qemu_cpu_self.
This refactors the interfaces, creating qemu_cpu_is_self and
qemu_thread_is_self as well ass qemu_thread_get_self.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
We do not need to abort, but the user should be notified that weird
things go on.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We do not check them, and the only arch with non-empty implementations
always returns 0 (this is also true for qemu-kvm).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Effectively no functional change yet as kvm_irqchip_in_kernel still only
returns 0, but this patch will allow qemu-kvm to adopt the VCPU loop of
upsteam KVM.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Mixing up TCG bits with KVM already led to problems around eflags
emulation on x86. Moreover, quite some code that TCG requires on cpu
enty/exit is useless for KVM. So dispatch between tcg_cpu_exec and
kvm_cpu_exec as early as possible.
The core logic of cpu_halted from cpu_exec is added to
kvm_arch_process_irqchip_events. Moving away from cpu_exec makes
exception_index meaningless for KVM, we can simply pass the exit reason
directly (only "EXCP_DEBUG vs. rest" is relevant).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If the machine is stopped, we should not record two different tsc values
upon a save operation. The same problem happens with kvmclock.
But kvmclock is taking a different diretion, being now seen as a separate
device. Since this is unlikely to happen with the tsc, I am taking the
approach here of simply registering a handler for state change, and
using a per-CPUState variable that prevents double updates for the TSC.
Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM requires to reenter the kernel after IO exits in order to complete
instruction emulation. Failing to do so will leave the kernel state
inconsistently behind. To ensure that we will get back ASAP, we issue a
self-signal that will cause KVM_RUN to return once the pending
operations are completed.
We can move kvm_arch_process_irqchip_events out of the inner VCPU loop.
The only state that mattered at its old place was a pending INIT
request. Catch it in kvm_arch_pre_run and also trigger a self-signal to
process the request on next kvm_cpu_exec.
This patch also fixes the missing exit_request check in kvm_cpu_exec in
the CONFIG_IOTHREAD case.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Provide arch-independent kvm_on_sigbus* stubs to remove the #ifdef'ery
from cpus.c. This patch also fixes --disable-kvm build by providing the
missing kvm_on_sigbus_vcpu kvm-stub.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When broadcasting MCEs, we need to set MCIP and RIPV in mcg_status like
it is done for KVM. Use the symbolic constants at this chance.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
If the kernel does not support KVM_CAP_ASYNC_PF, it also does not know
about the related MSR. So skip it during state synchronization in that
case. Fixes annoying kernel warnings.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
All CPUX86State variables before CPU_COMMON are automatically cleared on
reset. Reorder nmi_injected and nmi_pending to avoid having to touch
them explicitly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In order to support loading BIOSes > 256K, reorder the code, adjusting
the base if the kernel supports moving the identity map.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of splattering the code with #ifdefs and runtime checks for
capabilities we cannot work without anyway, provide central test
infrastructure for verifying their availability both at build and
runtime.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If we lack kvm_para.h, MSR_KVM_ASYNC_PF_EN is not defined. The change in
kvm_arch_init_vcpu is just for consistency reasons.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Make sure to write the cleared MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
and MSR_KVM_ASYNC_PF_EN to the kernel state so that a freshly booted
guest cannot be disturbed by old values.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Simplify kvm_has_msr_star/hsave_pa to booleans and push their one-time
initialization into kvm_arch_init. Also handle potential errors of that
setup procedure.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
For unknown reasons, xcr0 reset ended up in kvm_arch_update_guest_debug
on upstream merge. Fix this and also remove the misleading comment (1 is
THE reset value).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
kvm_arch_reset_vcpu initializes mp_state, and that function is invoked
right after kvm_arch_init_vcpu.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This code path will not yet be taken as we still lack in-kernel irqchip
support. But qemu-kvm can already make use of it and drop its own
mp_state access services.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The ordering doesn't matter in this case, but better keep it consistent.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Introduce the cpu_dump_state flag CPU_DUMP_CODE and implement it for
x86. This writes out the code bytes around the current instruction
pointer. Make use of this feature in KVM to help debugging fatal vm
exits.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Report KVM_EXIT_UNKNOWN, KVM_EXIT_FAIL_ENTRY, and KVM_EXIT_EXCEPTION
with more details to stderr. The latter two are so far x86-only, so move
them into the arch-specific handler. Integrate the Intel real mode
warning on KVM_EXIT_FAIL_ENTRY that qemu-kvm carries, but actually
restrict it to Intel CPUs. Moreover, always dump the CPU state in case
we fail.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Ensure that we stop the guest whenever we face a fatal or unknown exit
reason. If we stop, we also have to enforce a cpu loop exit.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This exit only triggers activity in the common exit path, but we should
accept it in order to be able to detect unknown exit types.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This unbreaks guest debugging when the 4th hardware breakpoint used for
guest debugging is a watchpoint of 4 or 8 byte lenght. The 31st bit of
DR7 is set in that case and used to cause a sign extension to the high
word which was breaking the guest state (vm entry failure).
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>