This was a long pending bug, now revealed by the assert in
phys_page_find that stumbled over the large page index returned by
cpu_get_phys_page_debug for NX-marked pages: We need to mask out NX and
all user-definable bits 52..62 from PDEs and the final PTE to avoid
corrupting physical addresses.
Reviewed-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
valgrind warns about padding fields which are passed
to vcpu ioctls uninitialized.
This is not an error in practice because kvm ignored padding.
Since the ioctls in question are off data path and
the cost is zero anyway, initialize padding to 0
to suppress these errors.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
* 'upstream' of git://qemu.weilnetz.de/qemu:
Move definition of HOST_LONG_BITS to qemu-common.h
target-xtensa: Clean includes
target-unicore32: Clean includes
target-sh4: Clean includes
target-s390x: Clean includes
target-ppc: Clean includes
target-mips: Clean includes
target-microblaze: Clean includes
target-m68k: Clean includes
target-lm32: Clean includes
target-i386: Clean includes
target-cris: Clean includes
target-arm: Clean includes
target-alpha: Clean includes
Remove macro HOST_LONG_SIZE
* qemu-kvm/uq/master:
pc-bios: update kvmvapic.bin
kvmvapic: Use optionrom helpers
optionsrom: Reserve space for checksum
kvmvapic: Simplify mp/up_set_tpr
kvmvapic: Introduce TPR access optimization for Windows guests
kvmvapic: Add option ROM
target-i386: Add infrastructure for reporting TPR MMIO accesses
Allow to use pause_all_vcpus from VCPU context
Process pending work while waiting for initial kick-off in TCG mode
Remove useless casts from cpu iterators
kvm: Set cpu_single_env only once
kvm: Synchronize cpu state in kvm_arch_stop_on_emulation_error()
Move the logic to transform the 48-char model ID into the 12-word model
value into a helper.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Move the logic for setting the stepping field into a helper function.
To make the function self-contained and to prepare for future
unordered/multiple uses, mask out any previous stepping values first.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Move the logic for setting the model and extended model fields
into a helper function.
To make the function self-contained and to prepare for future
unordered/multiple uses, mask out any previous model values first.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Move the logic for setting the family and extended family into a
helper function.
To make the helper self-contained and in preparation of future
unordered/multiple uses, mask out any previous family values first.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Use 'i64' instead of 'lm' and 'xd' instead of 'nx' on Intel models.
The flags have different names on Intel docs, so use those names for clarity.
This is based on a previous patch from John Cooper where this was introduced
with many other changes at the same time. Original John's patch submission is
at Message-ID: <4DDAD5E7.2020002@redhat.com>, <http://marc.info/?l=qemu-devel&m=130618871926030>.
Changes v1 -> v2:
- Rebase patch against latest Qemu git tree
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
pclmulqdq: /proc/cpuinfo on Linux and all documentation I have seen uses
pclmulqdq as the flag name. As the only document using pclmuldq seems to
be the Intel CPUID documentation (Application Note 485), it looks like a
typo and not the correct name for the flag.
ffxsr: AMD docs refer to fxsr_opt as ffxsr, so allow this named to be
used too.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This will allow the APIC core to file a TPR access report. Depending on
the accelerator and kernel irqchip mode, it will either be delivered
right away or queued for later reporting.
In TCG mode, we can restart the triggering instruction and can therefore
forward the event directly. KVM does not allows us to restart, so we
postpone the delivery of events recording in the user space APIC until
the current instruction is completed.
Note that KVM without in-kernel irqchip will report the address after
the instruction that triggered the access.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Call to kvm_cpu_synchronize_state() is missing.
kvm_arch_stop_on_emulation_error may look at outdated registers here.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
To both avoid that kvm_irqchip_in_kernel always has to be paired with
kvm_enabled and that the former ends up in a function call, implement it
like the latter. This means keeping the state in a global variable and
defining kvm_irqchip_in_kernel as a preprocessor macro.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Introduce the KVM-specific machine option kvm_shadow_mem. It allows to
set a custom shadow MMU size for the virtual machine. This is useful for
stress testing e.g.
Only x86 supports this for now, but it is in principle a generic
concept for all targets with shadow MMUs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This introduces the alternative APIC device which makes use of KVM's
in-kernel device model. External NMI injection via LINT1 is emulated by
checking the current state of the in-kernel APIC, only injecting a NMI
into the VCPU if LINT1 is unmasked and configured to DM_NMI.
MSI is not yet supported, so we disable this when the in-kernel model is
in use.
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Add the basic infrastructure to active in-kernel irqchip support, inject
interrupts into these models, and maintain IRQ routes.
Routing is optional and depends on the host arch supporting
KVM_CAP_IRQ_ROUTING. When it's not available on x86, we looe the HPET as
we can't route GSI0 to IOAPIC pin 2.
In-kernel irqchip support will once be controlled by the machine
property 'kernel_irqchip', but this is not yet wired up.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Commit 2355c16e74 introduced a new ldmxcsr
helper taking an i32 argument, but the helper is actually passed a long.
Fix that by truncating the long to i32.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
SSE rounding and flush to zero control has never been implemented. However
given that softfloat-native was using a single state for FPU and SSE and
given that glibc is setting both FPU and SSE state in fesetround(), this
was working correctly up to the switch to softfloat.
Fix that by adding an update_sse_status() function similar to
update_fpu_status(), and callin git on write to mxcsr.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
The helpers implemented dpps and dppd SSE instructions are not passing
the correct argument types to the softfloat functions. While they do
work anyway providing a correct behaviour, this patch fixes that.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
roundps and roundss SSE2 instructions have been broken when switching
target-i386 to softfloat. They use float64_round_to_int to convert a
float32, and while the implicit conversion from float32 to float64 was
correct for softfloat-native, it is not for pure softfloat. Fix that by
using the correct registers and correct functions.
Also fix roundpd and roundsd implementation at the same time, even if
these functions are behaving correctly.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
minpd, minps, minsd, minss and maxpd, maxps, maxsd, maxss SSE2
instructions have been broken when switching target-i386 to softfloat.
It's not possible to use comparison instructions on float types anymore
to softfloat, so use the floatXX_lt function instead, as the
float_XX_min and float_XX_max functions can't be used due to the Intel
specific behaviour.
As it implements the correct NaNs behaviour, let's remove the
corresponding entry from the TODO.
It fixes GDM screen display on Debian Lenny.
Thanks to Peter Maydell and Jason Wessel for their analysis of the
problem.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* qemu-kvm/memory/page_desc: (22 commits)
Remove cpu_get_physical_page_desc()
sparc: avoid cpu_get_physical_page_desc()
virtio-balloon: avoid cpu_get_physical_page_desc()
vhost: avoid cpu_get_physical_page_desc()
kvm: avoid cpu_get_physical_page_desc()
memory: remove CPUPhysMemoryClient
xen: convert to MemoryListener API
memory: temporarily add memory_region_get_ram_addr()
xen, vga: add API for registering the framebuffer
vhost: convert to MemoryListener API
kvm: convert to MemoryListener API
kvm: switch kvm slots to use host virtual address instead of ram_addr_t
memory: add API for observing updates to the physical memory map
memory: replace cpu_physical_sync_dirty_bitmap() with a memory API
framebuffer: drop use of cpu_physical_sync_dirty_bitmap()
loader: remove calls to cpu_get_physical_page_desc()
framebuffer: drop use of cpu_get_physical_page_desc()
memory: introduce memory_region_find()
memory: add memory_region_is_logging()
memory: add memory_region_is_rom()
...
The latter was already commented out, the former is redundant as well.
We always get the latest changes after return from the guest via
kvm_arch_post_run.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Keep a per-VCPU xsave buffer for kvm_put/get_xsave instead of
continuously allocating and freeing it on state sync.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Field 0 (FCW+FSW) and 1 (FTW+FOP) were hard-coded so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When the i386 cmpxchg instruction is executed with a memory operand
and the comparison result is "unequal", do the memory write before
changing the accumulator instead of the other way around, because
otherwise the new accumulator value will incorrectly be used in the
comparison when the instruction is restarted after a page fault.
This bug was originally reported on 2010-04-25 as
https://bugs.launchpad.net/qemu/+bug/569760
Signed-off-by: Andreas Gustafsson <gson@gson.org>
cpu_x86_find_by_name() uses strtosz_suffix_unit(), but screws up the
error checking. It detects some failures, but not all. Undetected
failures result in a zero tsc_khz value (error value -1 divided by
1000), which means "no tsc_freq set".
To reproduce, try "-cpu qemu64,tsc_freq=9999999T".
strtosz_suffix_unit() fails, because the value overflows int64_t,
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This reverts commit 66e3dd9282.
From Avi,
"Anthony, I think we should revert that commit and refactor cpuid for
1.1. The logic is spread over too many places which makes it hard to
reason about."
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Fix obvious typos (decrement and off-by-one error) in pcmpestrm and pcmpistrm
which resulted in infinite loop. Reported by Frank Mehnert,
spotted also by Coverity (bug 84752853).
Reported-by: Frank Mehnert <frank.mehnert@oracle.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
One n too many for running, need we say more.
Signed-Off-By: Vagrant Cascadian <vagrant@freegeek.org>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
To reproduce the leak, put two name options into the same [cpudef]
section of target-x86_64.conf.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
The fact that a host cpu supports a feature doesn't mean that QEMU and KVM
will also support it, yet -cpuid host brings host features wholesale.
We need to whitelist each feature separately to make sure we support it.
This patch adds KVM whitelisting (by simply using KVM_GET_SUPPORTED_CPUID
instead of the CPUID instruction).
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
apic id returned to guest kernel in ebx for cpuid(function=1) depends on
CPUX86State->cpuid_apic_id which gets populated after the cpuid information
is cached in the host kernel. This results in broken CPU topology in guest.
Fix this by setting cpuid_apic_id before cpuid information is passed to
the host kernel. This is done by moving the setting of cpuid_apic_id
to cpu_x86_init() where it will work for both KVM as well as TCG modes.
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Bharata B Rao <bharata.rao@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
It's needed for its default value - bit 0 specifies that "rep movs" is
good enough for memcpy, and Linux may use a slower memcpu if it is not set,
depending on cpu family/model.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM add emulation of lapic tsc deadline timer for guest.
This patch is co-operation work at qemu side.
Use subsections to save/restore the field (mtosatti).
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM add emulation of lapic tsc deadline timer for guest.
This patch is co-operation work at qemu side.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
T0 was already masked to 16 bits when loading it.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Remove also two assert statements which were the last remaining users.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>