Instead of masking the KVM feature bits very late (while building the
KVM_SET_CPUID2 data), mask it out on env->cpuid_kvm_features, at the
same point where the other feature words are masked out.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This is necessary so that x2apic is not improperly enabled when the
in-kernel irqchip is disabled.
This won't generate a warning with "-cpu ...,check" because the current
check/enforce code is broken (it checks the host CPU data directly,
instead of using kvm_arch_get_supported_cpuid()), but it will be
eventually fixed to properly report the missing x2apic flag.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This moves the CPUID_EXT_TSC_DEADLINE_TIMER CPUID flag hacking from
kvm_arch_init_vcpu() to kvm_arch_get_supported_cpuid().
Full git grep for kvm_arch_get_supported_cpuid:
kvm.h:uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
target-i386/cpu.c: x86_cpu_def->cpuid_7_0_ebx_features = kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
target-i386/cpu.c: *eax = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EAX);
target-i386/cpu.c: *ebx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EBX);
target-i386/cpu.c: *ecx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_ECX);
target-i386/cpu.c: *edx = kvm_arch_get_supported_cpuid(s, 0xA, count, R_EDX);
target-i386/cpu.c: *eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
target-i386/cpu.c: *ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
target-i386/cpu.c: *ecx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_ECX);
target-i386/cpu.c: *edx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EDX);
target-i386/kvm.c:uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
target-i386/kvm.c: cpuid_1_edx = kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
target-i386/kvm.c: env->cpuid_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_EDX);
* target-i386/kvm.c: env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
target-i386/kvm.c: env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
target-i386/kvm.c: env->cpuid_ext3_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
target-i386/kvm.c: env->cpuid_svm_features &= kvm_arch_get_supported_cpuid(s, 0x8000000A,
target-i386/kvm.c: kvm_arch_get_supported_cpuid(s, KVM_CPUID_FEATURES, 0, R_EAX);
target-i386/kvm.c: kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);
Note that there is only one call for CPUID[1].ECX above (*), and it is
the one that gets hacked to include CPUID_EXT_TSC_DEADLINE_TIMER, so we
can simply make kvm_arch_get_supported_cpuid() set it, to let the rest
of the code know the flag can be safely set by QEMU.
One thing I was worrying about when doing this is that now
kvm_arch_get_supported_cpuid() depends on kvm_irqchip_in_kernel(). But
the 'kvm_kernel_irqchip' global variable is initialized during
kvm_init(), that is called very early, and kvm_init() is already a
requirement to run the GET_SUPPORTED_CPUID ioctl() (as kvm_init() is the
function that initializes the 'kvm_state' global variable).
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Additional fixups will be added, and making them a single 'if/else if'
chain makes it clearer than two nested switch statements.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The reg switch will be moved to a separate function, so store the entry
pointer in a variable.
No behavior change, just code movement.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of a function-specific has_kvm_features variable, simply use a
"found" variable that will be checked in case we have to use the legacy
get_para_features() interface.
No behavior change, just code cleanup.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The for loop will become a separate function, so clean it up so it can
become independent from the bit hacking for R_EDX.
No behavior change[1], just code movement.
[1] Well, only if the kernel returned CPUID leafs 1 or 0x80000001 as
unsupported, but there's no kernel version that does that.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
target_phys_addr_t is unwieldly, violates the C standard (_t suffixes are
reserved) and its purpose doesn't match the name (most target_phys_addr_t
addresses are not target specific). Replace it with a finger-friendly,
standards conformant hwaddr.
Outstanding patchsets can be fixed up with the command
git rebase -i --exec 'find -name "*.[ch]"
| xargs s/target_phys_addr_t/hwaddr/g' origin
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Instea of using a hardcoded hex constant, define CPUID_EXT2_AMD_ALIASES
as the set of CPUID[8000_0001].EDX bits that on AMD are the same as the
bits of CPUID[1].EDX.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-By: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Don Slutz <Don@CloudSwitch.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Bit 10 of CPUID[8000_0001].EDX is not defined as an alias of
CPUID[1].EDX[10], so do not duplicate it on
kvm_arch_get_supported_cpuid().
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-By: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Don Slutz <Don@CloudSwitch.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
These helpers abstract the interaction of upcoming pci-assign with the
KVM kernel services. Put them under i386 only as other archs will
implement device pass-through via VFIO and not this classic interface.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Support get/set of new PV EOI MSR, for migration.
Add an optional section for MSR value - send it
out in case MSR was changed from the default value (0).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Don't assume having an in-kernel irqchip means that GSI
routing is enabled.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Decouple another x86-specific assumption about what irqchips imply.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of assuming that we can use irqfds if and only if
kvm_irqchip_in_kernel(), add a bool to the KVMState which
indicates this, and is set only on x86 and only if the
irqchip is in the kernel.
The kernel documentation implies that the only thing
you need to use KVM_IRQFD is that KVM_CAP_IRQFD is
advertised, but this seems to be untrue. In particular
the kernel does not (alas) return a sensible error if you
try to set up an irqfd when you haven't created an irqchip.
If it did we could remove all this nonsense and let the
kernel return the error code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
kvm_allows_irq0_override() is a totally x86 specific concept:
move it to the target-specific source file where it belongs.
This means we need a new header file for the prototype:
kvm_i386.h, in line with the existing kvm_ppc.h.
While we are moving it, fix the return type to be 'bool' rather
than 'int'.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
MP initialization protocol differs between cpu families, and for P6 and
onward models it is up to CPU to decide if it will be BSP using this
protocol, so try to model this. However there is no point in implementing
MP initialization protocol in qemu. Thus first CPU is always marked as BSP.
This patch:
- moves decision to designate BSP from board into cpu, making cpu
self-sufficient in this regard. Later it will allow to cleanup hw/pc.c
and remove cpu_reset and wrappers from there.
- stores flag that CPU is BSP in IA32_APIC_BASE to model behavior
described in Inted SDM vol 3a part 1 chapter 8.4.1
- uses MSR_IA32_APICBASE_BSP flag in apic_base for checking if cpu is BSP
patch is based on Jan Kiszka's proposal:
http://thread.gmane.org/gmane.comp.emulators.qemu/100806
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
KVM performs TPR raising asynchronously to QEMU, specifically outside
QEMU's global lock. When an interrupt is injected into the APIC and TPR
is checked to decide if this can be delivered, a stale TPR value may be
used, causing spurious interrupts in the end.
Fix this by deferring apic_update_irq to the context of the target VCPU.
We introduce a new interrupt flag for this, CPU_INTERRUPT_POLL. When it
is set, the VCPU calls apic_poll_irq before checking for further pending
interrupts. To avoid special-casing KVM, we also implement this logic
for TCG mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch exposes tsc deadline timer feature to guest if
1). in-kernel irqchip is used, and
2). kvm has emulated tsc deadline timer, and
3). user authorize the feature exposing via -cpu or +/- tsc-deadline
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Allows to use cpu_reset() in place of cpu_state_reset().
Signed-off-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Scripted conversion:
sed -i "s/CPUState/CPUX86State/g" target-i386/*.[hc]
sed -i "s/#define CPUX86State/#define CPUState/" target-i386/cpu.h
Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
valgrind warns about padding fields which are passed
to vcpu ioctls uninitialized.
This is not an error in practice because kvm ignored padding.
Since the ioctls in question are off data path and
the cost is zero anyway, initialize padding to 0
to suppress these errors.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This will allow the APIC core to file a TPR access report. Depending on
the accelerator and kernel irqchip mode, it will either be delivered
right away or queued for later reporting.
In TCG mode, we can restart the triggering instruction and can therefore
forward the event directly. KVM does not allows us to restart, so we
postpone the delivery of events recording in the user space APIC until
the current instruction is completed.
Note that KVM without in-kernel irqchip will report the address after
the instruction that triggered the access.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Call to kvm_cpu_synchronize_state() is missing.
kvm_arch_stop_on_emulation_error may look at outdated registers here.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
To both avoid that kvm_irqchip_in_kernel always has to be paired with
kvm_enabled and that the former ends up in a function call, implement it
like the latter. This means keeping the state in a global variable and
defining kvm_irqchip_in_kernel as a preprocessor macro.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Introduce the KVM-specific machine option kvm_shadow_mem. It allows to
set a custom shadow MMU size for the virtual machine. This is useful for
stress testing e.g.
Only x86 supports this for now, but it is in principle a generic
concept for all targets with shadow MMUs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This introduces the alternative APIC device which makes use of KVM's
in-kernel device model. External NMI injection via LINT1 is emulated by
checking the current state of the in-kernel APIC, only injecting a NMI
into the VCPU if LINT1 is unmasked and configured to DM_NMI.
MSI is not yet supported, so we disable this when the in-kernel model is
in use.
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Add the basic infrastructure to active in-kernel irqchip support, inject
interrupts into these models, and maintain IRQ routes.
Routing is optional and depends on the host arch supporting
KVM_CAP_IRQ_ROUTING. When it's not available on x86, we looe the HPET as
we can't route GSI0 to IOAPIC pin 2.
In-kernel irqchip support will once be controlled by the machine
property 'kernel_irqchip', but this is not yet wired up.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
* qemu-kvm/memory/page_desc: (22 commits)
Remove cpu_get_physical_page_desc()
sparc: avoid cpu_get_physical_page_desc()
virtio-balloon: avoid cpu_get_physical_page_desc()
vhost: avoid cpu_get_physical_page_desc()
kvm: avoid cpu_get_physical_page_desc()
memory: remove CPUPhysMemoryClient
xen: convert to MemoryListener API
memory: temporarily add memory_region_get_ram_addr()
xen, vga: add API for registering the framebuffer
vhost: convert to MemoryListener API
kvm: convert to MemoryListener API
kvm: switch kvm slots to use host virtual address instead of ram_addr_t
memory: add API for observing updates to the physical memory map
memory: replace cpu_physical_sync_dirty_bitmap() with a memory API
framebuffer: drop use of cpu_physical_sync_dirty_bitmap()
loader: remove calls to cpu_get_physical_page_desc()
framebuffer: drop use of cpu_get_physical_page_desc()
memory: introduce memory_region_find()
memory: add memory_region_is_logging()
memory: add memory_region_is_rom()
...
The latter was already commented out, the former is redundant as well.
We always get the latest changes after return from the guest via
kvm_arch_post_run.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Keep a per-VCPU xsave buffer for kvm_put/get_xsave instead of
continuously allocating and freeing it on state sync.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Field 0 (FCW+FSW) and 1 (FTW+FOP) were hard-coded so far.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
One n too many for running, need we say more.
Signed-Off-By: Vagrant Cascadian <vagrant@freegeek.org>
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
It's needed for its default value - bit 0 specifies that "rep movs" is
good enough for memcpy, and Linux may use a slower memcpu if it is not set,
depending on cpu family/model.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM add emulation of lapic tsc deadline timer for guest.
This patch is co-operation work at qemu side.
Use subsections to save/restore the field (mtosatti).
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
KVM add emulation of lapic tsc deadline timer for guest.
This patch is co-operation work at qemu side.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Today, when notifying a VM state change with vm_state_notify(),
we pass a VMSTOP macro as the 'reason' argument. This is not ideal
because the VMSTOP macros tell why qemu stopped and not exactly
what the current VM state is.
One example to demonstrate this problem is that vm_start() calls
vm_state_notify() with reason=0, which turns out to be VMSTOP_USER.
This commit fixes that by replacing the VMSTOP macros with a proper
state type called RunState.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Avoid these warnings from clang analyzer:
/src/qemu/target-i386/kvm.c:772:5: warning: Value stored to 'cwd' is never read
cwd = swd = twd = 0;
/src/qemu/target-i386/kvm.c:772:11: warning: Although the value stored to 'swd' is used in the enclosing expression, the value is never actually read from 'swd'
cwd = swd = twd = 0;
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Most changes were made using these commands:
git grep -la '__attribute__((packed))'|xargs perl -pi -e 's/__attribute__\(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((packed))'|xargs perl -pi -e 's/__attribute__ \(\(packed\)\)/QEMU_PACKED/'
git grep -la '__attribute__((__packed__))'|xargs perl -pi -e 's/__attribute__\(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute__ ((__packed__))'|xargs perl -pi -e 's/__attribute__ \(\(__packed__\)\)/QEMU_PACKED/'
git grep -la '__attribute((packed))'|xargs perl -pi -e 's/__attribute\(\(packed\)\)/QEMU_PACKED/'
Whitespace in linux-user/syscall_defs.h was fixed manually
to avoid warnings from scripts/checkpatch.pl.
Manual changes were also applied to hw/pc.c.
I did not fix indentation with tabs in block/vvfat.c.
The patch will show 4 errors with scripts/checkpatch.pl.
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>