/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License, version 2, as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 *
 * Copyright IBM Corp. 2007
 *
 * Authors: Hollis Blanchard <hollisb@us.ibm.com>
 */

#ifndef __LINUX_KVM_POWERPC_H
#define __LINUX_KVM_POWERPC_H

#include <linux/types.h>

/* Select powerpc specific features in <linux/kvm.h> */
#define __KVM_HAVE_SPAPR_TCE
#define __KVM_HAVE_PPC_SMT
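
/*
 * Illustrative sketch only: these helpers are hypothetical and are not
 * part of the KVM ABI. With __KVM_HAVE_PPC_SMT, userspace can query
 * KVM_CAP_PPC_SMT to learn the number of vcpus per virtual CPU core
 * (vcore), which is 4 on POWER7. A vcpu's vcore is its vcpu number
 * divided by that value. A guest meant to run single-threaded should
 * therefore use only vcpu numbers that are multiples of it, giving one
 * vcpu per vcore. Treating a zero capability value as 1 is an assumption
 * made here to avoid dividing by zero.
 */
static inline unsigned int kvmppc_example_vcore_id(unsigned int vcpu_id, unsigned int threads_per_vcore)
{
	if (threads_per_vcore == 0)
		threads_per_vcore = 1;	/* assumption: no SMT information */
	return vcpu_id / threads_per_vcore;
}

/* The n-th vcpu number for a single-threaded guest (one vcpu per vcore). */
static inline unsigned int kvmppc_example_st_vcpu_id(unsigned int n, unsigned int threads_per_vcore)
{
	if (threads_per_vcore == 0)
		threads_per_vcore = 1;	/* assumption: no SMT information */
	return n * threads_per_vcore;
}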

struct kvm_regs {
	__u64 pc;
	__u64 cr;
	__u64 ctr;
	__u64 lr;
	__u64 xer;
	__u64 msr;
	__u64 srr0;
	__u64 srr1;
	__u64 pid;

	__u64 sprg0;
	__u64 sprg1;
	__u64 sprg2;
	__u64 sprg3;
	__u64 sprg4;
	__u64 sprg5;
	__u64 sprg6;
	__u64 sprg7;

	__u64 gpr[32];
};

#define KVM_SREGS_E_IMPL_NONE 0
#define KVM_SREGS_E_IMPL_FSL 1

#define KVM_SREGS_E_FSL_PIDn (1 << 0) /* PID1/PID2 */

/*
 * Feature bits indicate which sections of the sregs struct are valid,
 * both in KVM_GET_SREGS and KVM_SET_SREGS. On KVM_SET_SREGS, registers
 * corresponding to unset feature bits will not be modified. This allows
 * restoring a checkpoint made without that feature, while keeping the
 * default values of the new registers.
 *
 * KVM_SREGS_E_BASE contains:
 *   CSRR0/1 (refers to SRR2/3 on 40x)
 *   ESR
 *   DEAR
 *   MCSR
 *   TSR
 *   TCR
 *   DEC
 *   TB
 *   VRSAVE (USPRG0)
 */
#define KVM_SREGS_E_BASE (1 << 0)

/*
 * KVM_SREGS_E_ARCH206 contains:
 *
 *   PIR
 *   MCSRR0/1
 *   DECAR
 *   IVPR
 */
#define KVM_SREGS_E_ARCH206 (1 << 1)

/*
 * Contains EPCR, plus the upper half of 64-bit registers
 * that are 32-bit on 32-bit implementations.
 */
#define KVM_SREGS_E_64 (1 << 2)

#define KVM_SREGS_E_SPRG8 (1 << 3)
#define KVM_SREGS_E_MCIVPR (1 << 4)

/*
 * IVORs are used -- contains IVOR0-15, plus additional IVORs
 * in combination with an appropriate feature bit.
 */
#define KVM_SREGS_E_IVOR (1 << 5)

/*
 * Contains MAS0-4, MAS6-7, TLBnCFG, MMUCFG.
 * Also TLBnPS if MMUCFG[MAVN] = 1.
 */
#define KVM_SREGS_E_ARCH206_MMU (1 << 6)

/* DBSR, DBCR, IAC, DAC, DVC */
#define KVM_SREGS_E_DEBUG (1 << 7)

/* Enhanced debug -- DSRR0/1, SPRG9 */
#define KVM_SREGS_E_ED (1 << 8)

/* Embedded Floating Point (SPE) -- IVOR32-34 if KVM_SREGS_E_IVOR */
#define KVM_SREGS_E_SPE (1 << 9)

/* External Proxy (EXP) -- EPR */
#define KVM_SREGS_EXP (1 << 10)

/* External PID (E.PD) -- EPSC/EPLC */
#define KVM_SREGS_E_PD (1 << 11)

/* Processor Control (E.PC) -- IVOR36-37 if KVM_SREGS_E_IVOR */
#define KVM_SREGS_E_PC (1 << 12)

/* Page table (E.PT) -- EPTCFG */
#define KVM_SREGS_E_PT (1 << 13)

/* Embedded Performance Monitor (E.PM) -- IVOR35 if KVM_SREGS_E_IVOR */
#define KVM_SREGS_E_PM (1 << 14)
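
/*
 * Illustrative sketch only: this helper is hypothetical and is not part
 * of the KVM ABI. It follows the feature-bit comments above. IVOR0-15
 * (ivor_low[]) are valid when KVM_SREGS_E_IVOR is set. The IVORs kept in
 * ivor_high[] also need their own feature bit: IVOR32-34 need
 * KVM_SREGS_E_SPE, IVOR35 needs KVM_SREGS_E_PM, and IVOR36-37 need
 * KVM_SREGS_E_PC. Bit values are written as literals, with the macro
 * names in comments, so the sketch also compiles on its own.
 */
static inline int kvmppc_example_ivor_valid(unsigned int features, unsigned int ivor)
{
	/* No IVOR is valid unless KVM_SREGS_E_IVOR (bit 5) is set. */
	if (!(features & (1u << 5)))
		return 0;
	/* IVOR0-15 live in ivor_low[]. */
	if (ivor <= 15)
		return 1;
	/* IVOR32-34 require KVM_SREGS_E_SPE (bit 9). */
	if (ivor >= 32 && ivor <= 34)
		return (features & (1u << 9)) != 0;
	/* IVOR35 requires KVM_SREGS_E_PM (bit 14). */
	if (ivor == 35)
		return (features & (1u << 14)) != 0;
	/* IVOR36-37 require KVM_SREGS_E_PC (bit 12). */
	if (ivor == 36 || ivor == 37)
		return (features & (1u << 12)) != 0;
	/* No feature bit above covers any other IVOR. */
	return 0;
}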

/*
 * Special updates:
 *
 * Some registers may change even while a vcpu is not running.
 * To avoid losing these changes, by default these registers are
 * not updated by KVM_SET_SREGS. To force an update, set the bit
 * in u.e.update_special corresponding to the register to be updated.
 *
 * The update_special field is zero on return from KVM_GET_SREGS.
 *
 * When restoring a checkpoint, the caller can set update_special
 * to 0xffffffff to ensure that everything is restored, even new features
 * that the caller doesn't know about.
 */
#define KVM_SREGS_E_UPDATE_MCSR (1 << 0)
#define KVM_SREGS_E_UPDATE_TSR (1 << 1)
#define KVM_SREGS_E_UPDATE_DEC (1 << 2)
#define KVM_SREGS_E_UPDATE_DBSR (1 << 3)

/*
 * Book3S special bits indicate which optional fields of the struct are
 * present, preserving backwards compatibility with older structs. When
 * adding a new field, please also add a flag for that field.
 */
#define KVM_SREGS_S_HIOR (1 << 0)

/*
 * In KVM_SET_SREGS, reserved/pad fields must be left untouched from a
 * previous KVM_GET_SREGS.
 *
 * Unless otherwise indicated, setting any register with KVM_SET_SREGS
 * directly sets its value. It does not trigger any special semantics such
 * as write-one-to-clear. Calling KVM_SET_SREGS on an unmodified struct
 * just received from KVM_GET_SREGS is always a no-op.
 */
struct kvm_sregs {
	__u32 pvr;
	union {
		struct {
			__u64 sdr1;
			struct {
				struct {
					__u64 slbe;
					__u64 slbv;
				} slb[64];
			} ppc64;
			struct {
				__u32 sr[16];
				__u64 ibat[8];
				__u64 dbat[8];
			} ppc32;
			__u64 flags; /* KVM_SREGS_S_ */
			__u64 hior;
		} s;
		struct {
			union {
				struct { /* KVM_SREGS_E_IMPL_FSL */
					__u32 features; /* KVM_SREGS_E_FSL_ */
					__u32 svr;
					__u64 mcar;
					__u32 hid0;

					/* KVM_SREGS_E_FSL_PIDn */
					__u32 pid1, pid2;
				} fsl;
				__u8 pad[256];
			} impl;

			__u32 features; /* KVM_SREGS_E_ */
			__u32 impl_id; /* KVM_SREGS_E_IMPL_ */
			__u32 update_special; /* KVM_SREGS_E_UPDATE_ */
			__u32 pir; /* read-only */
			__u64 sprg8;
			__u64 sprg9; /* E.ED */
			__u64 csrr0;
			__u64 dsrr0; /* E.ED */
			__u64 mcsrr0;
			__u32 csrr1;
			__u32 dsrr1; /* E.ED */
			__u32 mcsrr1;
			__u32 esr;
			__u64 dear;
			__u64 ivpr;
			__u64 mcivpr;
			__u64 mcsr; /* KVM_SREGS_E_UPDATE_MCSR */

			__u32 tsr; /* KVM_SREGS_E_UPDATE_TSR */
			__u32 tcr;
			__u32 decar;
			__u32 dec; /* KVM_SREGS_E_UPDATE_DEC */

			/*
			 * Userspace can read TB directly, but the
			 * value reported here is consistent with "dec".
			 *
			 * Read-only.
			 */
			__u64 tb;

			__u32 dbsr; /* KVM_SREGS_E_UPDATE_DBSR */
			__u32 dbcr[3];
			__u32 iac[4];
			__u32 dac[2];
			__u32 dvc[2];
			__u8 num_iac; /* read-only */
			__u8 num_dac; /* read-only */
			__u8 num_dvc; /* read-only */
			__u8 pad;

			__u32 epr; /* EXP */
			__u32 vrsave; /* a.k.a. USPRG0 */
			__u32 epcr; /* KVM_SREGS_E_64 */

			__u32 mas0;
			__u32 mas1;
			__u64 mas2;
			__u64 mas7_3;
			__u32 mas4;
			__u32 mas6;

			__u32 ivor_low[16]; /* IVOR0-15 */
			__u32 ivor_high[18]; /* IVOR32+, plus room to expand */

			__u32 mmucfg; /* read-only */
			__u32 eptcfg; /* E.PT, read-only */
			__u32 tlbcfg[4]; /* read-only */
			__u32 tlbps[4]; /* read-only */

			__u32 eplc, epsc; /* E.PD */
		} e;
		__u8 pad[1020];
	} u;
};

struct kvm_fpu {
	__u64 fpr[32];
};

struct kvm_debug_exit_arch {
};

/* for KVM_SET_GUEST_DEBUG */
struct kvm_guest_debug_arch {
};

#define KVM_REG_MASK 0x001f
#define KVM_REG_EXT_MASK 0xffe0
#define KVM_REG_GPR 0x0000
#define KVM_REG_FPR 0x0020
#define KVM_REG_QPR 0x0040
#define KVM_REG_FQPR 0x0060
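
/*
 * Illustrative sketch only: these helpers are hypothetical and are not
 * part of the KVM ABI. They assume a register index combines a register
 * number in the bits covered by KVM_REG_MASK with a register class
 * (KVM_REG_GPR, KVM_REG_FPR, KVM_REG_QPR or KVM_REG_FQPR) in the bits
 * covered by KVM_REG_EXT_MASK. Under that reading, KVM_REG_FPR | 3 names
 * FPR3. The mask values are repeated as literals so the sketch also
 * compiles on its own.
 */
static inline unsigned int kvmppc_example_reg_num(unsigned int reg)
{
	return reg & 0x001fu;	/* KVM_REG_MASK */
}

static inline unsigned int kvmppc_example_reg_class(unsigned int reg)
{
	return reg & 0xffe0u;	/* KVM_REG_EXT_MASK */
}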

#define KVM_INTERRUPT_SET -1U
#define KVM_INTERRUPT_UNSET -2U
#define KVM_INTERRUPT_SET_LEVEL -3U

/* for KVM_CAP_SPAPR_TCE */
struct kvm_create_spapr_tce {
	__u64 liobn;
	__u32 window_size;
};

/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
	__u64 rma_size;
};

#endif /* __LINUX_KVM_POWERPC_H */