linux

mirror of https://github.com/FEX-Emu/linux.git synced 2024-12-28 04:17:47 +00:00

Author	SHA1	Message	Date
Takuya Yoshikawa	db5b0762f3	KVM: x86 emulator: Use opcode::execute for some instructions Move the following functions to the opcode tables: RET (Far return) : CB IRET : CF JMP (Jump far) : EA SYSCALL : 0F 05 CLTS : 0F 06 SYSENTER : 0F 34 SYSEXIT : 0F 35 Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:59 +03:00
Takuya Yoshikawa	e01991e71a	KVM: x86 emulator: Rename emulate_xxx() to em_xxx() The next patch will change these to be called by opcode::execute. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:58 +03:00
Takuya Yoshikawa	9d74191ab1	KVM: x86 emulator: Use the pointers ctxt and c consistently We should use the local variables ctxt and c when the emulate_ctxt and decode appears many times. At least, we need to be consistent about how we use these in a function. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 13:15:57 +03:00
Nadav Har'El	2844d84905	KVM: nVMX: Miscellenous small corrections Small corrections of KVM (spelling, etc.) not directly related to nested VMX. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	7b8050f570	KVM: nVMX: Add VMX to list of supported cpuid features If the "nested" module option is enabled, add the "VMX" CPU feature to the list of CPU features KVM advertises with the KVM_GET_SUPPORTED_CPUID ioctl. Qemu uses this ioctl, and intersects KVM's list with its own list of desired cpu features (depending on the -cpu option given to qemu) to determine the final list of features presented to the guest. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	7991825b85	KVM: nVMX: Additional TSC-offset handling In the unlikely case that L1 does not capture MSR_IA32_TSC, L0 needs to emulate this MSR write by L2 by modifying vmcs02.tsc_offset. We also need to set vmcs12.tsc_offset, for this change to survive the next nested entry (see prepare_vmcs02()). Additionally, we also need to modify vmx_adjust_tsc_offset: The semantics of this function is that the TSC of all guests on this vcpu, L1 and possibly several L2s, need to be adjusted. To do this, we need to adjust vmcs01's tsc_offset (this offset will also apply to each L2s we enter). We can't set vmcs01 now, so we have to remember this adjustment and apply it when we later exit to L1. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:19 +03:00
Nadav Har'El	36cf24e01e	KVM: nVMX: Further fixes for lazy FPU loading KVM's "Lazy FPU loading" means that sometimes L0 needs to set CR0.TS, even if a guest didn't set it. Moreover, L0 must also trap CR0.TS changes and NM exceptions, even if we have a guest hypervisor (L1) who didn't want these traps. And of course, conversely: If L1 wanted to trap these events, we must let it, even if L0 is not interested in them. This patch fixes some existing KVM code (in update_exception_bitmap(), vmx_fpu_activate(), vmx_fpu_deactivate()) to do the correct merging of L0's and L1's needs. Note that handle_cr() was already fixed in the above patch, and that new code in introduced in previous patches already handles CR0 correctly (see prepare_vmcs02(), prepare_vmcs12(), and nested_vmx_vmexit()). Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	eeadf9e755	KVM: nVMX: Handling of CR0 and CR4 modifying instructions When L2 tries to modify CR0 or CR4 (with mov or clts), and modifies a bit which L1 asked to shadow (via CR[04]_GUEST_HOST_MASK), we already do the right thing: we let L1 handle the trap (see nested_vmx_exit_handled_cr() in a previous patch). When L2 modifies bits that L1 doesn't care about, we let it think (via CR[04]_READ_SHADOW) that it did these modifications, while only changing (in GUEST_CR[04]) the bits that L0 doesn't shadow. This is needed for corect handling of CR0.TS for lazy FPU loading: L0 may want to leave TS on, while pretending to allow the guest to change it. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	66c78ae40c	KVM: nVMX: Correct handling of idt vectoring info This patch adds correct handling of IDT_VECTORING_INFO_FIELD for the nested case. When a guest exits while delivering an interrupt or exception, we get this information in IDT_VECTORING_INFO_FIELD in the VMCS. When L2 exits to L1, there's nothing we need to do, because L1 will see this field in vmcs12, and handle it itself. However, when L2 exits and L0 handles the exit itself and plans to return to L2, L0 must inject this event to L2. In the normal non-nested case, the idt_vectoring_info case is discovered after the exit, and the decision to inject (though not the injection itself) is made at that point. However, in the nested case a decision of whether to return to L2 or L1 also happens during the injection phase (see the previous patches), so in the nested case we can only decide what to do about the idt_vectoring_info right after the injection, i.e., in the beginning of vmx_vcpu_run, which is the first time we know for sure if we're staying in L2. Therefore, when we exit L2 (is_guest_mode(vcpu)), we disable the regular vmx_complete_interrupts() code which queues the idt_vectoring_info for injection on next entry - because such injection would not be appropriate if we will decide to exit to L1. Rather, we just save the idt_vectoring_info and related fields in vmcs12 (which is a convenient place to save these fields). On the next entry in vmx_vcpu_run (after the injection phase, potentially exiting to L1 to inject an event requested by user space), if we find ourselves in L1 we don't need to do anything with those values we saved (as explained above). But if we find that we're in L2, or rather still at L2 (it's not nested_run_pending, meaning that this is the first round of L2 running after L1 having just launched it), we need to inject the event saved in those fields - by writing the appropriate VMCS fields. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:18 +03:00
Nadav Har'El	0b6ac343fc	KVM: nVMX: Correct handling of exception injection Similar to the previous patch, but concerning injection of exceptions rather than external interrupts. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:17 +03:00
Nadav Har'El	b6f1250edb	KVM: nVMX: Correct handling of interrupt injection The code in this patch correctly emulates external-interrupt injection while a nested guest L2 is running. Because of this code's relative un-obviousness, I include here a longer-than- usual justification for what it does - much longer than the code itself ;-) To understand how to correctly emulate interrupt injection while L2 is running, let's look first at what we need to emulate: How would things look like if the extra L0 hypervisor layer is removed, and instead of L0 injecting an interrupt, we had hardware delivering an interrupt? Now we have L1 running on bare metal with a guest L2, and the hardware generates an interrupt. Assuming that L1 set PIN_BASED_EXT_INTR_MASK to 1, and VM_EXIT_ACK_INTR_ON_EXIT to 0 (we'll revisit these assumptions below), what happens now is this: The processor exits from L2 to L1, with an external- interrupt exit reason but without an interrupt vector. L1 runs, with interrupts disabled, and it doesn't yet know what the interrupt was. Soon after, it enables interrupts and only at that moment, it gets the interrupt from the processor. when L1 is KVM, Linux handles this interrupt. Now we need exactly the same thing to happen when that L1->L2 system runs on top of L0, instead of real hardware. This is how we do this: When L0 wants to inject an interrupt, it needs to exit from L2 to L1, with external-interrupt exit reason (with an invalid interrupt vector), and run L1. Just like in the bare metal case, it likely can't deliver the interrupt to L1 now because L1 is running with interrupts disabled, in which case it turns on the interrupt window when running L1 after the exit. L1 will soon enable interrupts, and at that point L0 will gain control again and inject the interrupt to L1. Finally, there is an extra complication in the code: when nested_run_pending, we cannot return to L1 now, and must launch L2. We need to remember the interrupt we wanted to inject (and not clear it now), and do it on the next exit. The above explanation shows that the relative strangeness of the nested interrupt injection code in this patch, and the extra interrupt-window exit incurred, are in fact necessary for accurate emulation, and are not just an unoptimized implementation. Let's revisit now the two assumptions made above: If L1 turns off PIN_BASED_EXT_INTR_MASK (no hypervisor that I know does, by the way), things are simple: L0 may inject the interrupt directly to the L2 guest - using the normal code path that injects to any guest. We support this case in the code below. If L1 turns on VM_EXIT_ACK_INTR_ON_EXIT, things look very different from the description above: L1 expects to see an exit from L2 with the interrupt vector already filled in the exit information, and does not expect to be interrupted again with this interrupt. The current code does not (yet) support this case, so we do not allow the VM_EXIT_ACK_INTR_ON_EXIT exit-control to be turned on by L1. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:17 +03:00
Nadav Har'El	644d711aa0	KVM: nVMX: Deciding if L0 or L1 should handle an L2 exit This patch contains the logic of whether an L2 exit should be handled by L0 and then L2 should be resumed, or whether L1 should be run to handle this exit (using the nested_vmx_vmexit() function of the previous patch). The basic idea is to let L1 handle the exit only if it actually asked to trap this sort of event. For example, when L2 exits on a change to CR0, we check L1's CR0_GUEST_HOST_MASK to see if L1 expressed interest in any bit which changed; If it did, we exit to L1. But if it didn't it means that it is we (L0) that wished to trap this event, so we handle it ourselves. The next two patches add additional logic of what to do when an interrupt or exception is injected: Does L0 need to do it, should we exit to L1 to do it, or should we resume L2 and keep the exception to be injected later. We keep a new flag, "nested_run_pending", which can override the decision of which should run next, L1 or L2. nested_run_pending=1 means that we must run L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending is especially intended to avoid switching to L1 in the injection decision-point described above. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	7c1779384a	KVM: nVMX: vmcs12 checks on nested entry This patch adds a bunch of tests of the validity of the vmcs12 fields, according to what the VMX spec and our implementation allows. If fields we cannot (or don't want to) honor are discovered, an entry failure is emulated. According to the spec, there are two types of entry failures: If the problem was in vmcs12's host state or control fields, the VMLAUNCH instruction simply fails. But a problem is found in the guest state, the behavior is more similar to that of an exit. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	4704d0befb	KVM: nVMX: Exiting from L2 to L1 This patch implements nested_vmx_vmexit(), called when the nested L2 guest exits and we want to run its L1 parent and let it handle this exit. Note that this will not necessarily be called on every L2 exit. L0 may decide to handle a particular exit on its own, without L1's involvement; In that case, L0 will handle the exit, and resume running L2, without running L1 and without calling nested_vmx_vmexit(). The logic for deciding whether to handle a particular exit in L1 or in L0, i.e., whether to call nested_vmx_vmexit(), will appear in a separate patch below. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:16 +03:00
Nadav Har'El	99e65e805d	KVM: nVMX: No need for handle_vmx_insn function any more Before nested VMX support, the exit handler for a guest executing a VMX instruction (vmclear, vmlaunch, vmptrld, vmptrst, vmread, vmread, vmresume, vmwrite, vmon, vmoff), was handle_vmx_insn(). This handler simply threw a #UD exception. Now that all these exit reasons are properly handled (and emulate the respective VMX instruction), nothing calls this dummy handler and it can be removed. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:15 +03:00
Nadav Har'El	cd232ad02f	KVM: nVMX: Implement VMLAUNCH and VMRESUME Implement the VMLAUNCH and VMRESUME instructions, allowing a guest hypervisor to run its own guests. This patch does not include some of the necessary validity checks on vmcs12 fields before the entry. These will appear in a separate patch below. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:15 +03:00
Nadav Har'El	fe3ef05c75	KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12 This patch contains code to prepare the VMCS which can be used to actually run the L2 guest, vmcs02. prepare_vmcs02 appropriately merges the information in vmcs12 (the vmcs that L1 built for L2) and in vmcs01 (our desires for our own guests). Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	bf8179a011	KVM: nVMX: Move control field setup to functions Move some of the control field setup to common functions. These functions will also be needed for running L2 guests - L0's desires (expressed in these functions) will be appropriately merged with L1's desires. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	a3a8ff8ebf	KVM: nVMX: Move host-state field setup to a function Move the setting of constant host-state fields (fields that do not change throughout the life of the guest) from vmx_vcpu_setup to a new common function vmx_set_constant_host_state(). This function will also be used to set the host state when running L2 guests. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	49f705c532	KVM: nVMX: Implement VMREAD and VMWRITE Implement the VMREAD and VMWRITE instructions. With these instructions, L1 can read and write to the VMCS it is holding. The values are read or written to the fields of the vmcs12 structure introduced in a previous patch. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:14 +03:00
Nadav Har'El	6a4d755060	KVM: nVMX: Implement VMPTRST This patch implements the VMPTRST instruction. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:13 +03:00
Nadav Har'El	63846663ea	KVM: nVMX: Implement VMPTRLD This patch implements the VMPTRLD instruction. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:12 +03:00
Nadav Har'El	27d6c86521	KVM: nVMX: Implement VMCLEAR This patch implements the VMCLEAR instruction. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:12 +03:00
Nadav Har'El	0140caea3b	KVM: nVMX: Success/failure of VMX instructions. VMX instructions specify success or failure by setting certain RFLAGS bits. This patch contains common functions to do this, and they will be used in the following patches which emulate the various VMX instructions. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:12 +03:00
Nadav Har'El	22bd035868	KVM: nVMX: Add VMCS fields to the vmcs12 In this patch we add to vmcs12 (the VMCS that L1 keeps for L2) all the standard VMCS fields. Later patches will enable L1 to read and write these fields using VMREAD/ VMWRITE, and they will be used during a VMLAUNCH/VMRESUME in preparing vmcs02, a hardware VMCS for running L2. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	ff2f6fe961	KVM: nVMX: Introduce vmcs02: VMCS used to run L2 We saw in a previous patch that L1 controls its L2 guest with a vcms12. L0 needs to create a real VMCS for running L2. We call that "vmcs02". A later patch will contain the code, prepare_vmcs02(), for filling the vmcs02 fields. This patch only contains code for allocating vmcs02. In this version, prepare_vmcs02() sets all of vmcs02's fields each time we enter from L1 to L2, so keeping just one vmcs02 for the vcpu is enough: It can be reused even when L1 runs multiple L2 guests. However, in future versions we'll probably want to add an optimization where vmcs02 fields that rarely change will not be set each time. For that, we may want to keep around several vmcs02s of L2 guests that have recently run, so that potentially we could run these L2s again more quickly because less vmwrites to vmcs02 will be needed. This patch adds to each vcpu a vmcs02 pool, vmx->nested.vmcs02_pool, which remembers the vmcs02s last used to run up to VMCS02_POOL_SIZE L2s. As explained above, in the current version we choose VMCS02_POOL_SIZE=1, I.e., one vmcs02 is allocated (and loaded onto the processor), and it is reused to enter any L2 guest. In the future, when prepare_vmcs02() is optimized not to set all fields every time, VMCS02_POOL_SIZE should be increased. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	064aea7747	KVM: nVMX: Decoding memory operands of VMX instructions This patch includes a utility function for decoding pointer operands of VMX instructions issued by L1 (a guest hypervisor) Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	b87a51ae28	KVM: nVMX: Implement reading and writing of VMX MSRs When the guest can use VMX instructions (when the "nested" module option is on), it should also be able to read and write VMX MSRs, e.g., to query about VMX capabilities. This patch adds this support. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:11 +03:00
Nadav Har'El	a9d30f33dd	KVM: nVMX: Introduce vmcs12: a VMCS structure for L1 An implementation of VMX needs to define a VMCS structure. This structure is kept in guest memory, but is opaque to the guest (who can only read or write it with VMX instructions). This patch starts to define the VMCS structure which our nested VMX implementation will present to L1. We call it "vmcs12", as it is the VMCS that L1 keeps for its L2 guest. We will add more content to this structure in later patches. This patch also adds the notion (as required by the VMX spec) of L1's "current VMCS", and finally includes utility functions for mapping the guest-allocated VMCSs in host memory. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:10 +03:00
Nadav Har'El	5e1746d620	KVM: nVMX: Allow setting the VMXE bit in CR4 This patch allows the guest to enable the VMXE bit in CR4, which is a prerequisite to running VMXON. Whether to allow setting the VMXE bit now depends on the architecture (svm or vmx), so its checking has moved to kvm_x86_ops->set_cr4(). This function now returns an int: If kvm_x86_ops->set_cr4() returns 1, __kvm_set_cr4() will also return 1, and this will cause kvm_set_cr4() will throw a #GP. Turning on the VMXE bit is allowed only when the nested VMX feature is enabled, and turning it off is forbidden after a vmxon. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:10 +03:00
Nadav Har'El	ec378aeef9	KVM: nVMX: Implement VMXON and VMXOFF This patch allows a guest to use the VMXON and VMXOFF instructions, and emulates them accordingly. Basically this amounts to checking some prerequisites, and then remembering whether the guest has enabled or disabled VMX operation. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:09 +03:00
Nadav Har'El	801d342432	KVM: nVMX: Add "nested" module option to kvm_intel This patch adds to kvm_intel a module option "nested". This option controls whether the guest can use VMX instructions, i.e., whether we allow nested virtualization. A similar, but separate, option already exists for the SVM module. This option currently defaults to 0, meaning that nested VMX must be explicitly enabled by giving nested=1. When nested VMX matures, the default should probably be changed to enable nested VMX by default - just like nested SVM is currently enabled by default. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:09 +03:00
Takuya Yoshikawa	b5c9ff731f	KVM: x86 emulator: Avoid clearing the whole decode_cache During tracing the emulator, we noticed that init_emulate_ctxt() sometimes took a bit longer time than we expected. This patch is for mitigating the problem by some degree. By looking into the function, we soon notice that it clears the whole decode_cache whose size is about 2.5K bytes now. Furthermore, most of the bytes are taken for the two read_cache arrays, which are used only by a few instructions. Considering the fact that we are not assuming the cache arrays have been cleared when we store actual data, we do not need to clear the arrays: 2K bytes elimination. In addition, we can avoid clearing the fetch_cache and regs arrays. This patch changes the initialization not to clear the arrays. On our 64-bit host, init_emulate_ctxt() becomes 0.3 to 0.5us faster with this patch applied. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 11:45:09 +03:00
Takuya Yoshikawa	adf52235b4	KVM: x86 emulator: Clean up init_emulate_ctxt() Use a local pointer to the emulate_ctxt for simplicity. Then, arrange the hard-to-read mode selection lines neatly. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 11:45:08 +03:00
Jan Kiszka	d780592b99	KVM: Clean up error handling during VCPU creation So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if it fails. Move this confusing resonsibility back into the hands of kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected, all other archs cannot fail. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 11:45:08 +03:00
Nadav Har'El	d462b81923	KVM: VMX: Keep list of loaded VMCSs, instead of vcpus In VMX, before we bring down a CPU we must VMCLEAR all VMCSs loaded on it because (at least in theory) the processor might not have written all of its content back to memory. Since a patch from June 26, 2008, this is done using a per-cpu "vcpus_on_cpu" linked list of vcpus loaded on each CPU. The problem is that with nested VMX, we no longer have the concept of a vcpu being loaded on a cpu: A vcpu has multiple VMCSs (one for L1, a pool for L2s), and each of those may be have been last loaded on a different cpu. So instead of linking the vcpus, we link the VMCSs, using a new structure loaded_vmcs. This structure contains the VMCS, and the information pertaining to its loading on a specific cpu (namely, the cpu number, and whether it was already launched on this cpu once). In nested we will also use the same structure to hold L2 VMCSs, and vmx->loaded_vmcs is a pointer to the currently active VMCS. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Acked-by: Acked-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-07-12 11:45:08 +03:00
Avi Kivity	24c82e576b	KVM: Sanitize cpuid Instead of blacklisting known-unsupported cpuid leaves, whitelist known- supported leaves. This is more conservative and prevents us from reporting features we don't support. Also whitelist a few more leaves while at it. Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:07 +03:00
Xiao Guangrong	bcdd9a93c5	KVM: MMU: cleanup for dropping parent pte Introduce drop_parent_pte to remove the rmap of parent pte and clear parent pte Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:07 +03:00
Xiao Guangrong	38e3b2b28c	KVM: MMU: cleanup for kvm_mmu_page_unlink_children Cleanup the same operation between kvm_mmu_page_unlink_children and mmu_pte_write_zap_pte Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:07 +03:00
Xiao Guangrong	67052b3508	KVM: MMU: remove the arithmetic of parent pte rmap Parent pte rmap and page rmap are very similar, so use the same arithmetic for them Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:07 +03:00
Xiao Guangrong	53c07b1878	KVM: MMU: abstract the operation of rmap Abstract the operation of rmap to spte_list, then we can use it for the reverse mapping of parent pte in the later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:06 +03:00
Xiao Guangrong	1249b96e72	KVM: fix uninitialized warning Fix: warning: ‘cs_sel’ may be used uninitialized in this function warning: ‘ss_sel’ may be used uninitialized in this function Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:06 +03:00
Xiao Guangrong	8b0cedff04	KVM: use __copy_to_user/__clear_user to write guest page Simply use __copy_to_user/__clear_user to write guest page since we have already verified the user address when the memslot is set Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:03 +03:00
Xiao Guangrong	332b207d65	KVM: MMU: optimize pte write path if don't have protected sp Simply return from kvm_mmu_pte_write path if no shadow page is write-protected, then we can avoid to walk all shadow pages and hold mmu-lock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:02 +03:00
Avi Kivity	96304217a7	KVM: VMX: always_inline VMREADs vmcs_readl() and friends are really short, but gcc thinks they are long because of the out-of-line exception handlers. Mark them always_inline to clear the misunderstanding. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:01 +03:00
Avi Kivity	5e520e6278	KVM: VMX: Move VMREAD cleanup to exception handler We clean up a failed VMREAD by clearing the output register. Do it in the exception handler instead of unconditionally. This is worthwhile since there are more than a hundred call sites. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:45:00 +03:00
Takuya Yoshikawa	7b105ca290	KVM: x86 emulator: Stop passing ctxt->ops as arg of emul functions Dereference it in the actual users. This not only cleans up the emulator but also makes it easy to convert the old emulation functions to the new em_xxx() form later. Note: Remove some inline keywords to let the compiler decide inlining. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:44:59 +03:00
Takuya Yoshikawa	ef5d75cc9a	KVM: x86 emulator: Stop passing ctxt->ops as arg of decode helpers Dereference it in the actual users: only do_insn_fetch_byte(). This is consistent with the way __linearize() dereferences it. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:44:57 +03:00
Takuya Yoshikawa	67cbc90db5	KVM: x86 emulator: Place insn_fetch helpers together The two macros need special care to use: Assume rc, ctxt, ops and done exist outside of them. Can goto outside. Considering the fact that these are used only in decode functions, moving these right after do_insn_fetch() seems to be a right thing to improve the readability. We also rename do_fetch_insn_byte() to do_insn_fetch_byte() to be consistent. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-07-12 11:44:56 +03:00
Jiri Kosina	b7e9c223be	Merge branch 'master' into for-next Sync with Linus' tree to be able to apply pending patches that are based on newer code already present upstream.	2011-07-11 14:15:55 +02:00
Avi Kivity	cb16c34876	KVM: x86 emulator: fix %rip-relative addressing with immediate source operand %rip-relative addressing is relative to the first byte of the next instruction, so we need to add %rip only after we've fetched any immediate bytes. Based on original patch by Li Xin <xin.li@intel.com>. Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Li Xin <xin.li@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-06-29 10:09:25 +03:00
Vitaliy Ivanov	e44ba033c5	treewide: remove duplicate includes Many stupid corrections of duplicated includes based on the output of scripts/checkincludes.pl. Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-06-20 16:08:19 +02:00
Steve	a0a8eaba16	KVM: MMU: fix opposite condition in mapping_level_dirty_bitmap The condition is opposite, it always maps huge page for the dirty tracked page Reported-by: Steve <stefan.bosak@gmail.com> Signed-off-by: Steve <stefan.bosak@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-06-19 19:23:13 +03:00
Marcelo Tosatti	5233dd51ec	KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGS Only decache guest CR3 value if vcpu->arch.cr3 is stale. Fixes loadvm with live guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-by: Markus Schade <markus.schade@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-06-19 19:23:13 +03:00
Borislav Petkov	b72336355b	KVM: MMU: Fix build warnings in walk_addr_generic() On 3.0-rc1 I get In file included from arch/x86/kvm/mmu.c:2856: arch/x86/kvm/paging_tmpl.h: In function ‘paging32_walk_addr_generic’: arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function In file included from arch/x86/kvm/mmu.c:2852: arch/x86/kvm/paging_tmpl.h: In function ‘paging64_walk_addr_generic’: arch/x86/kvm/paging_tmpl.h:124: warning: ‘ptep_user’ may be used uninitialized in this function caused by `6e2ca7d180`. According to Takuya Yoshikawa, ptep_user won't be used uninitialized so shut up gcc. Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Link: http://lkml.kernel.org/r/20110530094604.GC21833@liondog.tnic Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-06-19 19:23:13 +03:00
Linus Torvalds	58a9a36b54	Merge branch 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/3.0' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Initialize kvm before registering the mmu notifier KVM: x86: use proper port value when checking io instruction permission KVM: add missing void __user * cast to access_ok() call	2011-06-07 19:06:28 -07:00
Marcelo Tosatti	221192bdff	KVM: x86: use proper port value when checking io instruction permission Commit `f6511935f4` moved the permission check for io instructions to the ->check_perm callback. It failed to copy the port value from RDX register for string and "in,out ax,dx" instructions. Fix it by reading RDX register at decode stage when appropriate. Fixes FC8.32 installation. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-06-06 10:52:09 +03:00
Ying Han	1495f230fa	vmscan: change shrinker API by passing shrink_control struct Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:26 -07:00
Takuya Yoshikawa	c8cfbb555e	KVM: MMU: Use ptep_user for cmpxchg_gpte() The address of the gpte was already calculated and stored in ptep_user before entering cmpxchg_gpte(). This patch makes cmpxchg_gpte() to use that to make it clear that we are using the same address during walk_addr_generic(). Note that the unlikely annotations are used to show that the conditions are something unusual rather than for performance. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-22 08:48:14 -04:00
Takuya Yoshikawa	d2f62766d5	KVM: x86 emulator: Make jmp far emulation into a separate function We introduce em_jmp_far(). We also call this from em_grp45() to stop treating modrm_reg == 5 case separately in the group 5 emulation. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:48:06 -04:00
Takuya Yoshikawa	51187683cb	KVM: x86 emulator: Rename emulate_grpX() to em_grpX() The prototypes are changed appropriately. We also replaces "goto grp45;" with simple em_grp45() call. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:48:05 -04:00
Takuya Yoshikawa	3b9be3bf2e	KVM: x86 emulator: Remove unused arg from emulate_pop() The opt of emulate_grp1a() is also removed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:48:03 -04:00
Takuya Yoshikawa	adddcecf92	KVM: x86 emulator: Remove unused arg from writeback() Remove inline at this chance. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:48:00 -04:00
Takuya Yoshikawa	509cf9fe11	KVM: x86 emulator: Remove unused arg from read_descriptor() Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:47:59 -04:00
Takuya Yoshikawa	c1ed6dea81	KVM: x86 emulator: Remove unused arg from seg_override() In addition, one comma at the end of a statement is replaced with a semicolon. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:47:57 -04:00
Takuya Yoshikawa	fa3d315a4c	KVM: Validate userspace_addr of memslot when registered This way, we can avoid checking the user space address many times when we read the guest memory. Although we can do the same for write if we check which slots are writable, we do not care write now: reading the guest memory happens more often than writing. [avi: change VERIFY_READ to VERIFY_WRITE] Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:47:56 -04:00
Takuya Yoshikawa	12cb814f3b	KVM: MMU: Clean up gpte reading with copy_from_user() When we optimized walk_addr_generic() by not using the generic guest memory reader, we replaced copy_from_user() with get_user(): commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51 KVM: MMU: Optimize guest page table walk commit 15e2ac9a43d4d7d08088e404fddf2533a8e7d52e KVM: MMU: Fix 64-bit paging breakage on x86_32 But as Andi pointed out later, copy_from_user() does the same as get_user() as long as we give a constant size to it. So we use copy_from_user() to clean up the code. The only, noticeable, regression introduced by this is 64-bit gpte reading on x86_32 hosts needed for PAE guests. But this can be mitigated by implementing 8-byte get_user() for x86_32, if needed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:47:54 -04:00
Avi Kivity	2fb92db1ec	KVM: VMX: Cache vmcs segment fields Since the emulator now checks segment limits and access rights, it generates a lot more accesses to the vmcs segment fields. Undo some of the performance hit by cacheing those fields in a read-only cache (the entire cache is invalidated on any write, or on guest exit). Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:47:45 -04:00
Avi Kivity	1aa366163b	KVM: x86 emulator: consolidate segment accessors Instead of separate accessors for the segment selector and cached descriptor, use one accessor for both. This simplifies the code somewhat. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:47:39 -04:00
Avi Kivity	0a434bb2bf	KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions Avoids a VMREAD. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:40:01 -04:00
Joe Perches	ae8cc05955	KVM: SVM: Make dump_vmcb static, reduce text dump_vmcb isn't used outside this module, make it static. Shrink text and object by ~1% by standardizing formats. $ size arch/x86/kvm/svm.o* text data bss dec hex filename 52910 580 10072 63562 f84a arch/x86/kvm/svm.o.new 53563 580 10072 64215 fad7 arch/x86/kvm/svm.o.old Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:40:00 -04:00
Takuya Yoshikawa	8f74d8e168	KVM: MMU: Fix 64-bit paging breakage on x86_32 Fix regression introduced by commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51 KVM: MMU: Optimize guest page table walk On x86_32, get_user() does not support 64-bit values and we fail to build KVM at the point of 64-bit paging. This patch fixes this by using get_user() twice for that condition. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:58 -04:00
BrillyWu@viatech.com.cn	4429d5dc11	KVM: Add CPUID support for VIA CPU The CPUIDs for Centaur are added, and then the features of PadLock hardware engine on VIA CPU, such as "ace", "ace_en" and so on, can be passed into the kvm guest. Signed-off-by: Brilly Wu <brillywu@viatech.com.cn> Signed-off-by: Kary Jin <karyjin@viatech.com.cn> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:55 -04:00
Gleb Natapov	2aab2c5b2b	KVM: call cache_all_regs() only once during instruction emulation Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:54 -04:00
Gleb Natapov	0004c7c257	KVM: Fix compound mmio mmio_index should be taken into account when copying data from userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:52 -04:00
Gleb Natapov	4947e7cd0e	KVM: emulator: Propagate fault in far jump emulation Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:51 -04:00
Gleb Natapov	8d7d810255	KVM: mmio_fault_cr2 is not used Remove unused variable mmio_fault_cr2. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:49 -04:00
Avi Kivity	46561646ce	KVM: x86 emulator: consolidate group handling Move all groups into a single field and handle them in a single place. This saves bits when we add more group types (3 bits -> 7 groups types). Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:48 -04:00
Avi Kivity	781e0743af	KVM: MMU: Add unlikely() annotations to walk_addr_generic() walk_addr_generic() is a hot path and is also hard for the cpu to predict - some of the parameters (fetch_fault in particular) vary wildly from invocation to invocation. Add unlikely() annotations where appropriate; all walk failures are considered unlikely, as are cases where we have to mark the accessed or dirty bit, as they are slow paths both in kvm and on real processors. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:46 -04:00
Takuya Yoshikawa	62aaa2f05a	KVM: x86 emulator: Use opcode::execute for PUSHF/POPF (9C/9D) For this, em_pushf/popf() are introduced. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:45 -04:00
Takuya Yoshikawa	b96a7fad02	KVM: x86 emulator: Use opcode::execute for PUSHA/POPA (60/61) For this, emulate_pusha/popa() are converted to em_pusha/popa(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:43 -04:00
Takuya Yoshikawa	c54fe50469	KVM: x86 emulator: Use opcode::execute for POP reg (58-5F) In addition, the RET emulation is changed to call em_pop() to remove the pop_instruction label. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:41 -04:00
Takuya Yoshikawa	d67fc27ae2	KVM: x86 emulator: Use opcode::execute for Group 1, CMPS and SCAS The following instructions are changed to use opcode::execute. Group 1 (80-83) ADD (00-05), OR (08-0D), ADC (10-15), SBB (18-1D), AND (20-25), SUB (28-2D), XOR (30-35), CMP (38-3D) CMPS (A6-A7), SCAS (AE-AF) The last two do the same as CMP in the emulator, so em_cmp() is used. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:40 -04:00
Takuya Yoshikawa	6e2ca7d180	KVM: MMU: Optimize guest page table walk This patch optimizes the guest page table walk by using get_user() instead of copy_from_user(). With this patch applied, paging64_walk_addr_generic() has become about 0.5us to 1.0us faster on my Phenom II machine with NPT on. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:38 -04:00
Avi Kivity	40e19b519c	KVM: SVM: Get rid of x86_intercept_map::valid By reserving 0 as an invalid x86_intercept_stage, we no longer need to store a valid flag in x86_intercept_map. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:37 -04:00
Avi Kivity	5ef39c71d8	KVM: x86 emulator: Use opcode::execute for 0F 01 opcode Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:35 -04:00
Avi Kivity	68152d8812	KVM: x86 emulator: Don't force #UD for 0F 01 /5 While it isn't defined, no need to force a #UD. If it becomes defined in the future this can cause wierd problems for the guest. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:34 -04:00
Avi Kivity	26d05cc740	KVM: x86 emulator: move 0F 01 sub-opcodes into their own functions Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:32 -04:00
Randy Dunlap	d42244499f	KVM: x86 emulator: fix const value warning on i386 in svm insn RAX check arch/x86/kvm/emulate.c:2598: warning: integer constant is too large for 'long' type Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:30 -04:00
Clemens Noss	cfb223753c	KVM: x86 emulator: avoid calling wbinvd() macro Commit 0b56652e33c72092956c651ab6ceb9f0ad081153 fails to build: CC [M] arch/x86/kvm/emulate.o arch/x86/kvm/emulate.c: In function 'x86_emulate_insn': arch/x86/kvm/emulate.c:4095:25: error: macro "wbinvd" passed 1 arguments, but takes just 0 arch/x86/kvm/emulate.c:4095:3: warning: statement with no effect make[2]: * [arch/x86/kvm/emulate.o] Error 1 make[1]: * [arch/x86/kvm] Error 2 make: *** [arch/x86] Error 2 Work around this for now. Signed-off-by: Clemens Noss <cnoss@gmx.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:29 -04:00
Roedel, Joerg	a78484c60e	KVM: MMU: Make cmpxchg_gpte aware of nesting too This patch makes the cmpxchg_gpte() function aware of the difference between l1-gfns and l2-gfns when nested virtualization is in use. This fixes a potential data-corruption problem in the l1-guest and makes the code work correct (at least as correct as the hardware which is emulated in this code) again. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:26 -04:00
Avi Kivity	13db70eca6	KVM: x86 emulator: drop x86_emulate_ctxt::vcpu No longer used. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:24 -04:00
Avi Kivity	5197b808a7	KVM: Avoid using x86_emulate_ctxt.vcpu We can use container_of() instead. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:22 -04:00
Avi Kivity	bcaf5cc543	KVM: x86 emulator: add new ->wbinvd() callback Instead of calling kvm_emulate_wbinvd() directly. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:20 -04:00
Avi Kivity	d6aa10003b	KVM: x86 emulator: add ->fix_hypercall() callback Artificial, but needed to remove direct calls to KVM. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:18 -04:00
Avi Kivity	6c3287f7c5	KVM: x86 emulator: add new ->halt() callback Instead of reaching into vcpu internals. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:17 -04:00
Avi Kivity	3cb16fe78c	KVM: x86 emulator: make emulate_invlpg() an emulator callback Removing direct calls to KVM. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:15 -04:00
Avi Kivity	2d04a05bd7	KVM: x86 emulator: emulate CLTS internally Avoid using ctxt->vcpu; we can do everything with ->get_cr() and ->set_cr(). A side effect is that we no longer activate the fpu on emulated CLTS; but that should be very rare. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:14 -04:00
Avi Kivity	fd72c41922	KVM: x86 emulator: Replace calls to is_pae() and is_paging with ->get_cr() Avoid use of ctxt->vcpu. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:12 -04:00
Avi Kivity	c2ad2bb3ef	KVM: x86 emulator: drop use of is_long_mode() Requires ctxt->vcpu, which is to be abolished. Replace with open calls to get_msr(). Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:10 -04:00
Avi Kivity	1ac9d0cfb0	KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt() Replacing direct calls to realmode_lgdt(), realmode_lidt(). Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:08 -04:00
Avi Kivity	fe870ab9ce	KVM: x86 emulator: avoid using ctxt->vcpu in check_perm() callbacks Unneeded for register access. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:07 -04:00
Avi Kivity	2953538ebb	KVM: x86 emulator: drop vcpu argument from intercept callback Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:05 -04:00
Avi Kivity	717746e382	KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacks Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:39:03 -04:00
Avi Kivity	4bff1e86ad	KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacks Making the emulator caller agnostic. [Takuya Yoshikawa: fix typo leading to LDT failures] Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-22 08:35:20 -04:00
Avi Kivity	ca1d4a9e77	KVM: x86 emulator: drop vcpu argument from pio callbacks Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:11 -04:00
Avi Kivity	0f65dd70a4	KVM: x86 emulator: drop vcpu argument from memory read/write callbacks Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:10 -04:00
Avi Kivity	7295261cdd	KVM: x86 emulator: whitespace cleanups Clean up lines longer than 80 columns. No code changes. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:10 -04:00
Nelson Elhage	3d9b938eef	KVM: emulator: Use linearize() when fetching instructions Since segments need to be handled slightly differently when fetching instructions, we add a __linearize helper that accepts a new 'fetch' boolean. [avi: fix oops caused by wrong segmented_address initialization order] Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:10 -04:00
Joerg Roedel	7c4c0f4fd5	KVM: X86: Update last_guest_tsc in vcpu_put The last_guest_tsc is used in vcpu_load to adjust the tsc_offset since tsc-scaling is merged. So the last_guest_tsc needs to be updated in vcpu_put instead of the the last_host_tsc. This is fixed with this patch. Reported-by: Jan Kiszka <jan.kiszka@web.de> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:10 -04:00
Joerg Roedel	977b2d03e4	KVM: SVM: Fix nested sel_cr0 intercept path with decode-assists This patch fixes a bug in the nested-svm path when decode-assists is available on the machine. After a selective-cr0 intercept is detected the rip is advanced unconditionally. This causes the l1-guest to continue running with an l2-rip. This bug was with the sel_cr0 unit-test on decode-assists capable hardware. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:10 -04:00
Nelson Elhage	0521e4c0bc	KVM: x86 emulator: Handle wraparound in (cs_base + offset) when fetching insns Currently, setting a large (i.e. negative) base address for %cs does not work on a 64-bit host. The "JOS" teaching operating system, used by MIT and other universities, relies on such segments while bootstrapping its way to full virtual memory management. Signed-off-by: Nelson Elhage <nelhage@ksplice.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:09 -04:00
Duan Jiong	49704f2658	KVM: remove useless function declaration kvm_inject_pit_timer_irqs() Just remove useless function define kvm_inject_pit_timer_irqs() from file arch/x86/kvm/i8254.h Signed-off-by:Duan Jiong<djduanjiong@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:09 -04:00
Duan Jiong	1e015968df	KVM: remove useless function declarations from file arch/x86/kvm/irq.h Just remove useless function define kvm_pic_clear_isr_ack() and pit_has_pending_timer() Signed-off-by: Duan Jiong<djduanjiong@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:09 -04:00
Serge E. Hallyn	71f9833bb1	KVM: fix push of wrong eip when doing softint When doing a soft int, we need to bump eip before pushing it to the stack. Otherwise we'll do the int a second time. [apw@canonical.com: merged eip update as per Jan's recommendation.] Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:09 -04:00
Takuya Yoshikawa	4487b3b48d	KVM: x86 emulator: Use em_push() instead of emulate_push() em_push() is a simple wrapper of emulate_push(). So this patch replaces emulate_push() with em_push() and removes the unnecessary former. In addition, the unused ops arguments are removed from emulate_pusha() and emulate_grp45(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:08 -04:00
Takuya Yoshikawa	4179bb02fd	KVM: x86 emulator: Make emulate_push() store the value directly PUSH emulation stores the value by calling writeback() after setting the dst operand appropriately in emulate_push(). This writeback() using dst is not needed at all because we know the target is the stack. So this patch makes emulate_push() call, newly introduced, segmented_write() directly. By this, many inlined writeback()'s are removed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:08 -04:00
Takuya Yoshikawa	575e7c1417	KVM: x86 emulator: Disable writeback for CMP emulation This stops "CMP r/m, reg" to write back the data into memory. Pointed out by Avi. The writeback suppression now covers CMP, CMPS, SCAS. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:08 -04:00
Jan Kiszka	be6d05cfdf	KVM: VMX: Ensure that vmx_create_vcpu always returns proper error In case certain allocations fail, vmx_create_vcpu may return 0 as error instead of a negative value encoded via ERR_PTR. This causes a NULL pointer dereferencing later on in kvm_vm_ioctl_vcpu_create. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-11 07:57:08 -04:00
Gleb Natapov	7ae441eac5	KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpu Currently we sync registers back and forth before/after exiting to userspace for IO, but during IO device model shouldn't need to read/write the registers, so we can as well skip those sync points. The only exaception is broken vmware backdor interface. The new code sync registers content during IO only if registers are read from/written to by userspace in the middle of the IO operation and this almost never happens in practise. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-11 07:57:08 -04:00
Avi Kivity	618ff15de1	KVM: x86 emulator: implement segment permission checks Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:08 -04:00
Avi Kivity	56697687da	KVM: x86 emulator: move desc_limit_scaled() For reuse later. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:07 -04:00
Avi Kivity	52fd8b445f	KVM: x86 emulator: move linearize() downwards So it can call emulate_gp() without forward declarations. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:07 -04:00
Avi Kivity	83b8795a29	KVM: x86 emulator: pass access size and read/write intent to linearize() Needed for segment read/write checks. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:07 -04:00
Avi Kivity	9fa088f4d2	KVM: x86 emulator: change address linearization to return an error code Preparing to add segment checks. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:07 -04:00
Avi Kivity	38503911b3	KVM: x86 emulator: move invlpg emulation into a function It's going to get more complicated soon. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:06 -04:00
Avi Kivity	3ca3ac4dae	KVM: x86 emulator: Add helpers for memory access using segmented addresses Will help later adding proper segment checks. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:06 -04:00
Joerg Roedel	e3e9ed3d2c	KVM: SVM: Fix fault-rip on vmsave/vmload emulation When the emulation of vmload or vmsave fails because the guest passed an unsupported physical address it gets an #GP with rip pointing to the instruction after vmsave/vmload. This is a bug and fixed by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:06 -04:00
Joerg Roedel	92a1f12d25	KVM: X86: Implement userspace interface to set virtual_tsc_khz This patch implements two new vm-ioctls to get and set the virtual_tsc_khz if the machine supports tsc-scaling. Setting the tsc-frequency is only possible before userspace creates any vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:06 -04:00
Joerg Roedel	857e40999e	KVM: X86: Delegate tsc-offset calculation to architecture code With TSC scaling in SVM the tsc-offset needs to be calculated differently. This patch propagates this calculation into the architecture specific modules so that this complexity can be handled there. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:05 -04:00
Joerg Roedel	4051b18801	KVM: X86: Implement call-back to propagate virtual_tsc_khz This patch implements a call-back into the architecture code to allow the propagation of changes to the virtual tsc_khz of the vcpu. On SVM it updates the tsc_ratio variable, on VMX it does nothing. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:05 -04:00
Joerg Roedel	8f6055cbaf	KVM: X86: Make tsc_delta calculation a function of guest tsc The calculation of the tsc_delta value to ensure a forward-going tsc for the guest is a function of the host-tsc. This works as long as the guests tsc_khz is equal to the hosts tsc_khz. With tsc-scaling hardware support this is not longer true and the tsc_delta needs to be calculated using guest_tsc values. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:04 -04:00
Joerg Roedel	1e993611d0	KVM: X86: Let kvm-clock report the right tsc frequency This patch changes the kvm_guest_time_update function to use TSC frequency the guest actually has for updating its clock. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:04 -04:00
Joerg Roedel	fbc0db76b7	KVM: SVM: Implement infrastructure for TSC_RATE_MSR This patch enhances the kvm_amd module with functions to support the TSC_RATE_MSR which can be used to set a given tsc frequency for the guest vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:04 -04:00
Avi Kivity	bfeed29d6d	KVM: x86 emulator: Drop EFER.SVME requirement from VMMCALL VMMCALL requires EFER.SVME to be enabled in the host, not in the guest, which is what check_svme() checks. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:04 -04:00
Avi Kivity	8b18bc3782	KVM: x86 emulator: Re-add VendorSpecific tag to VMMCALL insn VMMCALL needs the VendorSpecific tag so that #UD emulation (called if a guest running on AMD was migrated to an Intel host) is allowed to process the instruction. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:04 -04:00
Xiao Guangrong	7c5625227f	KVM: MMU: remove mmu_seq verification on pte update path The mmu_seq verification can be removed since we get the pfn in the protection of mmu_lock. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:03 -04:00
Gleb Natapov	a0c0ab2feb	KVM: x86 emulator: do not open code return values from the emulator Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:03 -04:00
Justin P. Mattock	0be839bfb4	KVM: Remove base_addresss in kvm_pit since it is unused The patch below removes unsigned long base_addresss; in i8254.h since it is unused. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:03 -04:00
Joerg Roedel	628afd2aeb	KVM: SVM: Remove nested sel_cr0_write handling code This patch removes all the old code which handled the nested selective cr0 write intercepts. This code was only in place as a work-around until the instruction emulator is capable of doing the same. This is the case with this patch-set and so the code can be removed. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:03 -04:00
Joerg Roedel	f6511935f4	KVM: SVM: Add checks for IO instructions This patch adds code to check for IOIO intercepts on instructions decoded by the KVM instruction emulator. [avi: fix build error due to missing #define D2bvIP] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:03 -04:00
Joerg Roedel	bf608f88fa	KVM: SVM: Add intercept checks for one-byte instructions This patch add intercept checks for emulated one-byte instructions to the KVM instruction emulation path. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	8061252ee0	KVM: SVM: Add intercept checks for remaining twobyte instructions This patch adds intercepts checks for the remaining twobyte instructions to the KVM instruction emulator. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	d7eb820306	KVM: SVM: Add intercept checks for remaining group7 instructions This patch implements the emulator intercept checks for the RDTSCP, MONITOR, and MWAIT instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	01de8b09e6	KVM: SVM: Add intercept checks for SVM instructions This patch adds the necessary code changes in the instruction emulator and the extensions to svm.c to implement intercept checks for the svm instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	dee6bb70e4	KVM: SVM: Add intercept checks for descriptor table accesses This patch add intercept checks into the KVM instruction emulator to check for the 8 instructions that access the descriptor table addresses. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:02 -04:00
Joerg Roedel	3b88e41a41	KVM: SVM: Add intercept check for accessing dr registers This patch adds the intercept checks for instruction accessing the debug registers. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	cfec82cb7d	KVM: SVM: Add intercept check for emulated cr accesses This patch adds all necessary intercept checks for instructions that access the crX registers. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	8a76d7f25f	KVM: x86: Add x86 callback for intercept check This patch adds a callback into kvm_x86_ops so that svm and vmx code can do intercept checks on emulated instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	8ea7d6aef8	KVM: x86 emulator: Add flag to check for protected mode instructions This patch adds a flag for the opcoded to tag instruction which are only recognized in protected mode. The necessary check is added too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	d09beabd7c	KVM: x86 emulator: Add check_perm callback This patch adds a check_perm callback for each opcode into the instruction emulator. This will be used to do all necessary permission checks on instructions before checking whether they are intercepted or not. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:01 -04:00
Joerg Roedel	775fde8648	KVM: x86 emulator: Don't write-back cpu-state on X86EMUL_INTERCEPTED This patch prevents the changed CPU state to be written back when the emulator detected that the instruction was intercepted by the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:00 -04:00
Avi Kivity	3c6e276f22	KVM: x86 emulator: add SVM intercepts Add intercept codes for instructions defined by SVM as interceptable. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:00 -04:00
Avi Kivity	c4f035c60d	KVM: x86 emulator: add framework for instruction intercepts When running in guest mode, certain instructions can be intercepted by hardware. This also holds for nested guests running on emulated virtualization hardware, in particular instructions emulated by kvm itself. This patch adds a framework for intercepting instructions. If an instruction is marked for interception, and if we're running in guest mode, a callback is called to check whether an intercept is needed or not. The callback is called at three points in time: immediately after beginning execution, after checking privilge exceptions, and after checking memory exception. This suits the different interception points defined for different instructions and for the various virtualization instruction sets. In addition, a new X86EMUL_INTERCEPT is defined, which any callback or memory access may define, allowing the more complicated intercepts to be implemented in existing callbacks. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:57:00 -04:00
Avi Kivity	aa97bb4891	KVM: x86 emulator: implement movdqu instruction (f3 0f 6f, f3 0f 7f) Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:59 -04:00
Avi Kivity	1253791df9	KVM: x86 emulator: SSE support Add support for marking an instruction as SSE, switching registers used to the SSE register file. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:59 -04:00
Avi Kivity	0d7cdee83a	KVM: x86 emulator: Specialize decoding for insns with 66/f2/f3 prefixes Most SIMD instructions use the 66/f2/f3 prefixes to distinguish between different variants of the same instruction. Usually the encoding is quite regular, but in some cases (including non-SIMD instructions) the prefixes generate very different instructions. Examples include XCHG/PAUSE, MOVQ/MOVDQA/MOVDQU, and MOVBE/CRC32. Allow the emulator to handle these special cases by splitting such opcodes into groups, with different decode flags and execution functions for different prefixes. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:59 -04:00
Avi Kivity	5037f6f324	KVM: x86 emulator: define callbacks for using the guest fpu within the emulator Needed for emulating fpu instructions. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	1d6b114f20	KVM: x86 emulator: do not munge rep prefix Currently we store a rep prefix as 1 or 2 depending on whether it is a REPE or REPNE. Since sse instructions depend on the prefix value, store it as the original opcode to simplify things further on. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	cef4dea07f	KVM: 16-byte mmio support Since sse instructions can issue 16-byte mmios, we need to support them. We can't increase the kvm_run mmio buffer size to 16 bytes without breaking compatibility, so instead we break the large mmios into two smaller 8-byte ones. Since the bus is 64-bit we aren't breaking any atomicity guarantees. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	5287f194bf	KVM: Split mmio completion into a function Make room for sse mmio completions. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Avi Kivity	70252a1053	KVM: extend in-kernel mmio to handle >8 byte transactions Needed for coalesced mmio using sse. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:58 -04:00
Gleb Natapov	1499e54af0	KVM: x86: better fix for race between nmi injection and enabling nmi window Fix race between nmi injection and enabling nmi window in a simpler way. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-11 07:56:57 -04:00
Marcelo Tosatti	c761e5868e	Revert "KVM: Fix race between nmi injection and enabling nmi window" This reverts commit `f86368493e`. Simpler fix to follow. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-05-11 07:56:57 -04:00
Glauber Costa	3291892450	KVM: expose async pf through our standard mechanism As Avi recently mentioned, the new standard mechanism for exposing features is KVM_GET_SUPPORTED_CPUID, not spamming CAPs. For some reason async pf missed that. So expose async_pf here. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Gleb Natapov <gleb@redhat.com> CC: Avi Kivity <avi@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:57 -04:00
Avi Kivity	654f06fc65	KVM: VMX: simplify NMI mask management Use vmx_set_nmi_mask() instead of open-coding management of the hardware bit and the software hint (nmi_known_unmasked). There's a slight change of behaviour when running without hardware virtual NMI support - we now clear the NMI mask if NMI delivery faulted in that case as well. This improves emulation accuracy. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:57 -04:00
Jan Kiszka	89a9fb78b5	KVM: SVM: Remove unused svm_features We use boot_cpu_has now. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:57 -04:00
Avi Kivity	8878647585	KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception vmx_complete_atomic_exit() cached it for us, so we can use it here. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	c5ca8e572c	KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally Only read it if we're going to use it later. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	00eba012d5	KVM: VMX: Refactor vmx_complete_atomic_exit() Move the exit reason checks to the front of the function, for early exit in the common case. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	f9902069c4	KVM: VMX: Qualify check for host NMI Check for the exit reason first; this allows us, later, to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	9d58b93192	KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded When we haven't injected an interrupt, we don't need to recover the nmi blocking state (since the guest can't set it by itself). This allows us to avoid a VMREAD later on. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:56 -04:00
Avi Kivity	69c7302890	KVM: VMX: Cache cpl We may read the cpl quite often in the same vmexit (instruction privilege check, memory access checks for instruction and operands), so we gain a bit if we cache the value. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	f4c63e5d5a	KVM: VMX: Optimize vmx_get_cpl() In long mode, vm86 mode is disallowed, so we need not check for it. Reading rflags.vm may require a VMREAD, so it is expensive. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	6de12732c4	KVM: VMX: Optimize vmx_get_rflags() If called several times within the same exit, return cached results. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Avi Kivity	f6e7847589	KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions Some rflags bits are owned by the host, not guest, so we need to use kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them back. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-05-11 07:56:54 -04:00
Andre Przywara	bd22f5cfcf	KVM: move and fix substitue search for missing CPUID entries If KVM cannot find an exact match for a requested CPUID leaf, the code will try to find the closest match instead of simply confessing it's failure. The implementation was meant to satisfy the CPUID specification, but did not properly check for extended and standard leaves and also didn't account for the index subleaf. Beside that this rule only applies to CPUID intercepts, which is not the only user of the kvm_find_cpuid_entry() function. So fix this algorithm and call it from kvm_emulate_cpuid(). This fixes a crash of newer Linux kernels as KVM guests on AMD Bulldozer CPUs, where bogus values were returned in response to a CPUID intercept. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-04-06 13:15:56 +03:00
Andre Przywara	20800bc940	KVM: fix XSAVE bit scanning When KVM scans the 0xD CPUID leaf for propagating the XSAVE save area leaves, it assumes that the leaves are contigious and stops at the first zero one. On AMD hardware there is a gap, though, as LWP uses leaf 62 to announce it's state save area. So lets iterate through all 64 possible leaves and simply skip zero ones to also cover later features. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-04-06 13:15:55 +03:00
Linus Torvalds	f2e1fbb5f2	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Flush TLB if PGD entry is changed in i386 PAE mode x86, dumpstack: Correct stack dump info when frame pointer is available x86: Clean up csum-copy_64.S a bit x86: Fix common misspellings x86: Fix misspelling and align params x86: Use PentiumPro-optimized partial_csum() on VIA C7	2011-03-18 10:45:21 -07:00
Lucas De Marchi	0d2eb44f63	x86: Fix common misspellings They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 10:39:30 +01:00
Gleb Natapov	776e58ea3d	KVM: unbreak userspace that does not sets tss address Commit 6440e5967bc broke old userspaces that do not set tss address before entering vcpu. Unbreak it by setting tss address to a safe value on the first vcpu entry. New userspaces should set tss address, so print warning in case it doesn't. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:35 -03:00
Xiao Guangrong	0f53b5b1c0	KVM: MMU: cleanup pte write path This patch does: - call vcpu->arch.mmu.update_pte directly - use gfn_to_pfn_atomic in update_pte path The suggestion is from Avi. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:35 -03:00
Xiao Guangrong	5d163b1c9d	KVM: MMU: introduce a common function to get no-dirty-logged slot Cleanup the code of pte_prefetch_gfn_to_memslot and mapping_level_dirty_bitmap Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:34 -03:00
Xiao Guangrong	40dcaa9f69	KVM: fix rcu usage in init_rmode_* functions fix: [ 3494.671786] stack backtrace: [ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 3494.671790] Call Trace: [ 3494.671796] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 3494.671826] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 3494.671834] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 3494.671843] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 3494.671851] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 3494.671861] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 3494.671867] [] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel] and: [ 8328.789599] stack backtrace: [ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 8328.789603] Call Trace: [ 8328.789609] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 8328.789621] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 8328.789628] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 8328.789635] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 8328.789643] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 8328.789699] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 8328.789713] [] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:34 -03:00
Nikola Ciprich	1aa8ceef03	KVM: fix kvmclock regression due to missing clock update commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu), breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function to proper place. Signed-off-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Gleb Natapov	399a40c92d	KVM: emulator: Fix permission checking in io permission bitmap Currently if io port + len crosses 8bit boundary in io permission bitmap the check may allow IO that otherwise should not be allowed. The patch fixes that. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Gleb Natapov	5601d05b8c	KVM: emulator: Fix io permission checking for 64bit guest Current implementation truncates upper 32bit of TR base address during IO permission bitmap check. The patch fixes this. Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Avi Kivity	831ca6093c	KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n With CONFIG_CC_STACKPROTECTOR, we need a valid %gs at all times, so disable lazy reload and do an eager reload immediately after the vmexit. Reported-by: IVAN ANGELOV <ivangotoy@gmail.com> Acked-By: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Takuya Yoshikawa	afc20184b7	KVM: x86: Remove useless regs_page pointer from kvm_lapic Access to this page is mostly done through the regs member which holds the address to this page. The exceptions are in vmx_vcpu_reset() and kvm_free_lapic() and these both can easily be converted to using regs. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:33 -03:00
Xiao Guangrong	676646ee4b	KVM: MMU: remove unused macros These macros are not used, so removed Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	842f22ed9b	KVM: MMU: cleanup page alloc and free Using __get_free_page instead of alloc_page and page_address, using free_page instead of __free_page and virt_to_page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	49b26e26e4	KVM: MMU: do not record gfn in kvm_mmu_pte_write No need to record the gfn to verifier the pte has the same mode as current vcpu, it's because we only speculatively update the pte only if the pte and vcpu have the same mode Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	48c0e4e906	KVM: MMU: move mmu pages calculated out of mmu lock kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by kvm->slots_lock, so move it out of mmu spinlock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	1b7fd45c32	KVM: MMU: set spte accessed bit properly Set spte accessed bit only if guest_initiated == 1 that means the really accessed Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Xiao Guangrong	da8dc75f0c	KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits Only remove write access in the last sptes. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:32 -03:00
Lai Jiangshan	1260edbe7d	KVM: better readability of efer_reserved_bits use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Lai Jiangshan	d170c41906	KVM: Clear async page fault hash after switching to real mode The hash array of async gfns may still contain some left gfns after kvm_clear_async_pf_completion_queue() called, need to clear them. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Gleb Natapov	93ea5388ea	KVM: VMX: Initialize vm86 TSS only once. Currently vm86 task is initialized on each real mode entry and vcpu reset. Initialization is done by zeroing TSS and updating relevant fields. But since all vcpus are using the same TSS there is a race where one vcpu may use TSS while other vcpu is initializing it, so the vcpu that uses TSS will see wrong TSS content and will behave incorrectly. Fix that by initializing TSS only once. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Gleb Natapov	a8ba6c2622	KVM: VMX: update live TR selector if it changes in real mode When rmode.vm86 is active TR descriptor is updated with vm86 task values, but selector is left intact. vmx_set_segment() makes sure that if TR register is written into while vm86 is active the new values are saved for use after vm86 is deactivated, but since selector is not updated on vm86 activation/deactivation new value is lost. Fix this by writing new selector into vmcs immediately. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:31 -03:00
Lai Jiangshan	a3b5ba49a8	KVM: VMX: add the __noclone attribute to vmx_vcpu_run The changelog of `104f226` said "adds the __noclone attribute", but it was missing in its patch. I think it is still needed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:31 -03:00
Jan Kiszka	038f8c110e	KVM: x86: Convert tsc_write_lock to raw_spinlock Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Gleb Natapov	7049467b53	KVM: remove isr_ack logic from PIC isr_ack logic was added by `e48258009d` to avoid unnecessary IPIs. Back then it made sense, but now the code checks that vcpu is ready to accept interrupt before sending IPI, so this logic is no longer needed. The patch removes it. Fixes a regression with Debian/Hurd. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reported-and-tested-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Joseph Cihula	23f3e99132	KVM: VMX: fix detection of BIOS disabling VMX This patch fixes the logic used to detect whether BIOS has disabled VMX, for the case where VMX is enabled only under SMX, but tboot is not active. Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Jan Kiszka	e935b8372c	KVM: Convert kvm_lock to raw_spinlock Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Avi Kivity	bd3d1ec3d2	KVM: SVM: check for progress after IRET interception When we enable an NMI window, we ask for an IRET intercept, since the IRET re-enables NMIs. However, the IRET intercept happens before the instruction executes, while the NMI window architecturally opens afterwards. To compensate for this mismatch, we only open the NMI window in the following exit, assuming that the IRET has by then executed; however, this assumption is not always correct; we may exit due to a host interrupt or page fault, without having executed the instruction. Fix by checking for forward progress by recording and comparing the IRET's rip. This is somewhat of a hack, since an unchaging rip does not mean that no forward progress has been made, but is the simplest fix for now. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Avi Kivity	f86368493e	KVM: Fix race between nmi injection and enabling nmi window The interrupt injection logic looks something like if an nmi is pending, and nmi injection allowed inject nmi if an nmi is pending request exit on nmi window the problem is that "nmi is pending" can be set asynchronously by the PIT; if it happens to fire between the two if statements, we will request an nmi window even though nmi injection is allowed. On SVM, this has disasterous results, since it causes eflags.TF to be set in random guest code. The fix is simple; make nmi_pending synchronous using the standard vcpu->requests mechanism; this ensures the code above is completely synchronous wrt nmi_pending. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:30 -03:00
Avi Kivity	4005996e42	KVM: Drop ad-hoc vendor specific instruction restriction Use the new support in the emulator, and drop the ad-hoc code in x86.c. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Avi Kivity	d867162c6d	KVM: x86 emulator: vendor specific instructions Mark some instructions as vendor specific, and allow the caller to request emulation only of vendor specific instructions. This is useful in some circumstances (responding to a #UD fault). Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Avi Kivity	3e90943907	KVM: Drop bogus x86_decode_insn() error check x86_decode_insn() doesn't return X86EMUL_* values, so the check for X86EMUL_PROPOGATE_FAULT will always fail. There is a proper check later on, so there is no need for a replacement for this code. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Jan Kiszka	0bb8865979	KVM: x86: Drop obsolete warning about INIT on runnable VCPU This warning was once used for debugging QEMU user space. Though uncommon, it is actually possible to send an INIT request to a running VCPU. So better drop this warning before someone misuses it to flood kernel logs this way. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
Glauber Costa	12f9a48f7b	KVM: x86: release kvmclock page on reset When a vcpu is reset, kvmclock page keeps being written to this days. This is wrong and inconsistent: a cpu reset should take it to its initial state. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:28 -03:00
john cooper	91c9c3eda4	KVM: x86: handle guest access to BBL_CR_CTL3 MSR A correction to Intel cpu model CPUID data (patch queued) caused winxp to BSOD when booted with a Penryn model. This was traced to the CPUID "model" field correction from 6 -> 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19 called out as "reserved" in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper <john.cooper@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:27 -03:00
Xiao Guangrong	6b7e2d0991	KVM: Add "exiting guest mode" state Currently we keep track of only two states: guest mode and host mode. This patch adds an "exiting guest mode" state that tells us that an IPI will happen soon, so unless we need to wait for the IPI, we can avoid it completely. Also 1: No need atomically to read/write ->mode in vcpu's thread 2: reorganize struct kvm_vcpu to make ->mode and ->requests in the same cache line explicitly Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:26 -03:00
Jan Kiszka	9ca5231831	KVM: x86: Remove user space triggerable MCE error message This case is a pure user space error we do not need to record. Moreover, it can be misused to flood the kernel log. Remove it. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:26 -03:00
Xiao Guangrong	63f42e023e	KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs() Fix: [ 1001.499596] =================================================== [ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ] [ 1001.499601] --------------------------------------------------- [ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() without protection! ...... [ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62 [ 1001.499638] Call Trace: [ 1001.499644] [] lockdep_rcu_dereference+0x9d/0xa5 [ 1001.499653] [] gfn_to_memslot+0x8d/0xc8 [kvm] [ 1001.499661] [] gfn_to_hva+0x16/0x3f [kvm] [ 1001.499669] [] kvm_read_guest_page+0x1e/0x5e [kvm] [ 1001.499681] [] kvm_read_guest_page_mmu+0x53/0x5e [kvm] [ 1001.499699] [] load_pdptrs+0x3f/0x9c [kvm] [ 1001.499705] [] ? vmx_set_cr0+0x507/0x517 [kvm_intel] [ 1001.499717] [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm] [ 1001.499727] [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:26 -03:00
Avi Kivity	40712faeb8	KVM: VMX: Avoid atomic operation in vmx_vcpu_run Instead of exchanging the guest and host rcx, have separate storage for each. This allows us to avoid using the xchg instruction, which is is a little slower than normal operations. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:26 -03:00
Avi Kivity	1c696d0e1b	KVM: VMX: Simplify saving guest rcx in vmx_vcpu_run Change push top-of-stack pop guest-rcx pop dummy to pop guest-rcx which is the same thing, only simpler. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Rik van Riel	00c25bce02	KVM: VMX: increase ple_gap default to 128 On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger PLE exits, even with the minimalistic PLE test from kvm-unit-tests. http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545 For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in order to get pause loop exits: # modprobe kvm_intel ple_gap=47 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 58298446 # rmmod kvm_intel # modprobe kvm_intel ple_gap=48 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 36616 Increase the ple_gap to 128 to be on the safe side. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Zhai, Edwin <edwin.zhai@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:25 -03:00
Joerg Roedel	3781c01c15	KVM: SVM: Add support for perf-kvm This patch adds the necessary code to run perf-kvm on AMD machines. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	a917949935	KVM: VMX: Avoid leaking fake realmode state to userspace When emulating real mode, we fake some state: - tr.base points to a fake vm86 tss - segment registers are made to conform to vm86 restrictions change vmx_get_segment() not to expose this fake state to userspace; instead, return the original state. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	d0ba64f9b4	KVM: VMX: Save and restore tr selector across mode switches When emulating real mode we play with tr hidden state, but leave tr.selector alone. That works well, except for save/restore, since loading TR writes it to the hidden state in vmx->rmode. Fix by also saving and restoring the tr selector; this makes things more consistent and allows migration to work during the early boot stages of Windows XP. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:25 -03:00
Avi Kivity	8234b22e1c	KVM: MMU: Don't flush shadow when enabling dirty tracking Instead, drop large mappings, which were the reason we dropped shadow. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-03-17 13:08:24 -03:00
David Sharp	d5bf2ff072	tracing: Fix event alignment: kvm:kvm_hv_hypercall Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1291421609-14665-8-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-03-10 10:34:24 -05:00
Joerg Roedel	2c46d2aec0	KVM: SVM: Advance instruction pointer in dr_intercept In the dr_intercept function a new cpu-feature called decode-assists is implemented and used when available. This code-path does not advance the guest-rip causing the guest to dead-loop over mov-dr instructions. This is fixed by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-02-22 16:01:44 +02:00
Joerg Roedel	893a5ab6ee	KVM: SVM: Make sure KERNEL_GS_BASE is valid when loading gs_index The gs_index loading code uses the swapgs instruction to switch to the user gs_base temporarily. This is unsave in an lightweight exit-path in KVM on AMD because the KERNEL_GS_BASE MSR is switches lazily. An NMI happening in the critical path of load_gs_index may use the wrong GS_BASE value then leading to unpredictable behavior, e.g. a triple-fault. This patch fixes the issue by making sure that load_gs_index is called only with a valid KERNEL_GS_BASE value loaded in KVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-02-09 18:31:36 +02:00
Andrea Arcangeli	8ee53820ed	thp: mmu_notifier_test_young For GRU and EPT, we need gup-fast to set referenced bit too (this is why it's correct to return 0 when shadow_access_mask is zero, it requires gup-fast to set the referenced bit). qemu-kvm access already sets the young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow paging EPT minor fault we relay on gup-fast to signal the page is in use... We also need to check the young bits on the secondary pagetables for NPT and not nested shadow mmu as the data may never get accessed again by the primary pte. Without this closer accuracy, we'd have to remove the heuristic that avoids collapsing hugepages in hugepage virtual regions that have not even a single subpage in use. ->test_young is full backwards compatible with GRU and other usages that don't have young bits in pagetables set by the hardware and that should nuke the secondary mmu mappings when ->clear_flush_young runs just like EPT does. Removing the heuristic that checks the young bit in khugepaged/collapse_huge_page completely isn't so bad either probably but I thought it was worth it and this makes it reliable. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:46 -08:00
Andrea Arcangeli	936a5fe6e6	thp: kvm mmu transparent hugepage support This should work for both hugetlbfs and transparent hugepages. [akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:41 -08:00
Linus Torvalds	55065bc527	Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: Initialize fpu state in preemptible context KVM: VMX: when entering real mode align segment base to 16 bytes KVM: MMU: handle 'map_writable' in set_spte() function KVM: MMU: audit: allow audit more guests at the same time KVM: Fetch guest cr3 from hardware on demand KVM: Replace reads of vcpu->arch.cr3 by an accessor KVM: MMU: only write protect mappings at pagetable level KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() KVM: MMU: Initialize base_role for tdp mmus KVM: VMX: Optimize atomic EFER load KVM: VMX: Add definitions for more vm entry/exit control bits KVM: SVM: copy instruction bytes from VMCB KVM: SVM: implement enhanced INVLPG intercept KVM: SVM: enhance mov DR intercept handler KVM: SVM: enhance MOV CR intercept handler KVM: SVM: add new SVM feature bit names KVM: cleanup emulate_instruction KVM: move complete_insn_gp() into x86.c KVM: x86: fix CR8 handling KVM guest: Fix kvm clock initialization when it's configured out ...	2011-01-13 10:14:24 -08:00
Avi Kivity	e5c3014282	KVM: Initialize fpu state in preemptible context init_fpu() (which is indirectly called by the fpu switching code) assumes it is in process context. Rather than makeing init_fpu() use an atomic allocation, which can cause a task to be killed, make sure the fpu is already initialized when we enter the run loop. KVM-Stable-Tag. Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org> Acked-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 12:02:26 +02:00
Gleb Natapov	444e863d13	KVM: VMX: when entering real mode align segment base to 16 bytes VMX checks that base is equal segment shifted 4 bits left. Otherwise guest entry fails. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:20 +02:00
Xiao Guangrong	f8e453b00c	KVM: MMU: handle 'map_writable' in set_spte() function Move the operation of 'writable' to set_spte() to clean up code [avi: remove unneeded booleanification] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:19 +02:00
Xiao Guangrong	b034cf0105	KVM: MMU: audit: allow audit more guests at the same time It only allows to audit one guest in the system since: - 'audit_point' is a glob variable - mmu_audit_disable() is called in kvm_mmu_destroy(), so audit is disabled after a guest exited this patch fix those issues then allow to audit more guests at the same time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:17 +02:00
Avi Kivity	aff48baa34	KVM: Fetch guest cr3 from hardware on demand Instead of syncing the guest cr3 every exit, which is expensince on vmx with ept enabled, sync it only on demand. [sheng: fix incorrect cr3 seen by Windows XP] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:16 +02:00
Avi Kivity	9f8fe5043f	KVM: Replace reads of vcpu->arch.cr3 by an accessor This allows us to keep cr3 in the VMCS, later on. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:31:15 +02:00
Marcelo Tosatti	e49146dce8	KVM: MMU: only write protect mappings at pagetable level If a pagetable contains a writeable large spte, all of its sptes will be write protected, including non-leaf ones, leading to endless pagefaults. Do not write protect pages above PT_PAGE_TABLE_LEVEL, as the spte fault paths assume non-leaf sptes are writable. Tested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:13 +02:00
Avi Kivity	16d8f72f70	KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() 'error' is byte sized, so use a byte register constraint. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:12 +02:00
Avi Kivity	c445f8ef43	KVM: MMU: Initialize base_role for tdp mmus Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:11 +02:00
Avi Kivity	110312c84b	KVM: VMX: Optimize atomic EFER load When NX is enabled on the host but not on the guest, we use the entry/exit msr load facility, which is slow. Optimize it to use entry/exit efer load, which is ~1200 cycles faster. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:09 +02:00
Andre Przywara	dc25e89e07	KVM: SVM: copy instruction bytes from VMCB In case of a nested page fault or an intercepted #PF newer SVM implementations provide a copy of the faulting instruction bytes in the VMCB. Use these bytes to feed the instruction emulator and avoid the costly guest instruction fetch in this case. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:07 +02:00
Andre Przywara	df4f310856	KVM: SVM: implement enhanced INVLPG intercept When the DecodeAssist feature is available, the linear address is provided in the VMCB on INVLPG intercepts. Use it directly to avoid any decoding and emulation. This is only useful for shadow paging, though. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:05 +02:00
Andre Przywara	cae3797a46	KVM: SVM: enhance mov DR intercept handler Newer SVM implementations provide the GPR number in the VMCB, so that the emulation path is no longer necesarry to handle debug register access intercepts. Implement the handling in svm.c and use it when the info is provided. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:04 +02:00
Andre Przywara	7ff76d58a9	KVM: SVM: enhance MOV CR intercept handler Newer SVM implementations provide the GPR number in the VMCB, so that the emulation path is no longer necesarry to handle CR register access intercepts. Implement the handling in svm.c and use it when the info is provided. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:03 +02:00
Andre Przywara	ddce97aac5	KVM: SVM: add new SVM feature bit names the recent APM Vol.2 and the recent AMD CPUID specification describe new CPUID features bits for SVM. Name them here for later usage. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:02 +02:00
Andre Przywara	51d8b66199	KVM: cleanup emulate_instruction emulate_instruction had many callers, but only one used all parameters. One parameter was unused, another one is now hidden by a wrapper function (required for a future addition anyway), so most callers use now a shorter parameter list. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:31:00 +02:00
Andre Przywara	db8fcefaa7	KVM: move complete_insn_gp() into x86.c move the complete_insn_gp() helper function out of the VMX part into the generic x86 part to make it usable by SVM. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:30:59 +02:00
Andre Przywara	eea1cff9ab	KVM: x86: fix CR8 handling The handling of CR8 writes in KVM is currently somewhat cumbersome. This patch makes it look like the other CR register handlers and fixes a possible issue in VMX, where the RIP would be incremented despite an injected #GP. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:30:58 +02:00
Takuya Yoshikawa	175504cdbf	KVM: Take missing slots_lock for kvm_io_bus_unregister_dev() In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking slots_lock in the error handling path. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:30:55 +02:00
Lai Jiangshan	a355c85c5f	KVM: return true when user space query KVM_CAP_USER_NMI extension userspace may check this extension in runtime. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:30:54 +02:00
Avi Kivity	61cfab2e83	KVM: Correct kvm_pio tracepoint count field Currently, we record '1' for count regardless of the real count. Fix. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:30:52 +02:00
Avi Kivity	d3c422bd33	KVM: MMU: Fix incorrect direct page write protection due to ro host page If KVM sees a read-only host page, it will map it as read-only to prevent breaking a COW. However, if the page was part of a large guest page, KVM incorrectly extends the write protection to the entire large page frame instead of limiting it to the normal host page. This results in the instantiation of a new shadow page with read-only access. If this happens for a MOVS instruction that moves memory between two normal pages, within a single large page frame, and mapped within the guest as a large page, and if, in addition, the source operand is not writeable in the host (perhaps due to KSM), then KVM will instantiate a read-only direct shadow page, instantiate an spte for the source operand, then instantiate a new read/write direct shadow page and instantiate an spte for the destination operand. Since these two sptes are in different shadow pages, MOVS will never see them at the same time and the guest will not make progress. Fix by mapping the direct shadow page read/write, and only marking the host page read-only. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:30:51 +02:00

... 3 4 5 6 7 ...

2114 Commits