Commit Graph

80604 Commits

Author SHA1 Message Date
Carsten Otte 043405e100 KVM: Move x86 msr handling to new files x86.[ch]
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:51 +02:00
Izik Eidus 6fc138d227 KVM: Support assigning userspace memory to the guest
Instead of having the kernel allocate memory to the guest, let userspace
allocate it and pass the address to the kernel.

This is required for s390 support, but also enables features like memory
sharing and using hugetlbfs backed memory.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:51 +02:00
Mike Day d77c26fce9 KVM: CodingStyle cleanup
Signed-off-by: Mike D. Day <ncmike@ncultra.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Rusty Russell 7e620d16b8 KVM: Remove gratuitous casts from lapic.c
Since vcpu->apic is of the correct type, there's not need to cast.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Rusty Russell 76fafa5e22 KVM: Hoist kvm_create_lapic() into kvm_vcpu_init()
Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm
and vmx do it.  And make it return the error rather than a fairly
random -ENOMEM.

This also solves the problem that neither svm.c nor vmx.c actually
handles the error path properly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Rusty Russell d589444e92 KVM: Add kvm_free_lapic() to pair with kvm_create_lapic()
Instead of the asymetry of kvm_free_apic, implement kvm_free_lapic().
And guess what?  I found a minor bug: we don't need to hrtimer_cancel()
from kvm_main.c, because we do that in kvm_free_apic().

Also:
1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init.
2) Don't set apic->regs_page to zero before freeing apic.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Izik Eidus 82ce2c9683 KVM: Allow dynamic allocation of the mmu shadow cache size
The user is now able to set how many mmu pages will be allocated to the guest.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Izik Eidus 195aefde9c KVM: Add general accessors to read and write guest memory
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Izik Eidus 290fc38da8 KVM: Remove the usage of page->private field by rmap
When kvm uses user-allocated pages in the future for the guest, we won't
be able to use page->private for rmap, since page->rmap is reserved for
the filesystem.  So we move the rmap base pointers to the memory slot.

A side effect of this is that we need to store the gfn of each gpte in
the shadow pages, since the memory slot is addressed by gfn, instead of
hfn like struct page.

Signed-off-by: Izik Eidus <izik@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:50 +02:00
Avi Kivity f566e09fc2 KVM: VMX: Simplify vcpu_clear()
Now that smp_call_function_single() knows how to call a function on the
current cpu, there's no need to check explicitly.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Avi Kivity eae5ecb5b9 KVM: VMX: Don't clear the vmcs if the vcpu is not loaded on any processor
Noted by Eddie Dong.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier b4c6abfef4 KVM: x86 emulator: Any legacy prefix after a REX prefix nullifies its effect
This patch modifies the management of REX prefix according behavior
I saw in Xen 3.1.  In Xen, this modification has been introduced by
Jan Beulich.

http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.html

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier a22436b7b8 KVM: Purify x86_decode_insn() error case management
The only valid case is on protected page access, other cases are errors.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Qing He e4f8e03956 KVM: x86_emulator: no writeback for bt
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier a01af5ec51 KVM: x86 emulator: Remove no_wb, use dst.type = OP_NONE instead
Remove no_wb, use dst.type = OP_NONE instead, idea stollen from xen-3.1

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:49 +02:00
Laurent Vivier 05f086f87e KVM: x86 emulator: remove _eflags and use directly ctxt->eflags.
Remove _eflags and use directly ctxt->eflags. Caching eflags is not needed as
it is restored to vcpu by kvm_main.c:emulate_instruction() from ctxt->eflags
only if emulation doesn't fail.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Laurent Vivier 8cdbd2c9bf KVM: x86 emulator: split some decoding into functions for readability
To improve readability, move push, writeback, and grp 1a/2/3/4/5/9 emulation
parts into functions.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Ryan Harper 217648638c KVM: MMU: Ignore reserved bits in cr3 in non-pae mode
This patch removes the fault injected when the guest attempts to set reserved
bits in cr3.  X86 hardware doesn't generate a fault when setting reserved bits.
The result of this patch is that vmware-server, running within a kvm guest,
boots and runs memtest from an iso.

Signed-off-by: Ryan Harper <ryanh@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Avi Kivity 12b7d28fc1 KVM: MMU: Make flooding detection work when guest page faults are bypassed
When we allow guest page faults to reach the guests directly, we lose
the fault tracking which allows us to detect demand paging.  So we provide
an alternate mechnism by clearing the accessed bit when we set a pte, and
checking it later to see if the guest actually used it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Avi Kivity c7addb9020 KVM: Allow not-present guest page faults to bypass kvm
There are two classes of page faults trapped by kvm:
 - host page faults, where the fault is needed to allow kvm to install
   the shadow pte or update the guest accessed and dirty bits
 - guest page faults, where the guest has faulted and kvm simply injects
   the fault back into the guest to handle

The second class, guest page faults, is pure overhead.  We can eliminate
some of it on vmx using the following evil trick:
 - when we set up a shadow page table entry, if the corresponding guest pte
   is not present, set up the shadow pte as not present
 - if the guest pte _is_ present, mark the shadow pte as present but also
   set one of the reserved bits in the shadow pte
 - tell the vmx hardware not to trap faults which have the present bit clear

With this, normal page-not-present faults go directly to the guest,
bypassing kvm entirely.

Unfortunately, this trick only works on Intel hardware, as AMD lacks a
way to discriminate among page faults based on error code.  It is also
a little risky since it uses reserved bits which might become unreserved
in the future, so a module parameter is provided to disable it.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:48 +02:00
Avi Kivity 51c6cf662b KVM: VMX: Further reduce efer reloads
KVM avoids reloading the efer msr when the difference between the guest
and host values consist of the long mode bits (which are switched by
hardware) and the NX bit (which is emulated by the KVM MMU).

This patch also allows KVM to ignore SCE (syscall enable) when the guest
is running in 32-bit mode.  This is because the syscall instruction is
not available in 32-bit mode on Intel processors, so the SCE bit is
effectively meaningless.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier 3427318fd2 KVM: Call x86_decode_insn() only when needed
Move emulate_ctxt to kvm_vcpu to keep emulate context when we exit from kvm
module. Call x86_decode_insn() only when needed. Modify x86_emulate_insn() to
not modify the context if it must be re-entered.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier 1be3aa4718 KVM: emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn()
emulate_instruction() calls now x86_decode_insn() and x86_emulate_insn().
x86_emulate_insn() is x86_emulate_memop() without the decoding part.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier 8b4caf6650 KVM: x86 emulator: move all decoding process to function x86_decode_insn()
Split the decoding process into a new function x86_decode_insn().

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00
Laurent Vivier e4e03deda8 KVM: x86 emulator: move all x86_emulate_memop() to a structure
Move all x86_emulate_memop() common variables between decode and execute to a
structure decode_cache.  This will help in later separating decode and
emulate.

            struct decode_cache {
                u8 twobyte;
                u8 b;
                u8 lock_prefix;
                u8 rep_prefix;
                u8 op_bytes;
                u8 ad_bytes;
                struct operand src;
                struct operand dst;
                unsigned long *override_base;
                unsigned int d;
                unsigned long regs[NR_VCPU_REGS];
                unsigned long eip;
                /* modrm */
                u8 modrm;
                u8 modrm_mod;
                u8 modrm_reg;
                u8 modrm_rm;
                u8 use_modrm_ea;
                unsigned long modrm_ea;
                unsigned long modrm_val;
           };

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-01-30 17:52:47 +02:00