kernel

mirror of https://github.com/ukui/kernel.git synced 2026-03-09 10:07:04 -07:00

Author	SHA1	Message	Date
Jan Kiszka	d2be1651b7	KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP This marks the guest single-step API improvement of `94fe45da` and `91586a3b` with a capability flag to allow reliable detection by user space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: stable@kernel.org (2.6.33) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:36:14 -03:00
Zhai, Edwin	ab9f4ecbb6	KVM: enable PCI multiple-segments for pass-through device Enable optional parameter (default 0) - PCI segment (or domain) besides BDF, when assigning PCI device to guest. Signed-off-by: Zhai Edwin <edwin.zhai@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:36:06 -03:00
Gleb Natapov	c25bc1638a	KVM: Implement NotifyLongSpinWait HYPER-V hypercall Windows issues this hypercall after guest was spinning on a spinlock for too many iterations. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:36:00 -03:00
Gleb Natapov	10388a0716	KVM: Add HYPER-V apic access MSRs Implement HYPER-V apic MSRs. Spec defines three MSRs that speed-up access to EOI/TPR/ICR apic registers for PV guests. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:36:00 -03:00
Gleb Natapov	55cd8e5a4e	KVM: Implement bare minimum of HYPER-V MSRs Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and VP_INDEX MSRs. [avi: fix build on i386] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-03-01 12:35:57 -03:00
Marcelo Tosatti	bc6678a33d	KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update Use two steps for memslot deletion: mark the slot invalid (which stops instantiation of new shadow pages for that slot, but allows destruction), then instantiate the new empty slot. Also simplifies kvm_handle_hva locking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-03-01 12:35:44 -03:00
Benjamin Herrenschmidt	bcd6acd51f	Merge commit 'origin/master' into next Conflicts: include/linux/kvm.h	2009-12-09 17:14:38 +11:00
Alexander Graf	e15a113700	powerpc/kvm: Sync guest visible MMU state Currently userspace has no chance to find out which virtual address space we're in and resolve addresses. While that is a big problem for migration, it's also unpleasent when debugging, as gdb and the monitor don't work on virtual addresses. This patch exports enough of the MMU segment state to userspace to make debugging work and thus also includes the groundwork for migration. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2009-12-08 16:02:50 +11:00
Carsten Otte	d7b0b5eb30	KVM: s390: Make psw available on all exits, not just a subset This patch moves s390 processor status word into the base kvm_run struct and keeps it up-to date on all userspace exits. The userspace ABI is broken by this, however there are no applications in the wild using this. A capability check is provided so users can verify the updated API exists. Cc: stable@kernel.org Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:25 +02:00
Jan Kiszka	3cfc3092f4	KVM: x86: Add KVM_GET/SET_VCPU_EVENTS This new IOCTL exports all yet user-invisible states related to exceptions, interrupts, and NMIs. Together with appropriate user space changes, this fixes sporadic problems of vmsave/restore, live migration and system reset. [avi: future-proof abi by adding a flags field] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:25 +02:00
Avi Kivity	65ac726404	KVM: VMX: Report unexpected simultaneous exceptions as internal errors These happen when we trap an exception when another exception is being delivered; we only expect these with MCEs and page faults. If something unexpected happens, things probably went south and we're better off reporting an internal error and freezing. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:24 +02:00
Avi Kivity	a9c7399d6c	KVM: Allow internal errors reported to userspace to carry extra data Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:24 +02:00
Jan Kiszka	c54d2aba27	KVM: Reorder IOCTLs in main kvm.h Obviously, people tend to extend this header at the bottom - more or less blindly. Ensure that deprecated stuff gets its own corner again by moving things to the top. Also add some comments and reindent IOCTLs to make them more readable and reduce the risk of number collisions. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:24 +02:00
Glauber Costa	afbcf7ab8d	KVM: allow userspace to adjust kvmclock offset When we migrate a kvm guest that uses pvclock between two hosts, we may suffer a large skew. This is because there can be significant differences between the monotonic clock of the hosts involved. When a new host with a much larger monotonic time starts running the guest, the view of time will be significantly impacted. Situation is much worse when we do the opposite, and migrate to a host with a smaller monotonic clock. This proposed ioctl will allow userspace to inform us what is the monotonic clock value in the source host, so we can keep the time skew short, and more importantly, never goes backwards. Userspace may also need to trigger the current data, since from the first migration onwards, it won't be reflected by a simple call to clock_gettime() anymore. [marcelo: future-proof abi with a flags field] [jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it] Signed-off-by: Glauber Costa <glommer@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:19 +02:00
Ed Swierk	ffde22ac53	KVM: Xen PV-on-HVM guest support Support for Xen PV-on-HVM guests can be implemented almost entirely in userspace, except for handling one annoying MSR that maps a Xen hypercall blob into guest address space. A generic mechanism to delegate MSR writes to userspace seems overkill and risks encouraging similar MSR abuse in the future. Thus this patch adds special support for the Xen HVM MSR. I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell KVM which MSR the guest will write to, as well as the starting address and size of the hypercall blobs (one each for 32-bit and 64-bit) that userspace has loaded from files. When the guest writes to the MSR, KVM copies one page of the blob from userspace to the guest. I've tested this patch with a hacked-up version of Gerd's userspace code, booting a number of guests (CentOS 5.3 i386 and x86_64, and FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices. [jan: fix i386 build warning] [avi: future proof abi with a flags field] Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-12-03 09:32:18 +02:00
Sheng Yang	b927a3cec0	KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl Now KVM allow guest to modify guest's physical address of EPT's identity mapping page. (change from v1, discard unnecessary check, change ioctl to accept parameter address rather than value) Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2009-09-10 08:33:16 +03:00
Gregory Haskins	d34e6b175e	KVM: add ioeventfd support ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd signal when written to by a guest. Host userspace can register any arbitrary IO address with a corresponding eventfd and then pass the eventfd to a specific end-point of interest for handling. Normal IO requires a blocking round-trip since the operation may cause side-effects in the emulated model or may return data to the caller. Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM "heavy-weight" exit back to userspace, and is ultimately serviced by qemu's device model synchronously before returning control back to the vcpu. However, there is a subclass of IO which acts purely as a trigger for other IO (such as to kick off an out-of-band DMA request, etc). For these patterns, the synchronous call is particularly expensive since we really only want to simply get our notification transmitted asychronously and return as quickly as possible. All the sychronous infrastructure to ensure proper data-dependencies are met in the normal IO case are just unecessary overhead for signalling. This adds additional computational load on the system, as well as latency to the signalling path. Therefore, we provide a mechanism for registration of an in-kernel trigger point that allows the VCPU to only require a very brief, lightweight exit just long enough to signal an eventfd. This also means that any clients compatible with the eventfd interface (which includes userspace and kernelspace equally well) can now register to be notified. The end result should be a more flexible and higher performance notification API for the backend KVM hypervisor and perhipheral components. To test this theory, we built a test-harness called "doorbell". This module has a function called "doorbell_ring()" which simply increments a counter for each time the doorbell is signaled. It supports signalling from either an eventfd, or an ioctl(). We then wired up two paths to the doorbell: One via QEMU via a registered io region and through the doorbell ioctl(). The other is direct via ioeventfd. You can download this test harness here: ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2 The measured results are as follows: qemu-mmio: 110000 iops, 9.09us rtt ioeventfd-mmio: 200100 iops, 5.00us rtt ioeventfd-pio: 367300 iops, 2.72us rtt I didn't measure qemu-pio, because I have to figure out how to register a PIO region with qemu's device model, and I got lazy. However, for now we can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO, and -350ns for HC, we get: qemu-pio: 153139 iops, 6.53us rtt ioeventfd-hc: 412585 iops, 2.37us rtt these are just for fun, for now, until I can gather more data. Here is a graph for your convenience: http://developer.novell.com/wiki/images/7/76/Iofd-chart.png The conclusion to draw is that we save about 4us by skipping the userspace hop. -------------------- Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:12 +03:00
Beth Kon	e9f4275732	KVM: PIT support for HPET legacy mode When kvm is in hpet_legacy_mode, the hpet is providing the timer interrupt and the pit should not be. So in legacy mode, the pit timer is destroyed, but the state of the pit is maintained. So if kvm or the guest tries to modify the state of the pit, this modification is accepted, except that the timer isn't actually started. When we exit hpet_legacy_mode, the current state of the pit (which is up to date since we've been accepting modifications) is used to restart the pit timer. The saved_mode code in kvm_pit_load_count temporarily changes mode to 0xff in order to destroy the timer, but then restores the actual value, again maintaining "current" state of the pit for possible later reenablement. [avi: add some reserved storage in the ioctl; make SET_PIT2 IOW] [marcelo: fix memory corruption due to reserved storage] Signed-off-by: Beth Kon <eak@us.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:12 +03:00
Marcelo Tosatti	2023a29cbe	KVM: remove old KVMTRACE support code Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:03 +03:00
Avi Kivity	3f5d18a965	KVM: Return to userspace on emulation failure Instead of mindlessly retrying to execute the instruction, report the failure to userspace. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:52 +03:00
Gleb Natapov	73880c80aa	KVM: Break dependency between vcpu index in vcpus array and vcpu_id. Archs are free to use vcpu_id as they see fit. For x86 it is used as vcpu's apic id. New ioctl is added to configure boot vcpu id that was assumed to be 0 till now. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:52 +03:00
Avi Kivity	6a4a983973	KVM: Reorder ioctls in kvm.h Somehow the VM ioctls got unsorted; resort. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:50 +03:00
Sheng Yang	e733339140	KVM: Downsize max support MSI-X entry to 256 We only trap one page for MSI-X entry now, so it's 4k/(128/8) = 256 entries at most. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:43 +03:00
Jan Kiszka	c5ff41ce66	KVM: Allow PIT emulation without speaker port The in-kernel speaker emulation is only a dummy and also unneeded from the performance point of view. Rather, it takes user space support to generate sound output on the host, e.g. console beeps. To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel speaker port emulation via a flag passed along the new IOCTL. It also leaves room for future extensions of the PIT configuration interface. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:41 +03:00
Gregory Haskins	721eecbf4f	KVM: irqfd KVM provides a complete virtual system environment for guests, including support for injecting interrupts modeled after the real exception/interrupt facilities present on the native platform (such as the IDT on x86). Virtual interrupts can come from a variety of sources (emulated devices, pass-through devices, etc) but all must be injected to the guest via the KVM infrastructure. This patch adds a new mechanism to inject a specific interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal on the irqfd (using eventfd semantics from either userspace or kernel) will translate into an injected interrupt in the guest at the next available interrupt window. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:32:41 +03:00

1 2 3 4 5 ...

133 Commits