Commit Graph

250 Commits

Author SHA1 Message Date
Thomas Meyer 6da64fdb8c KVM: Use kmemdup rather than duplicating its implementation
Use kmemdup rather than duplicating its implementation

 The semantic patch that makes this change is available
 in scripts/coccinelle/api/memdup.cocci.

 More information about semantic patching is available at
 http://coccinelle.lip6.fr/

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-12-27 11:17:11 +02:00
Sasha Levin 743eeb0b01 KVM: Intelligent device lookup on I/O bus
Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
is to call the read or write callback for each device registered
on the bus until we find a device which handles it.

Since the number of devices on a bus can be significant due to ioeventfds
and coalesced MMIO zones, this leads to a lot of overhead on each IO
operation.

Instead of registering devices, we now register ranges which points to
a device. Lookup is done using an efficient bsearch instead of a linear
search.

Performance test was conducted by comparing exit count per second with
200 ioeventfds created on one byte and the guest is trying to access a
different byte continuously (triggering usermode exits).
Before the patch the guest has achieved 259k exits per second, after the
patch the guest does 274k exits per second.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-09-25 19:17:59 +03:00
Xiao Guangrong ce88decffd KVM: MMU: mmio page fault support
The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When the page fault is caused by mmio, we cache the info in the shadow page
table, and also set the reserved bits in the shadow page table, so if the mmio
is caused again, we can quickly identify it and emulate it directly

Searching mmio gfn in memslots is heavy since we need to walk all memeslots, it
can be reduced by this feature, and also avoid walking guest page table for
soft mmu.

[jan: fix operator precedence issue]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:40 +03:00
Xiao Guangrong fce92dce79 KVM: MMU: filter out the mmio pfn from the fault pfn
If the page fault is caused by mmio, the gfn can not be found in memslots, and
'bad_pfn' is returned on gfn_to_hva path, so we can use 'bad_pfn' to identify
the mmio page fault.
And, to clarify the meaning of mmio pfn, we return fault page instead of bad
page when the gfn is not allowd to prefetch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-24 11:50:34 +03:00
Gleb Natapov e03b644fe6 KVM: introduce kvm_read_guest_cached
Introduce kvm_read_guest_cached() function in addition to write one we
already have.

[ by glauber: export function signature in kvm header ]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric Munson <emunson@mgebm.net>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:17:01 +03:00
Alexander Graf 1dda606c5f KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK
KVM has an ioctl to define which signal mask should be used while running
inside VCPU_RUN. At least for big endian systems, this mask is different
on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts,
allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
32-bit user land.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 13:16:17 +03:00
Jan Kiszka d780592b99 KVM: Clean up error handling during VCPU creation
So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if
it fails. Move this confusing resonsibility back into the hands of
kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected,
all other archs cannot fail.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-07-12 11:45:08 +03:00
Xiao Guangrong 8b0cedff04 KVM: use __copy_to_user/__clear_user to write guest page
Simply use __copy_to_user/__clear_user to write guest page since we have
already verified the user address when the memslot is set

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-07-12 11:45:03 +03:00
Mike Waychison 74b5c5bfff KVM: Initialize kvm before registering the mmu notifier
It doesn't make sense to ever see a half-initialized kvm structure on
mmu notifier callbacks.  Previously, 85722cda changed the ordering to
ensure that the mmu_lock was initialized before mmu notifier
registration, but there is still a race where the mmu notifier could
come in and try accessing other portions of struct kvm before they are
intialized.

Solve this by moving the mmu notifier registration to occur after the
structure is completely initialized.

Google-Bug-Id: 452199
Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-06-06 11:27:52 +03:00
Heiko Carstens 9e3bb6b6f6 KVM: add missing void __user * cast to access_ok() call
fa3d315a "KVM: Validate userspace_addr of memslot when registered" introduced
this new warning onn s390:

kvm_main.c: In function '__kvm_set_memory_region':
kvm_main.c:654:7: warning: passing argument 1 of '__access_ok' makes pointer from integer without a cast
arch/s390/include/asm/uaccess.h:53:19: note: expected 'const void *' but argument is of type '__u64'

Add the missing cast to get rid of it again...

Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-26 02:41:44 -04:00
OGAWA Hirofumi 85722cda30 KVM: Fix kvm mmu_notifier initialization order
Like the following, mmu_notifier can be called after registering
immediately. So, kvm have to initialize kvm->mmu_lock before it.

BUG: spinlock bad magic on CPU#0, kswapd0/342
 lock: ffff8800af8c4000, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
Pid: 342, comm: kswapd0 Not tainted 2.6.39-rc5+ #1
Call Trace:
 [<ffffffff8118ce61>] spin_bug+0x9c/0xa3
 [<ffffffff8118ce91>] do_raw_spin_lock+0x29/0x13c
 [<ffffffff81024923>] ? flush_tlb_others_ipi+0xaf/0xfd
 [<ffffffff812e22f3>] _raw_spin_lock+0x9/0xb
 [<ffffffffa0582325>] kvm_mmu_notifier_clear_flush_young+0x2c/0x66 [kvm]
 [<ffffffff810d3ff3>] __mmu_notifier_clear_flush_young+0x2b/0x57
 [<ffffffff810c8761>] page_referenced_one+0x88/0xea
 [<ffffffff810c89bf>] page_referenced+0x1fc/0x256
 [<ffffffff810b2771>] shrink_page_list+0x187/0x53a
 [<ffffffff810b2ed7>] shrink_inactive_list+0x1e0/0x33d
 [<ffffffff810acf95>] ? determine_dirtyable_memory+0x15/0x27
 [<ffffffff812e90ee>] ? call_function_single_interrupt+0xe/0x20
 [<ffffffff810b3356>] shrink_zone+0x322/0x3de
 [<ffffffff810a9587>] ? zone_watermark_ok_safe+0xe2/0xf1
 [<ffffffff810b3928>] kswapd+0x516/0x818
 [<ffffffff810b3412>] ? shrink_zone+0x3de/0x3de
 [<ffffffff81053d17>] kthread+0x7d/0x85
 [<ffffffff812e9394>] kernel_thread_helper+0x4/0x10
 [<ffffffff81053c9a>] ? __init_kthread_worker+0x37/0x37
 [<ffffffff812e9390>] ? gs_change+0xb/0xb

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:48:12 -04:00
Takuya Yoshikawa fa3d315a4c KVM: Validate userspace_addr of memslot when registered
This way, we can avoid checking the user space address many times when
we read the guest memory.

Although we can do the same for write if we check which slots are
writable, we do not care write now: reading the guest memory happens
more often than writing.

[avi: change VERIFY_READ to VERIFY_WRITE]

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-22 08:47:56 -04:00
Xiao Guangrong 0ee8dcb87e KVM: cleanup memslot_id function
We can get memslot id from memslot->id directly

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:53 -04:00
Gleb Natapov 0857b9e95c KVM: Enable async page fault processing
If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to
avoid sleeping on IO. Check for hwpoison is done at the same time,
otherwise check_user_page_hwpoison() will call GUP again and will put
vcpu to sleep.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-04-06 13:15:55 +03:00
Linus Torvalds 16c29dafcc Merge branch 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  Introduce ARCH_NO_SYSDEV_OPS config option (v2)
  cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
  KVM: Use syscore_ops instead of sysdev class and sysdev
  PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev
  timekeeping: Use syscore_ops instead of sysdev class and sysdev
  x86: Use syscore_ops instead of sysdev classes and sysdevs
2011-03-25 21:07:59 -07:00
Akinobu Mita cd7e48c5de kvm: use little-endian bitops
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h.  This converts ext2 non-atomic bit operations to
little-endian bit operations.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:16 -07:00
Akinobu Mita 5140a357ea kvm: stop including asm-generic/bitops/le.h directly
asm-generic/bitops/le.h is only intended to be included directly from
asm-generic/bitops/ext2-non-atomic.h or asm-generic/bitops/minix-le.h
which implements generic ext2 or minix bit operations.

This stops including asm-generic/bitops/le.h directly and use ext2
non-atomic bit operations instead.

It seems odd to use ext2_set_bit() on kvm, but it will replaced with
__set_bit_le() after introducing little endian bit operations for all
architectures.  This indirect step is necessary to maintain bisectability
for some architectures which have their own little-endian bit operations.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-23 19:46:10 -07:00
Rafael J. Wysocki fb3600cc50 KVM: Use syscore_ops instead of sysdev class and sysdev
KVM uses a sysdev class and a sysdev for executing kvm_suspend()
after interrupts have been turned off on the boot CPU (during system
suspend) and for executing kvm_resume() before turning on interrupts
on the boot CPU (during system resume).  However, since both of these
functions ignore their arguments, the entire mechanism may be
replaced with a struct syscore_ops object which is simpler.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Avi Kivity <avi@redhat.com>
2011-03-23 22:16:23 +01:00
Jan Kiszka e935b8372c KVM: Convert kvm_lock to raw_spinlock
Code under this lock requires non-preemptibility. Ensure this also over
-rt by converting it to raw spinlock.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:30 -03:00
Rik van Riel 217ece6129 KVM: use yield_to instead of sleep in kvm_vcpu_on_spin
Instead of sleeping in kvm_vcpu_on_spin, which can cause gigantic
slowdowns of certain workloads, we instead use yield_to to get
another VCPU in the same KVM guest to run sooner.

This seems to give a 10-15% speedup in certain workloads.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:29 -03:00
Rik van Riel 34bb10b79d KVM: keep track of which task is running a KVM vcpu
Keep track of which task is running a KVM vcpu.  This helps us
figure out later what task to wake up if we want to boost a
vcpu that got preempted.

Unfortunately there are no guarantees that the same task
always keeps the same vcpu, so we can only track the task
across a single "run" of the vcpu.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:29 -03:00
Huang Ying fafc3dbaac KVM: Replace is_hwpoison_address with __get_user_pages
is_hwpoison_address only checks whether the page table entry is
hwpoisoned, regardless the memory page mapped.  While __get_user_pages
will check both.

QEMU will clear the poisoned page table entry (via unmap/map) to make
it possible to allocate a new memory page for the virtual address
across guest rebooting.  But it is also possible that the underlying
memory page is kept poisoned even after the corresponding page table
entry is cleared, that is, a new memory page can not be allocated.
__get_user_pages can catch these situations.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-17 13:08:27 -03:00
Xiao Guangrong 3cba41307a KVM: make make_all_cpus_request() lockless
Now, we have 'vcpu->mode' to judge whether need to send ipi to other
cpus, this way is very exact, so checking request bit is needless,
then we can drop the spinlock let it's collateral

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:26 -03:00
Xiao Guangrong 6b7e2d0991 KVM: Add "exiting guest mode" state
Currently we keep track of only two states: guest mode and host
mode.  This patch adds an "exiting guest mode" state that tells
us that an IPI will happen soon, so unless we need to wait for the
IPI, we can avoid it completely.

Also
1: No need atomically to read/write ->mode in vcpu's thread

2: reorganize struct kvm_vcpu to make ->mode and ->requests
   in the same cache line explicitly

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:26 -03:00
Heiko Carstens d48ead8b0b KVM: fix build warning within __kvm_set_memory_region() on s390
Get rid of this warning:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c:596:12: warning: 'kvm_create_dirty_bitmap' defined but not used

The only caller of the function is within a !CONFIG_S390 section, so add the
same ifdef around kvm_create_dirty_bitmap() as well.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-17 13:08:26 -03:00