Commit Graph

223 Commits

Author SHA1 Message Date
Andrea Arcangeli 8ee53820ed thp: mmu_notifier_test_young
For GRU and EPT, we need gup-fast to set referenced bit too (this is why
it's correct to return 0 when shadow_access_mask is zero, it requires
gup-fast to set the referenced bit).  qemu-kvm access already sets the
young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow
paging EPT minor fault we relay on gup-fast to signal the page is in
use...

We also need to check the young bits on the secondary pagetables for NPT
and not nested shadow mmu as the data may never get accessed again by the
primary pte.

Without this closer accuracy, we'd have to remove the heuristic that
avoids collapsing hugepages in hugepage virtual regions that have not even
a single subpage in use.

->test_young is full backwards compatible with GRU and other usages that
don't have young bits in pagetables set by the hardware and that should
nuke the secondary mmu mappings when ->clear_flush_young runs just like
EPT does.

Removing the heuristic that checks the young bit in
khugepaged/collapse_huge_page completely isn't so bad either probably but
I thought it was worth it and this makes it reliable.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:46 -08:00
Andrea Arcangeli 936a5fe6e6 thp: kvm mmu transparent hugepage support
This should work for both hugetlbfs and transparent hugepages.

[akpm@linux-foundation.org: bring forward PageTransCompound() addition for bisectability]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-13 17:32:41 -08:00
Avi Kivity b7c4145ba2 KVM: Don't spin on virt instruction faults during reboot
Since vmx blocks INIT signals, we disable virtualization extensions during
reboot.  This leads to virtualization instructions faulting; we trap these
faults and spin while the reboot continues.

Unfortunately spinning on a non-preemptible kernel may block a task that
reboot depends on; this causes the reboot to hang.

Fix by skipping over the instruction and hoping for the best.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:30:18 +02:00
Xiao Guangrong a4ee1ca4a3 KVM: MMU: delay flush all tlbs on sync_page path
Quote from Avi:
| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
| that is cleareded when we flush the tlb.  kvm_mmu_notifier_invalidate_page()
| can consult the bit and force a flush if set.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:51 +02:00
Takuya Yoshikawa 75b7127c38 KVM: rename hardware_[dis|en]able() to *_nolock() and add locking wrappers
The naming convension of hardware_[dis|en]able family is little bit confusing
because only hardware_[dis|en]able_all are using _nolock suffix.

Renaming current hardware_[dis|en]able() to *_nolock() and using
hardware_[dis|en]able() as wrapper functions which take kvm_lock for them
reduces extra confusion.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:29 +02:00
Takuya Yoshikawa 97e91e28fa KVM: take kvm_lock for hardware_disable() during cpu hotplug
In kvm_cpu_hotplug(), only CPU_STARTING case is protected by kvm_lock.
This patch adds missing protection for CPU_DYING case.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:29:28 +02:00
Jan Kiszka d89f5eff70 KVM: Clean up vm creation and release
IA64 support forces us to abstract the allocation of the kvm structure.
But instead of mixing this up with arch-specific initialization and
doing the same on destruction, split both steps. This allows to move
generic destruction calls into generic code.

It also fixes error clean-up on failures of kvm_create_vm for IA64.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:29:09 +02:00
Jan Kiszka 57e7fbee1d KVM: Refactor srcu struct release on early errors
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:29:05 +02:00
Takuya Yoshikawa 2653503769 KVM: replace vmalloc and memset with vzalloc
Let's use newly introduced vzalloc().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:55 +02:00
Heiko Carstens aac8763697 KVM: get rid of warning within kvm_dev_ioctl_create_vm
Fixes this:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_dev_ioctl_create_vm':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1828:10: warning: unused variable 'r'

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:50 +02:00
Heiko Carstens 3bcc8a8c6c KVM: add cast within kvm_clear_guest_page to fix warning
Fixes this:

  CC      arch/s390/kvm/../../../virt/kvm/kvm_main.o
arch/s390/kvm/../../../virt/kvm/kvm_main.c: In function 'kvm_clear_guest_page':
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1224:2: warning: passing argument 3 of 'kvm_write_guest_page' makes pointer from integer without a cast
arch/s390/kvm/../../../virt/kvm/kvm_main.c:1185:5: note: expected 'const void *' but argument is of type 'long unsigned int'

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:49 +02:00
Takuya Yoshikawa 6f9e5c1702 KVM: use kmalloc() for small dirty bitmaps
Currently we are using vmalloc() for all dirty bitmaps even if
they are small enough, say less than K bytes.

We use kmalloc() if dirty bitmap size is less than or equal to
PAGE_SIZE so that we can avoid vmalloc area usage for VGA.

This will also make the logging start/stop faster.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:48 +02:00
Takuya Yoshikawa 515a01279a KVM: pre-allocate one more dirty bitmap to avoid vmalloc()
Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging and this has been causing
bad effect to VGA and live-migration: vmalloc() consumes extra systime,
triggers tlb flush, etc.

This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.

Performance improvement:
  I measured performance for the case of VGA update by trace-cmd.
  The result was 1.5 times faster than the original one.

  In the case of live migration, the improvement ratio depends on the workload
  and the guest memory size. In general, the larger the memory size is the more
  benefits we get.

Note:
  This does not change other architectures's logic but the allocation size
  becomes twice. This will increase the actual memory consumption only when
  the new size changes the number of pages allocated by vmalloc().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:46 +02:00
Takuya Yoshikawa a36a57b1a1 KVM: introduce wrapper functions for creating/destroying dirty bitmaps
This makes it easy to change the way of allocating/freeing dirty bitmaps.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:45 +02:00
Gleb Natapov 64be500706 KVM: x86: trace "exit to userspace" event
Add tracepoint for userspace exit.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:28:44 +02:00
Marcelo Tosatti 612819c3c6 KVM: propagate fault r/w information to gup(), allow read-only memory
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-01-12 11:28:40 +02:00
Gleb Natapov 8030089f9e KVM: improve hva_to_pfn() readability
Improve vma handling code readability in hva_to_pfn() and fix
async pf handling code to properly check vma returned by find_vma().

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:23 +02:00
Gleb Natapov 49c7754ce5 KVM: Add memory slot versioning and use it to provide fast guest write interface
Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:23:08 +02:00
Gleb Natapov af585b921e KVM: Halt vcpu if page it tries to access is swapped out
If a guest accesses swapped out memory do not swap it in from vcpu thread
context. Schedule work to do swapping and put vcpu into halted state
instead.

Interrupts will still be delivered to the guest and if interrupt will
cause reschedule guest will continue to run another task.

[avi: remove call to get_user_pages_noio(), nacked by Linus; this
      makes everything synchrnous again]

Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-01-12 11:21:39 +02:00
Linus Torvalds 1765a1fe5d Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits)
  KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
  KVM: Fix signature of kvm_iommu_map_pages stub
  KVM: MCE: Send SRAR SIGBUS directly
  KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED
  KVM: fix typo in copyright notice
  KVM: Disable interrupts around get_kernel_ns()
  KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address
  KVM: MMU: move access code parsing to FNAME(walk_addr) function
  KVM: MMU: audit: check whether have unsync sps after root sync
  KVM: MMU: audit: introduce audit_printk to cleanup audit code
  KVM: MMU: audit: unregister audit tracepoints before module unloaded
  KVM: MMU: audit: fix vcpu's spte walking
  KVM: MMU: set access bit for direct mapping
  KVM: MMU: cleanup for error mask set while walk guest page table
  KVM: MMU: update 'root_hpa' out of loop in PAE shadow path
  KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()
  KVM: x86: Fix constant type in kvm_get_time_scale
  KVM: VMX: Add AX to list of registers clobbered by guest switch
  KVM guest: Move a printk that's using the clock before it's ready
  KVM: x86: TSC catchup mode
  ...
2010-10-24 12:47:25 -07:00
Jan Kiszka 2a31339aa0 KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages
We also have to call kvm_iommu_map_pages for CONFIG_AMD_IOMMU. So drop
the dependency on Intel IOMMU, kvm_iommu_map_pages will be a nop anyway
if CONFIG_IOMMU_API is not defined.

KVM-Stable-Tag.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:15 +02:00
Nicolas Kaiser 9611c18777 KVM: fix typo in copyright notice
Fix typo in copyright notice.

Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:14 +02:00
Avi Kivity 624d84cfe6 KVM: cpu_relax() during spin waiting for reboot
It doesn't really matter, but if we spin, we should spin in a more relaxed
manner.  This way, if something goes wrong at least it won't contribute to
global warming.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:53:03 +02:00
Xiao Guangrong 365fb3fdf6 KVM: MMU: rewrite audit_mappings_page() function
There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in
atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection).

This patch fix it by:
- introduce gfn_to_pfn_atomic instead of gfn_to_pfn
- get the mapping gfn from kvm_mmu_page_get_gfn()

And it adds 'notrap' ptes check in unsync/direct sps

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-10-24 10:51:48 +02:00
Xiao Guangrong 48987781eb KVM: MMU: introduce gfn_to_page_many_atomic() function
Introduce this function to get consecutive gfn's pages, it can reduce
gup's overload, used by later patch

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-10-24 10:51:26 +02:00