You've already forked linux-apfs
mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...
This commit is contained in:
@@ -2278,7 +2278,7 @@ return indicates the attribute is implemented. It does not necessarily
|
||||
indicate that the attribute can be read or written in the device's
|
||||
current state. "addr" is ignored.
|
||||
|
||||
4.77 KVM_ARM_VCPU_INIT
|
||||
4.82 KVM_ARM_VCPU_INIT
|
||||
|
||||
Capability: basic
|
||||
Architectures: arm, arm64
|
||||
@@ -2304,7 +2304,7 @@ Possible features:
|
||||
Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
|
||||
|
||||
|
||||
4.78 KVM_GET_REG_LIST
|
||||
4.83 KVM_GET_REG_LIST
|
||||
|
||||
Capability: basic
|
||||
Architectures: arm, arm64
|
||||
@@ -2324,7 +2324,7 @@ This ioctl returns the guest registers that are supported for the
|
||||
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
|
||||
|
||||
|
||||
4.80 KVM_ARM_SET_DEVICE_ADDR
|
||||
4.84 KVM_ARM_SET_DEVICE_ADDR
|
||||
|
||||
Capability: KVM_CAP_ARM_SET_DEVICE_ADDR
|
||||
Architectures: arm, arm64
|
||||
@@ -2362,7 +2362,7 @@ must be called after calling KVM_CREATE_IRQCHIP, but before calling
|
||||
KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the
|
||||
base addresses will return -EEXIST.
|
||||
|
||||
4.82 KVM_PPC_RTAS_DEFINE_TOKEN
|
||||
4.85 KVM_PPC_RTAS_DEFINE_TOKEN
|
||||
|
||||
Capability: KVM_CAP_PPC_RTAS
|
||||
Architectures: ppc
|
||||
|
||||
@@ -191,12 +191,12 @@ Shadow pages contain the following information:
|
||||
A counter keeping track of how many hardware registers (guest cr3 or
|
||||
pdptrs) are now pointing at the page. While this counter is nonzero, the
|
||||
page cannot be destroyed. See role.invalid.
|
||||
multimapped:
|
||||
Whether there exist multiple sptes pointing at this page.
|
||||
parent_pte/parent_ptes:
|
||||
If multimapped is zero, parent_pte points at the single spte that points at
|
||||
this page's spt. Otherwise, parent_ptes points at a data structure
|
||||
with a list of parent_ptes.
|
||||
parent_ptes:
|
||||
The reverse mapping for the pte/ptes pointing at this page's spt. If
|
||||
parent_ptes bit 0 is zero, only one spte points at this pages and
|
||||
parent_ptes points at this single spte, otherwise, there exists multiple
|
||||
sptes pointing at this page and (parent_ptes & ~0x1) points at a data
|
||||
structure with a list of parent_ptes.
|
||||
unsync:
|
||||
If true, then the translations in this page may not match the guest's
|
||||
translation. This is equivalent to the state of the tlb when a pte is
|
||||
@@ -210,6 +210,24 @@ Shadow pages contain the following information:
|
||||
A bitmap indicating which sptes in spt point (directly or indirectly) at
|
||||
pages that may be unsynchronized. Used to quickly locate all unsychronized
|
||||
pages reachable from a given page.
|
||||
mmu_valid_gen:
|
||||
Generation number of the page. It is compared with kvm->arch.mmu_valid_gen
|
||||
during hash table lookup, and used to skip invalidated shadow pages (see
|
||||
"Zapping all pages" below.)
|
||||
clear_spte_count:
|
||||
Only present on 32-bit hosts, where a 64-bit spte cannot be written
|
||||
atomically. The reader uses this while running out of the MMU lock
|
||||
to detect in-progress updates and retry them until the writer has
|
||||
finished the write.
|
||||
write_flooding_count:
|
||||
A guest may write to a page table many times, causing a lot of
|
||||
emulations if the page needs to be write-protected (see "Synchronized
|
||||
and unsynchronized pages" below). Leaf pages can be unsynchronized
|
||||
so that they do not trigger frequent emulation, but this is not
|
||||
possible for non-leafs. This field counts the number of emulations
|
||||
since the last time the page table was actually used; if emulation
|
||||
is triggered too frequently on this page, KVM will unmap the page
|
||||
to avoid emulation in the future.
|
||||
|
||||
Reverse map
|
||||
===========
|
||||
@@ -258,14 +276,26 @@ This is the most complicated event. The cause of a page fault can be:
|
||||
|
||||
Handling a page fault is performed as follows:
|
||||
|
||||
- if the RSV bit of the error code is set, the page fault is caused by guest
|
||||
accessing MMIO and cached MMIO information is available.
|
||||
- walk shadow page table
|
||||
- check for valid generation number in the spte (see "Fast invalidation of
|
||||
MMIO sptes" below)
|
||||
- cache the information to vcpu->arch.mmio_gva, vcpu->arch.access and
|
||||
vcpu->arch.mmio_gfn, and call the emulator
|
||||
- If both P bit and R/W bit of error code are set, this could possibly
|
||||
be handled as a "fast page fault" (fixed without taking the MMU lock). See
|
||||
the description in Documentation/virtual/kvm/locking.txt.
|
||||
- if needed, walk the guest page tables to determine the guest translation
|
||||
(gva->gpa or ngpa->gpa)
|
||||
- if permissions are insufficient, reflect the fault back to the guest
|
||||
- determine the host page
|
||||
- if this is an mmio request, there is no host page; call the emulator
|
||||
to emulate the instruction instead
|
||||
- if this is an mmio request, there is no host page; cache the info to
|
||||
vcpu->arch.mmio_gva, vcpu->arch.access and vcpu->arch.mmio_gfn
|
||||
- walk the shadow page table to find the spte for the translation,
|
||||
instantiating missing intermediate page tables as necessary
|
||||
- If this is an mmio request, cache the mmio info to the spte and set some
|
||||
reserved bit on the spte (see callers of kvm_mmu_set_mmio_spte_mask)
|
||||
- try to unsynchronize the page
|
||||
- if successful, we can let the guest continue and modify the gpte
|
||||
- emulate the instruction
|
||||
@@ -351,6 +381,51 @@ causes its write_count to be incremented, thus preventing instantiation of
|
||||
a large spte. The frames at the end of an unaligned memory slot have
|
||||
artificially inflated ->write_counts so they can never be instantiated.
|
||||
|
||||
Zapping all pages (page generation count)
|
||||
=========================================
|
||||
|
||||
For the large memory guests, walking and zapping all pages is really slow
|
||||
(because there are a lot of pages), and also blocks memory accesses of
|
||||
all VCPUs because it needs to hold the MMU lock.
|
||||
|
||||
To make it be more scalable, kvm maintains a global generation number
|
||||
which is stored in kvm->arch.mmu_valid_gen. Every shadow page stores
|
||||
the current global generation-number into sp->mmu_valid_gen when it
|
||||
is created. Pages with a mismatching generation number are "obsolete".
|
||||
|
||||
When KVM need zap all shadow pages sptes, it just simply increases the global
|
||||
generation-number then reload root shadow pages on all vcpus. As the VCPUs
|
||||
create new shadow page tables, the old pages are not used because of the
|
||||
mismatching generation number.
|
||||
|
||||
KVM then walks through all pages and zaps obsolete pages. While the zap
|
||||
operation needs to take the MMU lock, the lock can be released periodically
|
||||
so that the VCPUs can make progress.
|
||||
|
||||
Fast invalidation of MMIO sptes
|
||||
===============================
|
||||
|
||||
As mentioned in "Reaction to events" above, kvm will cache MMIO
|
||||
information in leaf sptes. When a new memslot is added or an existing
|
||||
memslot is changed, this information may become stale and needs to be
|
||||
invalidated. This also needs to hold the MMU lock while walking all
|
||||
shadow pages, and is made more scalable with a similar technique.
|
||||
|
||||
MMIO sptes have a few spare bits, which are used to store a
|
||||
generation number. The global generation number is stored in
|
||||
kvm_memslots(kvm)->generation, and increased whenever guest memory info
|
||||
changes. This generation number is distinct from the one described in
|
||||
the previous section.
|
||||
|
||||
When KVM finds an MMIO spte, it checks the generation number of the spte.
|
||||
If the generation number of the spte does not equal the global generation
|
||||
number, it will ignore the cached MMIO information and handle the page
|
||||
fault through the slow path.
|
||||
|
||||
Since only 19 bits are used to store generation-number on mmio spte, all
|
||||
pages are zapped when there is an overflow.
|
||||
|
||||
|
||||
Further reading
|
||||
===============
|
||||
|
||||
|
||||
Reference in New Issue
Block a user