Commit Graph

950373 Commits

Author SHA1 Message Date
Ben Gardon
faaf05b00a kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
Add functions to zap SPTEs to the TDP MMU. These are needed to tear down
TDP MMU roots properly and implement other MMU functions which require
tearing down mappings. Future patches will add functions to populate the
page tables, but as for this patch there will not be any work for these
functions to do.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-8-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:02 -04:00
Peter Xu
9e9eb226b9 KVM: Cache as_id in kvm_memory_slot
Cache the address space ID just like the slot ID.  It will be used in
order to fill in the dirty ring entries.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201014182700.2888246-7-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:01 -04:00
Ben Gardon
2f2fad0897 kvm: x86/mmu: Add functions to handle changed TDP SPTEs
The existing bookkeeping done by KVM when a PTE is changed is spread
around several functions. This makes it difficult to remember all the
stats, bitmaps, and other subsystems that need to be updated whenever a
PTE is modified. When a non-leaf PTE is marked non-present or becomes a
leaf PTE, page table memory must also be freed. To simplify the MMU and
facilitate the use of atomic operations on SPTEs in future patches, create
functions to handle some of the bookkeeping required as a result of
a change.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:01 -04:00
Ben Gardon
02c00b3a2f kvm: x86/mmu: Allocate and free TDP MMU roots
The TDP MMU must be able to allocate paging structure root pages and track
the usage of those pages. Implement a similar, but separate system for root
page allocation to that of the x86 shadow paging implementation. When
future patches add synchronization model changes to allow for parallel
page faults, these pages will need to be handled differently from the
x86 shadow paging based MMU's root pages.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:00 -04:00
Ben Gardon
fe5db27d36 kvm: x86/mmu: Init / Uninit the TDP MMU
The TDP MMU offers an alternative mode of operation to the x86 shadow
paging based MMU, optimized for running an L1 guest with TDP. The TDP MMU
will require new fields that need to be initialized and torn down. Add
hooks into the existing KVM MMU initialization process to do that
initialization / cleanup. Currently the initialization and cleanup
fucntions do not do very much, however more operations will be added in
future patches.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-4-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:00 -04:00
Ben Gardon
c9180b7291 kvm: x86/mmu: Introduce tdp_iter
The TDP iterator implements a pre-order traversal of a TDP paging
structure. This iterator will be used in future patches to create
an efficient implementation of the KVM MMU for the TDP case.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:17:00 -04:00
Paolo Bonzini
5a9624affe KVM: mmu: extract spte.h and spte.c
The SPTE format will be common to both the shadow and the TDP MMU.

Extract code that implements the format to a separate module, as a
first step towards adding the TDP MMU and putting mmu.c on a diet.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:16:59 -04:00
Paolo Bonzini
cb3eedab45 KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
The TDP MMU's own function for the changed-PTE notifier will need to be
update a PTE in the exact same way as the shadow MMU.  Rather than
re-implementing this logic, factor the SPTE creation out of kvm_set_pte_rmapp.

Extracted out of a patch by Ben Gardon. <bgardon@google.com>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:16:59 -04:00
Ben Gardon
799a4190e7 kvm: x86/mmu: Separate making SPTEs from set_spte
Separate the functions for generating leaf page table entries from the
function that inserts them into the paging structure. This refactoring
will facilitate changes to the MMU sychronization model to use atomic
compare / exchanges (which are not guaranteed to succeed) instead of a
monolithic MMU lock.

No functional change expected.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This commit introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:16:58 -04:00
Ben Gardon
cc4674d0de kvm: mmu: Separate making non-leaf sptes from link_shadow_page
The TDP MMU page fault handler will need to be able to create non-leaf
SPTEs to build up the paging structures. Rather than re-implementing the
function, factor the SPTE creation out of link_shadow_page.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20200925212302.3979661-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 18:16:58 -04:00
Paolo Bonzini
c0623f5e5d Merge branch 'kvm-fixes' into 'next'
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21 18:05:58 -04:00
Joe Perches
a4f1d94e6b KVM: PPC: Book3S HV: Make struct kernel_param_ops definition const
This should be const, so make it so.

Signed-off-by: Joe Perches <joe@perches.com>
Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:51 -04:00
Lai Jiangshan
30031c2b05 KVM: x86: Let the guest own CR4.FSGSBASE
Add FSGSBASE to the set of possible guest-owned CR4 bits, i.e. let the
guest own it on VMX.  KVM never queries the guest's CR4.FSGSBASE value,
thus there is no reason to force VM-Exit on FSGSBASE being toggled.

Note, because FSGSBASE is conditionally available, this is dependent on
recent changes to intercept reserved CR4 bits and to update the CR4
guest/host mask in response to guest CPUID changes.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
[sean: added justification in changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:50 -04:00
Sean Christopherson
2ed41aa631 KVM: VMX: Intercept guest reserved CR4 bits to inject #GP fault
Intercept CR4 bits that are guest reserved so that KVM correctly injects
a #GP fault if the guest attempts to set a reserved bit.  If a feature
is supported by the CPU but is not exposed to the guest, and its
associated CR4 bit is not intercepted by KVM by default, then KVM will
fail to inject a #GP if the guest sets the CR4 bit without triggering
an exit, e.g. by toggling only the bit in question.

Note, KVM doesn't give the guest direct access to any CR4 bits that are
also dependent on guest CPUID.  Yet.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:50 -04:00
Sean Christopherson
a6337a3542 KVM: x86: Move call to update_exception_bitmap() into VMX code
Now that vcpu_after_set_cpuid() and update_exception_bitmap() are called
back-to-back, subsume the exception bitmap update into the common CPUID
update.  Drop the SVM invocation entirely as SVM's exception bitmap
doesn't vary with respect to guest CPUID.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:50 -04:00
Sean Christopherson
c44d9b3470 KVM: x86: Invoke vendor's vcpu_after_set_cpuid() after all common updates
Move the call to kvm_x86_ops.vcpu_after_set_cpuid() to the very end of
kvm_vcpu_after_set_cpuid() to allow the vendor implementation to react
to changes made by the common code.  In the near future, this will be
used by VMX to update its CR4 guest/host masks to account for reserved
bits.  In the long term, SGX support will update the allowed XCR0 mask
for enclaves based on the vCPU's allowed XCR0.

vcpu_after_set_cpuid() (nee kvm_update_cpuid()) was originally added by
commit 2acf923e38 ("KVM: VMX: Enable XSAVE/XRSTOR for guest"), and was
called separately after kvm_x86_ops.vcpu_after_set_cpuid() (nee
kvm_x86_ops->cpuid_update()).  There is no indication that the placement
of the common code updates after the vendor updates was anything more
than a "new function at the end" decision.

Inspection of the current code reveals no dependency on kvm_x86_ops'
vcpu_after_set_cpuid() in kvm_vcpu_after_set_cpuid() or any of its
helpers.  The bulk of the common code depends only on the guest's CPUID
configuration, kvm_mmu_reset_context() does not consume dynamic vendor
state, and there are no collisions between kvm_pmu_refresh() and VMX's
update of PT state.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:49 -04:00
Lai Jiangshan
6e1d849fa3 KVM: x86: Intercept LA57 to inject #GP fault when it's reserved
Unconditionally intercept changes to CR4.LA57 so that KVM correctly
injects a #GP fault if the guest attempts to set CR4.LA57 when it's
supported in hardware but not exposed to the guest.

Long term, KVM needs to properly handle CR4 bits that can be under guest
control but also may be reserved from the guest's perspective.  But, KVM
currently sets the CR4 guest/host mask only during vCPU creation, and
reworking flows to change that will take a bit of elbow grease.

Even if/when generic support for intercepting reserved bits exists, it's
probably not worth letting the guest set CR4.LA57 directly.  LA57 can't
be toggled while long mode is enabled, thus it's all but guaranteed to
be set once (maybe twice, e.g. by BIOS and kernel) during boot and never
touched again.  On the flip side, letting the guest own CR4.LA57 may
incur extra VMREADs.  In other words, this temporary "hack" is probably
also the right long term fix.

Fixes: fd8cb43373 ("KVM: MMU: Expose the LA57 feature to VM.")
Cc: stable@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
[sean: rewrote changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200930041659.28181-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:49 -04:00
Suravee Suthikulpanit
f6426ab9c9 KVM: SVM: Initialize prev_ga_tag before use
The function amd_ir_set_vcpu_affinity makes use of the parameter struct
amd_iommu_pi_data.prev_ga_tag to determine if it should delete struct
amd_iommu_pi_data from a list when not running in AVIC mode.

However, prev_ga_tag is initialized only when AVIC is enabled. The non-zero
uninitialized value can cause unintended code path, which ends up making
use of the struct vcpu_svm.ir_list and ir_list_lock without being
initialized (since they are intended only for the AVIC case).

This triggers NULL pointer dereference bug in the function vm_ir_list_del
with the following call trace:

    svm_update_pi_irte+0x3c2/0x550 [kvm_amd]
    ? proc_create_single_data+0x41/0x50
    kvm_arch_irq_bypass_add_producer+0x40/0x60 [kvm]
    __connect+0x5f/0xb0 [irqbypass]
    irq_bypass_register_producer+0xf8/0x120 [irqbypass]
    vfio_msi_set_vector_signal+0x1de/0x2d0 [vfio_pci]
    vfio_msi_set_block+0x77/0xe0 [vfio_pci]
    vfio_pci_set_msi_trigger+0x25c/0x2f0 [vfio_pci]
    vfio_pci_set_irqs_ioctl+0x88/0xb0 [vfio_pci]
    vfio_pci_ioctl+0x2ea/0xed0 [vfio_pci]
    ? alloc_file_pseudo+0xa5/0x100
    vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
    ? vfio_device_fops_unl_ioctl+0x26/0x30 [vfio]
    __x64_sys_ioctl+0x96/0xd0
    do_syscall_64+0x37/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xa9

Therefore, initialize prev_ga_tag to zero before use. This should be safe
because ga_tag value 0 is invalid (see function avic_vm_init).

Fixes: dfa20099e2 ("KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20201003232707.4662-1-suravee.suthikulpanit@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:49 -04:00
Maxim Levitsky
2fcf4876ad KVM: nSVM: implement on demand allocation of the nested state
This way we don't waste memory on VMs which don't use nesting
virtualization even when the host enabled it for them.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-5-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:48 -04:00
Maxim Levitsky
72f211ecaa KVM: x86: allow kvm_x86_ops.set_efer to return an error value
This will be used to signal an error to the userspace, in case
the vendor code failed during handling of this msr. (e.g -ENOMEM)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:48 -04:00
Maxim Levitsky
7dffecaf4e KVM: x86: report negative values from wrmsr emulation to userspace
This will allow the KVM to report such errors (e.g -ENOMEM)
to the userspace.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:34 -04:00
Maxim Levitsky
36385ccc9b KVM: x86: xen_hvm_config: cleanup return values
Return 1 on errors that are caused by wrong guest behavior
(which will inject #GP to the guest)

And return a negative error value on issues that are
the kernel's fault (e.g -ENOMEM)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:34 -04:00
Joe Perches
d5d6c18dc4 kvm x86/mmu: Make struct kernel_param_ops definitions const
These should be const, so make it so.

Signed-off-by: Joe Perches <joe@perches.com>
Message-Id: <ed95eef4f10fc1317b66936c05bc7dd8f943a6d5.1601770305.git.joe@perches.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:33 -04:00
Vitaly Kuznetsov
3f4e3eb417 KVM: x86: bump KVM_MAX_CPUID_ENTRIES
As vcpu->arch.cpuid_entries is now allocated dynamically, the only
remaining use for KVM_MAX_CPUID_ENTRIES is to check KVM_SET_CPUID/
KVM_SET_CPUID2 input for sanity. Since it was reported that the
current limit (80) is insufficient for some CPUs, bump
KVM_MAX_CPUID_ENTRIES and use an arbitrary value '256' as the new
limit.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201001130541.1398392-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:33 -04:00
Vitaly Kuznetsov
255cbecfe0 KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
is reported to be insufficient but before we bump it let's switch to
allocating vcpu->arch.cpuid_entries[] array dynamically. Currently,
'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is
3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch'
but having it pre-allocated (for all vCPUs which we also pre-allocate)
gives us no real benefits.

Another plus of the dynamic allocation is that we now do kvm_check_cpuid()
check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries so
no changes are made in case the check fails.

Opportunistically remove unneeded 'out' labels from
kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return
directly whenever possible.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201001130541.1398392-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
2020-10-21 17:36:33 -04:00