Commit Graph

2524 Commits

Author SHA1 Message Date
Linus Torvalds
61307b7be4 Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

   - Lucas Stach has provided some page-mapping cleanup/consolidation/
     maintainability work in the series "mm/treewide: Remove pXd_huge()
     API".

   - In the series "Allow migrate on protnone reference with
     MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's
     MPOL_PREFERRED_MANY mode, yielding almost doubled performance in
     one test.

   - In their series "Memory allocation profiling" Kent Overstreet and
     Suren Baghdasaryan have contributed a means of determining (via
     /proc/allocinfo) whereabouts in the kernel memory is being
     allocated: number of calls and amount of memory.

   - Matthew Wilcox has provided the series "Various significant MM
     patches" which does a number of rather unrelated things, but in
     largely similar code sites.

   - In his series "mm: page_alloc: freelist migratetype hygiene"
     Johannes Weiner has fixed the page allocator's handling of
     migratetype requests, with resulting improvements in compaction
     efficiency.

   - In the series "make the hugetlb migration strategy consistent"
     Baolin Wang has fixed a hugetlb migration issue, which should
     improve hugetlb allocation reliability.

   - Liu Shixin has hit an I/O meltdown caused by readahead in a
     memory-tight memcg. Addressed in the series "Fix I/O high when
     memory almost met memcg limit".

   - In the series "mm/filemap: optimize folio adding and splitting"
     Kairui Song has optimized pagecache insertion, yielding ~10%
     performance improvement in one test.

   - Baoquan He has cleaned up and consolidated the early zone
     initialization code in the series "mm/mm_init.c: refactor
     free_area_init_core()".

   - Baoquan has also redone some MM initializatio code in the series
     "mm/init: minor clean up and improvement".

   - MM helper cleanups from Christoph Hellwig in his series "remove
     follow_pfn".

   - More cleanups from Matthew Wilcox in the series "Various
     page->flags cleanups".

   - Vlastimil Babka has contributed maintainability improvements in the
     series "memcg_kmem hooks refactoring".

   - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

   - David Hildenbrand chipped in with "fs/proc/task_mmu: convert
     hugetlb functions to work on folis".

   - Code consolidation and cleanup work related to GUP's handling of
     hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

   - Rick Edgecombe has developed some fixes to stack guard gaps in the
     series "Cover a guard gap corner case".

   - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the
     series "mm/ksm: fix ksm exec support for prctl".

   - Baolin Wang has implemented NUMA balancing for multi-size THPs.
     This is a simple first-cut implementation for now. The series is
     "support multi-size THP numa balancing".

   - Cleanups to vma handling helper functions from Matthew Wilcox in
     the series "Unify vma_address and vma_pgoff_address".

   - Some selftests maintenance work from Dev Jain in the series
     "selftests/mm: mremap_test: Optimizations and style fixes".

   - Improvements to the swapping of multi-size THPs from Ryan Roberts
     in the series "Swap-out mTHP without splitting".

   - Kefeng Wang has significantly optimized the handling of arm64's
     permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

   - GUP cleanups from David Hildenbrand in "mm/gup: consistently call
     it GUP-fast".

   - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault
     path to use struct vm_fault".

   - selftests build fixes from John Hubbard in the series "Fix
     selftests/mm build without requiring "make headers"".

   - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the
     series "Improved Memory Tier Creation for CPUless NUMA Nodes".
     Fixes the initialization code so that migration between different
     memory types works as intended.

   - David Hildenbrand has improved follow_pte() and fixed an errant
     driver in the series "mm: follow_pte() improvements and acrn
     follow_pte() fixes".

   - David also did some cleanup work on large folio mapcounts in his
     series "mm: mapcount for large folios + page_mapcount() cleanups".

   - Folio conversions in KSM in Alex Shi's series "transfer page to
     folio in KSM".

   - Barry Song has added some sysfs stats for monitoring multi-size
     THP's in the series "mm: add per-order mTHP alloc and swpout
     counters".

   - Some zswap cleanups from Yosry Ahmed in the series "zswap
     same-filled and limit checking cleanups".

   - Matthew Wilcox has been looking at buffer_head code and found the
     documentation to be lacking. The series is "Improve buffer head
     documentation".

   - Multi-size THPs get more work, this time from Lance Yang. His
     series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free"
     optimizes the freeing of these things.

   - Kemeng Shi has added more userspace-visible writeback
     instrumentation in the series "Improve visibility of writeback".

   - Kemeng Shi then sent some maintenance work on top in the series
     "Fix and cleanups to page-writeback".

   - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in
     the series "Improve anon_vma scalability for anon VMAs". Intel's
     test bot reported an improbable 3x improvement in one test.

   - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

   - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

   - David Hildenbrand has disabled some known-to-fail selftests ni the
     series "selftests: mm: cow: flag vmsplice() hugetlb tests as
     XFAIL".

   - memcg metadata storage optimizations from Shakeel Butt in "memcg:
     reduce memory consumption by memcg stats".

   - DAX fixes and maintenance work from Vishal Verma in the series
     "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...
2024-05-19 09:21:03 -07:00
Paolo Bonzini
7d41e24da2 Merge tag 'kvm-x86-misc-6.10' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.10:

 - Advertise the max mappable GPA in the "guest MAXPHYADDR" CPUID field, which
   is unused by hardware, so that KVM can communicate its inability to map GPAs
   that set bits 51:48 due to lack of 5-level paging.  Guest firmware is
   expected to use the information to safely remap BARs in the uppermost GPA
   space, i.e to avoid placing a BAR at a legal, but unmappable, GPA.

 - Use vfree() instead of kvfree() for allocations that always use vcalloc()
   or __vcalloc().

 - Don't completely ignore same-value writes to immutable feature MSRs, as
   doing so results in KVM failing to reject accesses to MSR that aren't
   supposed to exist given the vCPU model and/or KVM configuration.

 - Don't mark APICv as being inhibited due to ABSENT if APICv is disabled
   KVM-wide to avoid confusing debuggers (KVM will never bother clearing the
   ABSENT inhibit, even if userspace enables in-kernel local APIC).
2024-05-12 03:18:44 -04:00
Paolo Bonzini
f4bc1373d5 Merge tag 'kvm-x86-generic-6.10' of https://github.com/kvm-x86/linux into HEAD
KVM cleanups for 6.10:

 - Misc cleanups extracted from the "exit on missing userspace mapping" series,
   which has been put on hold in anticipation of a "KVM Userfault" approach,
   which should provide a superset of functionality.

 - Remove kvm_make_all_cpus_request_except(), which got added to hack around an
   AVIC bug, and then became dead code when a more robust fix came along.

 - Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation.
2024-05-12 03:16:47 -04:00
Paolo Bonzini
e5f62e27b1 Merge tag 'kvmarm-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 6.10

- Move a lot of state that was previously stored on a per vcpu
  basis into a per-CPU area, because it is only pertinent to the
  host while the vcpu is loaded. This results in better state
  tracking, and a smaller vcpu structure.

- Add full handling of the ERET/ERETAA/ERETAB instructions in
  nested virtualisation. The last two instructions also require
  emulating part of the pointer authentication extension.
  As a result, the trap handling of pointer authentication has
  been greattly simplified.

- Turn the global (and not very scalable) LPI translation cache
  into a per-ITS, scalable cache, making non directly injected
  LPIs much cheaper to make visible to the vcpu.

- A batch of pKVM patches, mostly fixes and cleanups, as the
  upstreaming process seems to be resuming. Fingers crossed!

- Allocate PPIs and SGIs outside of the vcpu structure, allowing
  for smaller EL2 mapping and some flexibility in implementing
  more or less than 32 private IRQs.

- Purge stale mpidr_data if a vcpu is created after the MPIDR
  map has been created.

- Preserve vcpu-specific ID registers across a vcpu reset.

- Various minor cleanups and improvements.
2024-05-12 03:15:53 -04:00
Paolo Bonzini
4232da23d7 Merge tag 'loongarch-kvm-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.10

1. Add ParaVirt IPI support.
2. Add software breakpoint support.
3. Add mmio trace events support.
2024-05-10 13:20:18 -04:00
David Hildenbrand
29ae7d96d1 mm: pass VMA instead of MM to follow_pte()
... and centralize the VM_IO/VM_PFNMAP sanity check in there. We'll
now also perform these sanity checks for direct follow_pte()
invocations.

For generic_access_phys(), we might now check multiple times: nothing to
worry about, really.

Link: https://lkml.kernel.org/r/20240410155527.474777-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Sean Christopherson <seanjc@google.com>	[KVM]
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Fei Li <fei1.li@intel.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Yonghua Huang <yonghua.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-05 17:53:27 -07:00
Venkatesh Srinivas
82e9c84d87 KVM: Remove kvm_make_all_cpus_request_except()
Remove kvm_make_all_cpus_request_except() as it effectively has no users,
and arguably should never have been added in the first place.

Commit 54163a346d ("KVM: Introduce kvm_make_all_cpus_request_except()")
added the "except" variation for use in SVM's AVIC update path, which used
it to skip sending a request to the current vCPU (commit 7d611233b0
("KVM: SVM: Disable AVIC before setting V_IRQ")).

But the AVIC usage of kvm_make_all_cpus_request_except() was essentially a
hack-a-fix that simply squashed the most likely scenario of a racy WARN
without addressing the underlying problem(s).  Commit f1577ab214 ("KVM:
SVM: svm_set_vintr don't warn if AVIC is active but is about to be
deactivated") eventually fixed the WARN itself, and the "except" usage was
subsequently dropped by df63202fe5 ("KVM: x86: APICv: drop immediate
APICv disablement on current vCPU").

That kvm_make_all_cpus_request_except() hasn't gained any users in the
last ~3 years isn't a coincidence.  If a VM-wide broadcast *needs* to skip
the current vCPU, then odds are very good that there is underlying bug
that could be better fixed elsewhere.

Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Link: https://lore.kernel.org/r/20240404232651.1645176-1-venkateshs@chromium.org
[sean: rewrite changelog with --verbose]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-05-02 07:47:03 -07:00
Oliver Upton
ea54dd3742 KVM: Treat the device list as an rculist
A subsequent change to KVM/arm64 will necessitate walking the device
list outside of the kvm->lock. Prepare by converting to an rculist. This
has zero effect on the VM destruction path, as it is expected every
reader is backed by a reference on the kvm struct.

On the other hand, ensure a given device is completely destroyed before
dropping the kvm->lock in the release() path, as certain devices expect
to be a singleton (e.g. the vfio-kvm device).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20240422200158.2606761-2-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-04-25 13:19:55 +01:00
Sean Christopherson
c23e2b7103 KVM: Allow page-sized MMU caches to be initialized with custom 64-bit values
Add support to MMU caches for initializing a page with a custom 64-bit
value, e.g. to pre-fill an entire page table with non-zero PTE values.
The functionality will be used by x86 to support Intel's TDX, which needs
to set bit 63 in all non-present PTEs in order to prevent !PRESENT page
faults from getting reflected into the guest (Intel's EPT Violation #VE
architecture made the less than brilliant decision of having the per-PTE
behavior be opt-out instead of opt-in).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Message-Id: <5919f685f109a1b0ebc6bd8fc4536ee94bcc172d.1705965635.git.isaku.yamahata@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-19 12:15:18 -04:00
Sean Christopherson
eefb85b3f0 KVM: Drop unused @may_block param from gfn_to_pfn_cache_invalidate_start()
Remove gfn_to_pfn_cache_invalidate_start()'s unused @may_block parameter,
which was leftover from KVM's abandoned (for now) attempt to support guest
usage of gfn_to_pfn caches.

Fixes: a4bff3df51 ("KVM: pfncache: remove KVM_GUEST_USES_PFN usage")
Reported-by: Like Xu <like.xu.linux@gmail.com>
Cc: Paul Durrant <paul@xen.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240305003742.245767-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-11 12:58:53 -07:00
Paolo Bonzini
5257de954c KVM: remove unused argument of kvm_handle_hva_range()
The only user was kvm_mmu_notifier_change_pte(), which is now gone.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240405115815.3226315-3-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11 13:18:35 -04:00
Paolo Bonzini
f3b65bbaed KVM: delete .change_pte MMU notifier callback
The .change_pte() MMU notifier callback was intended as an
optimization. The original point of it was that KSM could tell KVM to flip
its secondary PTE to a new location without having to first zap it. At
the time there was also an .invalidate_page() callback; both of them were
*not* bracketed by calls to mmu_notifier_invalidate_range_{start,end}(),
and .invalidate_page() also doubled as a fallback implementation of
.change_pte().

Later on, however, both callbacks were changed to occur within an
invalidate_range_start/end() block.

In the case of .change_pte(), commit 6bdb913f0a ("mm: wrap calls to
set_pte_at_notify with invalidate_range_start and invalidate_range_end",
2012-10-09) did so to remove the fallback from .invalidate_page() to
.change_pte() and allow sleepable .invalidate_page() hooks.

This however made KVM's usage of the .change_pte() callback completely
moot, because KVM unmaps the sPTEs during .invalidate_range_start()
and therefore .change_pte() has no hope of finding a sPTE to change.
Drop the generic KVM code that dispatches to kvm_set_spte_gfn(), as
well as all the architecture specific implementations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Message-ID: <20240405115815.3226315-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-11 13:18:27 -04:00
Anish Moorthy
f588557ac4 KVM: Simplify error handling in __gfn_to_pfn_memslot()
KVM_HVA_ERR_RO_BAD satisfies kvm_is_error_hva(), so there's no need to
duplicate the "if (writable)" block. Fix this by bringing all
kvm_is_error_hva() cases under one conditional.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
Link: https://lore.kernel.org/r/20240215235405.368539-5-amoorthy@google.com
[sean: use ternary operator]
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 14:10:08 -07:00
Anish Moorthy
a3bd2f7ead KVM: Add function comments for __kvm_read/write_guest_page()
The (gfn, data, offset, len) order of parameters is a little strange
since "offset" applies to "gfn" rather than to "data". Add function
comments to make things perfectly clear.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
Link: https://lore.kernel.org/r/20240215235405.368539-3-amoorthy@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 14:00:48 -07:00
Anish Moorthy
ed2f049fc1 KVM: Clarify meaning of hva_to_pfn()'s 'atomic' parameter
The current description can be read as "atomic -> allowed to sleep,"
when in fact the intended statement is "atomic -> NOT allowed to sleep."
Make that clearer in the docstring.

Signed-off-by: Anish Moorthy <amoorthy@google.com>
Link: https://lore.kernel.org/r/20240215235405.368539-2-amoorthy@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 14:00:48 -07:00
Li RongQing
a952d608f0 KVM: Use vfree for memory allocated by vcalloc()/__vcalloc()
commit 37b2a6510a48("KVM: use __vcalloc for very large allocations")
replaced kvzalloc()/kvcalloc() with vcalloc(), but didn't replace kvfree()
with vfree().

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20240131012357.53563-1-lirongqing@baidu.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-09 12:18:38 -07:00
Sean Christopherson
fc62a4e8de KVM: Explicitly disallow activatating a gfn_to_pfn_cache with INVALID_GPA
Explicit disallow activating a gfn_to_pfn_cache with an error gpa, i.e.
INVALID_GPA, to ensure that KVM doesn't mistake a GPA-based cache for an
HVA-based cache (KVM uses INVALID_GPA as a magic value to differentiate
between GPA-based and HVA-based caches).

WARN if KVM attempts to activate a cache with INVALID_GPA, purely so that
new caches need to at least consider what to do with a "bad" GPA, as all
existing usage of kvm_gpc_activate() guarantees gpa != INVALID_GPA.  I.e.
removing the WARN in the future is completely reasonable if doing so would
yield cleaner/better code overall.

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Link: https://lore.kernel.org/r/20240320001542.3203871-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:24 -07:00
Sean Christopherson
5c9ca4ed89 KVM: Check validity of offset+length of gfn_to_pfn_cache prior to activation
When activating a gfn_to_pfn_cache, verify that the offset+length is sane
and usable before marking the cache active.  Letting __kvm_gpc_refresh()
detect the problem results in a cache being marked active without setting
the GPA (or any other fields), which in turn results in KVM trying to
refresh a cache with INVALID_GPA.

Attempting to refresh a cache with INVALID_GPA isn't functionally
problematic, but it runs afoul of the sanity check that exactly one of
GPA or userspace HVA is valid, i.e. that a cache is either GPA-based or
HVA-based.

Reported-by: syzbot+106a4f72b0474e1d1b33@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000005fa5cc0613f1cebd@google.com
Fixes: 721f5b0dda ("KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA")
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Paul Durrant <paul@xen.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240320001542.3203871-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:24 -07:00
Sean Christopherson
18f06e9769 KVM: Add helpers to consolidate gfn_to_pfn_cache's page split check
Add a helper to check that the incoming length for a gfn_to_pfn_cache is
valid with respect to the cache's GPA and/or HVA.  To avoid activating a
cache with a bogus GPA, a future fix will fork the page split check in
the inner refresh path into activate() and the public rerfresh() APIs, at
which point KVM will check the length in three separate places.

Deliberately keep the "page offset" logic open coded, as the only other
path that consumes the offset, __kvm_gpc_refresh(), already needs to
differentiate between GPA-based and HVA-based caches, and it's not obvious
that using a helper is a net positive in overall code readability.

Note, for GPA-based caches, this has a subtle side effect of using the GPA
instead of the resolved HVA in the check() path, but that should be a nop
as the HVA offset is derived from the GPA, i.e. the two offsets are
identical, barring a KVM bug.

Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Link: https://lore.kernel.org/r/20240320001542.3203871-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-04-08 13:20:23 -07:00
Paolo Bonzini
e9a2bba476 Merge tag 'kvm-x86-xen-6.9' of https://github.com/kvm-x86/linux into HEAD
KVM Xen and pfncache changes for 6.9:

 - Rip out the half-baked support for using gfn_to_pfn caches to manage pages
   that are "mapped" into guests via physical addresses.

 - Add support for using gfn_to_pfn caches with only a host virtual address,
   i.e. to bypass the "gfn" stage of the cache.  The primary use case is
   overlay pages, where the guest may change the gfn used to reference the
   overlay page, but the backing hva+pfn remains the same.

 - Add an ioctl() to allow mapping Xen's shared_info page using an hva instead
   of a gpa, so that userspace doesn't need to reconfigure and invalidate the
   cache/mapping if the guest changes the gpa (but userspace keeps the resolved
   hva the same).

 - When possible, use a single host TSC value when computing the deadline for
   Xen timers in order to improve the accuracy of the timer emulation.

 - Inject pending upcall events when the vCPU software-enables its APIC to fix
   a bug where an upcall can be lost (and to follow Xen's behavior).

 - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
   events fails, e.g. if the guest has aliased xAPIC IDs.

 - Extend gfn_to_pfn_cache's mutex to cover (de)activation (in addition to
   refresh), and drop a now-redundant acquisition of xen_lock (that was
   protecting the shared_info cache) to fix a deadlock due to recursively
   acquiring xen_lock.
2024-03-11 10:42:55 -04:00
Paolo Bonzini
c9cd0beae9 Merge tag 'kvm-x86-misc-6.9' of https://github.com/kvm-x86/linux into HEAD
KVM x86 misc changes for 6.9:

 - Explicitly initialize a variety of on-stack variables in the emulator that
   triggered KMSAN false positives (though in fairness in KMSAN, it's comically
   difficult to see that the uninitialized memory is never truly consumed).

 - Fix the deubgregs ABI for 32-bit KVM, and clean up code related to reading
   DR6 and DR7.

 - Rework the "force immediate exit" code so that vendor code ultimately
   decides how and when to force the exit.  This allows VMX to further optimize
   handling preemption timer exits, and allows SVM to avoid sending a duplicate
   IPI (SVM also has a need to force an exit).

 - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
   vCPU creation ultimately failed, and add WARN to guard against similar bugs.

 - Provide a dedicated arch hook for checking if a different vCPU was in-kernel
   (for directed yield), and simplify the logic for checking if the currently
   loaded vCPU is in-kernel.

 - Misc cleanups and fixes.
2024-03-11 10:24:56 -04:00
Paolo Bonzini
507e72f899 Merge tag 'kvm-x86-generic-6.9' of https://github.com/kvm-x86/linux into HEAD
KVM common MMU changes for 6.9:

  - Harden KVM against underflowing the active mmu_notifier invalidation
    count, so that "bad" invalidations (usually due to bugs elsehwere in the
    kernel) are detected earlier and are less likely to hang the kernel.

  - Fix a benign bug in __kvm_mmu_topup_memory_cache() where the object size
    and number of objects parameters to kvmalloc_array() were swapped.
2024-03-11 10:23:03 -04:00
Paolo Bonzini
a81d95ae8c Merge tag 'kvm-x86-asyncpf-6.9' of https://github.com/kvm-x86/linux into HEAD
KVM async page fault changes for 6.9:

 - Always flush the async page fault workqueue when a work item is being
   removed, especially during vCPU destruction, to ensure that there are no
   workers running in KVM code when all references to KVM-the-module are gone,
   i.e. to prevent a use-after-free if kvm.ko is unloaded.

 - Grab a reference to the VM's mm_struct in the async #PF worker itself instead
   of gifting the worker a reference, e.g. so that there's no need to remember
   to *conditionally* clean up after the worker.
2024-03-11 10:22:41 -04:00
Paolo Bonzini
961e2bfcf3 Merge tag 'kvmarm-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.9

 - Infrastructure for building KVM's trap configuration based on the
   architectural features (or lack thereof) advertised in the VM's ID
   registers

 - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to
   x86's WC) at stage-2, improving the performance of interacting with
   assigned devices that can tolerate it

 - Conversion of KVM's representation of LPIs to an xarray, utilized to
   address serialization some of the serialization on the LPI injection
   path

 - Support for _architectural_ VHE-only systems, advertised through the
   absence of FEAT_E2H0 in the CPU's ID register

 - Miscellaneous cleanups, fixes, and spelling corrections to KVM and
   selftests
2024-03-11 10:02:32 -04:00
Paolo Bonzini
7d8942d8e7 Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD
KVM GUEST_MEMFD fixes for 6.8:

 - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating ABI that KVM can't sanely support.

 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely a development and testing vehicle, and
   come with zero guarantees.

 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.

 - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
   when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09 11:48:35 -05:00