Commit Graph

14110 Commits

Author SHA1 Message Date
Geert Uytterhoeven
ad614a706b drm/xe/oa/uapi: Make bit masks unsigned
When building with gcc-5:

    In function ‘decode_oa_format.isra.26’,
	inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6:
    ././include/linux/compiler_types.h:510:38: error: call to ‘__compiletime_assert_1336’ declared with attribute error: FIELD_GET: mask is not constant
    [...]
    ./include/linux/bitfield.h:155:3: note: in expansion of macro ‘__BF_FIELD_CHECK’
       __BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \
       ^
    drivers/gpu/drm/xe/xe_oa.c:1573:18: note: in expansion of macro ‘FIELD_GET’
      u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt);
		      ^

Fixes: b6fd51c621 ("drm/xe/oa/uapi: Define and parse OA stream properties")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240729092634.2227611-1-geert+renesas@glider.be
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit f2881dfdaa)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-08-19 10:44:18 -04:00
Linus Torvalds
e1bc113215 Merge tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc fixes from Greg KH:
 "Here are some small char/misc fixes for 6.11-rc4 to resolve reported
  problems. Included in here are:

   - fastrpc revert of a change that broke userspace

   - xillybus fixes for reported issues

  Half of these have been in linux-next this week with no reported
  problems, I don't know if the last bit of xillybus driver changes made
  it in, but they are 'obviously correct' so will be safe :)"

* tag 'char-misc-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  char: xillybus: Check USB endpoints when probing device
  char: xillybus: Refine workqueue handling
  Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
  char: xillybus: Don't destroy workqueue from work item running on it
2024-08-18 10:16:34 -07:00
Linus Torvalds
c5ac744cdd Merge tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:

 - Fix a comment in the uapi header using the wrong member name (Caleb)

 - Fix KCSAN warning for a debug check in sqpoll (me)

 - Two more NAPI tweaks (Olivier)

* tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux:
  io_uring: fix user_data field name in comment
  io_uring/sqpoll: annotate debug task == current with data_race()
  io_uring/napi: remove duplicate io_napi_entry timeout assignation
  io_uring/napi: check napi_enabled in io_napi_add() before proceeding
2024-08-16 14:00:05 -07:00
Caleb Sander Mateos
1fc2ac428e io_uring: fix user_data field name in comment
io_uring_cqe's user_data field refers to `sqe->data`, but io_uring_sqe
does not have a data field. Fix the comment to say `sqe->user_data`.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Link: https://github.com/axboe/liburing/pull/1206
Link: https://lore.kernel.org/r/20240816181526.3642732-1-csander@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-08-16 12:31:26 -06:00
Griffin Kroah-Hartman
9bb5e74b2b Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
This reverts commit bab2f5e8fd.

Joel reported that this commit breaks userspace and stops sensors in
SDM845 from working. Also breaks other qcom SoC devices running postmarketOS.

Cc: stable <stable@kernel.org>
Cc: Ekansh Gupta <quic_ekangupt@quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Reported-by: Joel Selvaraj <joelselvaraj.oss@gmail.com>
Link: https://lore.kernel.org/r/9a9f5646-a554-4b65-8122-d212bb665c81@umsystem.edu
Signed-off-by: Griffin Kroah-Hartman <griffin@kroah.com>
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Fixes: bab2f5e8fd ("misc: fastrpc: Restrict untrusted app to attach to privileged PD")
Link: https://lore.kernel.org/r/20240815094920.8242-1-griffin@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-15 16:59:14 +02:00
Linus Torvalds
d07b43284a Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
 "s390:

   - Fix failure to start guests with kvm.use_gisa=0

   - Panic if (un)share fails to maintain security.

  ARM:

   - Use kvfree() for the kvmalloc'd nested MMUs array

   - Set of fixes to address warnings in W=1 builds

   - Make KVM depend on assembler support for ARMv8.4

   - Fix for vgic-debug interface for VMs without LPIs

   - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest

   - Minor code / comment cleanups for configuring PAuth traps

   - Take kvm->arch.config_lock to prevent destruction / initialization
     race for a vCPU's CPUIF which may lead to a UAF

  x86:

   - Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)

   - Fix smatch issues

   - Small cleanups

   - Make x2APIC ID 100% readonly

   - Fix typo in uapi constant

  Generic:

   - Use synchronize_srcu_expedited() on irqfd shutdown"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
  KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
  KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
  KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
  KVM: selftests: Add a testcase to verify x2APIC is fully readonly
  KVM: x86: Make x2APIC ID 100% readonly
  KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
  KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
  KVM: SVM: Fix an error code in sev_gmem_post_populate()
  KVM: SVM: Fix uninitialized variable bug
  KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
  KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
  KVM: arm64: Tidying up PAuth code in KVM
  KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
  KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
  s390/uv: Panic for set and remove shared access UVC errors
  KVM: s390: fix validity interception issue when gisa is switched off
  docs: KVM: Fix register ID of SPSR_FIQ
  KVM: arm64: vgic: fix unexpected unlock sparse warnings
  KVM: arm64: fix kdoc warnings in W=1 builds
  KVM: arm64: fix override-init warnings in W=1 builds
  ...
2024-08-14 13:46:24 -07:00
Amit Shah
1c0e588169 KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of
the UAPI, keep the current definition and add a new one with the fix.

Fix-suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Amit Shah <amit.shah@amd.com>
Message-ID: <20240814083113.21622-1-amit@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-08-14 13:05:42 -04:00
Linus Torvalds
4ac0f08f44 Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
 "VFS:

   - Fix the name of file lease slab cache. When file leases were split
     out of file locks the name of the file lock slab cache was used for
     the file leases slab cache as well.

   - Fix a type in take_fd() helper.

   - Fix infinite directory iteration for stable offsets in tmpfs.

   - When the icache is pruned all reclaimable inodes are marked with
     I_FREEING and other processes that try to lookup such inodes will
     block.

     But some filesystems like ext4 can trigger lookups in their inode
     evict callback causing deadlocks. Ext4 does such lookups if the
     ea_inode feature is used whereby a separate inode may be used to
     store xattrs.

     Introduce I_LRU_ISOLATING which pins the inode while its pages are
     reclaimed. This avoids inode deletion during inode_lru_isolate()
     avoiding the deadlock and evict is made to wait until
     I_LRU_ISOLATING is done.

  netfs:

   - Fault in smaller chunks for non-large folio mappings for
     filesystems that haven't been converted to large folios yet.

   - Fix the CONFIG_NETFS_DEBUG config option. The config option was
     renamed a short while ago and that introduced two minor issues.
     First, it depended on CONFIG_NETFS whereas it wants to depend on
     CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
     does. Second, the documentation for the config option wasn't fixed
     up.

   - Revert the removal of the PG_private_2 writeback flag as ceph is
     using it and fix how that flag is handled in netfs.

   - Fix DIO reads on 9p. A program watching a file on a 9p mount
     wouldn't see any changes in the size of the file being exported by
     the server if the file was changed directly in the source
     filesystem. Fix this by attempting to read the full size specified
     when a DIO read is requested.

   - Fix a NULL pointer dereference bug due to a data race where a
     cachefiles cookies was retired even though it was still in use.
     Check the cookie's n_accesses counter before discarding it.

  nsfs:

   - Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
     the kernel is writing to userspace.

  pidfs:

   - Prevent the creation of pidfds for kthreads until we have a
     use-case for it and we know the semantics we want. It also confuses
     userspace why they can get pidfds for kthreads.

  squashfs:

   - Fix an unitialized value bug reported by KMSAN caused by a
     corrupted symbolic link size read from disk. Check that the
     symbolic link size is not larger than expected"

* tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  Squashfs: sanity check symbolic link size
  9p: Fix DIO read through netfs
  vfs: Don't evict inode under the inode lru traversing context
  netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
  netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
  file: fix typo in take_fd() comment
  pidfd: prevent creation of pidfds for kthreads
  netfs: clean up after renaming FSCACHE_DEBUG config
  libfs: fix infinite directory reads for offset dir
  nsfs: fix ioctl declaration
  fs/netfs/fscache_cookie: add missing "n_accesses" check
  filelock: fix name of file_lease slab cache
  netfs: Fault in smaller chunks for non-large folio mappings
2024-08-14 09:06:28 -07:00
Christian Brauner
42b0f8da3a nsfs: fix ioctl declaration
The kernel is writing an object of type __u64, so the ioctl has to be
defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5).

Reported-by: Dmitry V. Levin <ldv@strace.io>
Link: https://lore.kernel.org/r/20240730164554.GA18486@altlinux.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12 22:03:26 +02:00
Arnd Bergmann
54233a4254 uretprobe: change syscall number, again
Despite multiple attempts to get the syscall number assignment right
for the newly added uretprobe syscall, we ended up with a bit of a mess:

 - The number is defined as 467 based on the assumption that the
   xattrat family of syscalls would use 463 through 466, but those
   did not make it into 6.11.

 - The include/uapi/asm-generic/unistd.h file still lists the number
   463, but the new scripts/syscall.tbl that was supposed to have the
   same data lists 467 instead as the number for arc, arm64, csky,
   hexagon, loongarch, nios2, openrisc and riscv. None of these
   architectures actually provide a uretprobe syscall.

 - All the other architectures (powerpc, arm, mips, ...) don't list
   this syscall at all.

There are two ways to make it consistent again: either list it with
the same syscall number on all architectures, or only list it on x86
but not in scripts/syscall.tbl and asm-generic/unistd.h.

Based on the most recent discussion, it seems like we won't need it
anywhere else, so just remove the inconsistent assignment and instead
move the x86 number to the next available one in the architecture
specific range, which is 335.

Fixes: 5c28424e9a ("syscalls: Fix to add sys_uretprobe to syscall.tbl")
Fixes: 190fec72df ("uprobe: Wire up uretprobe system call")
Fixes: 63ded11097 ("uprobe: Change uretprobe syscall scope and number")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-08-02 15:18:49 +02:00
Linus Torvalds
1722389b0d Merge tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf and netfilter.

  A lot of networking people were at a conference last week, busy
  catching COVID, so relatively short PR.

  Current release - regressions:

   - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP

  Current release - new code bugs:

   - l2tp: protect session IDR and tunnel session list with one lock,
     make sure the state is coherent to avoid a warning

   - eth: bnxt_en: update xdp_rxq_info in queue restart logic

   - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field

  Previous releases - regressions:

   - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len,
     the field reuses previously un-validated pad

  Previous releases - always broken:

   - tap/tun: drop short frames to prevent crashes later in the stack

   - eth: ice: add a per-VF limit on number of FDIR filters

   - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash"

* tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits)
  tun: add missing verification for short frame
  tap: add missing verification for short frame
  mISDN: Fix a use after free in hfcmulti_tx()
  gve: Fix an edge case for TSO skb validity check
  bnxt_en: update xdp_rxq_info in queue restart logic
  tcp: process the 3rd ACK with sk_socket for TFO/MPTCP
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling
  MAINTAINERS: make Breno the netconsole maintainer
  MAINTAINERS: Update bonding entry
  net: nexthop: Initialize all fields in dumped nexthops
  net: stmmac: Correct byte order of perfect_match
  selftests: forwarding: skip if kernel not support setting bridge fdb learning limit
  tipc: Return non-zero value from tipc_udp_addr2str() on error
  netfilter: nft_set_pipapo_avx2: disable softinterrupts
  ice: Fix recipe read procedure
  ice: Add a per-VF limit on number of FDIR filters
  net: bonding: correctly annotate RCU in bond_should_notify_peers()
  ...
2024-07-25 13:32:25 -07:00
Linus Torvalds
f9bcc61ad1 Merge tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux
Pull UML updates from Richard Weinberger:

 - Support for preemption

 - i386 Rust support

 - Huge cleanup by Benjamin Berg

 - UBSAN support

 - Removal of dead code

* tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits)
  um: vector: always reset vp->opened
  um: vector: remove vp->lock
  um: register power-off handler
  um: line: always fill *error_out in setup_one_line()
  um: remove pcap driver from documentation
  um: Enable preemption in UML
  um: refactor TLB update handling
  um: simplify and consolidate TLB updates
  um: remove force_flush_all from fork_handler
  um: Do not flush MM in flush_thread
  um: Delay flushing syscalls until the thread is restarted
  um: remove copy_context_skas0
  um: remove LDT support
  um: compress memory related stub syscalls while adding them
  um: Rework syscall handling
  um: Add generic stub_syscall6 function
  um: Create signal stack memory assignment in stub_data
  um: Remove stub-data.h include from common-offsets.h
  um: time-travel: fix signal blocking race/hang
  um: time-travel: remove time_exit()
  ...
2024-07-25 12:33:08 -07:00
Jakub Kicinski
f7578df913 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2024-07-25

We've added 14 non-merge commits during the last 8 day(s) which contain
a total of 19 files changed, 177 insertions(+), 70 deletions(-).

The main changes are:

1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and
   BPF sockhash. Also add test coverage for this case, from Michal Luczaj.

2) Fix a segmentation issue when downgrading gso_size in the BPF helper
   bpf_skb_adjust_room(), from Fred Li.

3) Fix a compiler warning in resolve_btfids due to a missing type cast,
   from Liwei Song.

4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte
   boundary in the fexit_sleep BPF selftest, from Puranjay Mohan.

5) Fix a xsk regression to require a flag when actuating tx_metadata_len,
   from Stanislav Fomichev.

6) Fix function prototype BTF dumping in libbpf for prototypes that have
   no input arguments, from Andrii Nakryiko.

7) Fix stacktrace symbol resolution in perf script for BPF programs
   containing subprograms, from Hou Tao.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test
  xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
  bpf: Fix a segment issue when downgrading gso_size
  tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids
  bpf, events: Use prog to emit ksymbol event for main program
  selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB
  selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags
  selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected()
  af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash
  bpftool: Fix typo in usage help
  libbpf: Fix no-args func prototype BTF dumping syntax
  MAINTAINERS: Update powerpc BPF JIT maintainers
  MAINTAINERS: Update email address of Naveen
  selftests/bpf: fexit_sleep: Fix stack allocation for arm64
====================

Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25 07:40:25 -07:00
Stanislav Fomichev
d5e726d914 xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len
Julian reports that commit 341ac980ea ("xsk: Support tx_metadata_len")
can break existing use cases which don't zero-initialize xdp_umem_reg
padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we
interpret the padding as tx_metadata_len only when being explicitly
asked.

Fixes: 341ac980ea ("xsk: Support tx_metadata_len")
Reported-by: Julian Schindel <mail@arctic-alpaca.de>
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
2024-07-25 11:57:27 +02:00
Linus Torvalds
7a3fad30fd Merge tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
 "This adds getrandom() support to the vDSO.

  First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which
  lets the kernel zero out pages anytime under memory pressure, which
  enables allocating memory that never gets swapped to disk but also
  doesn't count as being mlocked.

  Then, the vDSO implementation of getrandom() is introduced in a
  generic manner and hooked into random.c.

  Next, this is implemented on x86. (Also, though it's not ready for
  this pull, somebody has begun an arm64 implementation already)

  Finally, two vDSO selftests are added.

  There are also two housekeeping cleanup commits"

* tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
  MAINTAINERS: add random.h headers to RNG subsection
  random: note that RNDGETPOOL was removed in 2.6.9-rc2
  selftests/vDSO: add tests for vgetrandom
  x86: vdso: Wire up getrandom() vDSO implementation
  random: introduce generic vDSO getrandom() implementation
  mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-24 10:29:50 -07:00
Linus Torvalds
fbc90c042c Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - In the series "mm: Avoid possible overflows in dirty throttling" Jan
   Kara addresses a couple of issues in the writeback throttling code.
   These fixes are also targetted at -stable kernels.

 - Ryusuke Konishi's series "nilfs2: fix potential issues related to
   reserved inodes" does that. This should actually be in the
   mm-nonmm-stable tree, along with the many other nilfs2 patches. My
   bad.

 - More folio conversions from Kefeng Wang in the series "mm: convert to
   folio_alloc_mpol()"

 - Kemeng Shi has sent some cleanups to the writeback code in the series
   "Add helper functions to remove repeated code and improve readability
   of cgroup writeback"

 - Kairui Song has made the swap code a little smaller and a little
   faster in the series "mm/swap: clean up and optimize swap cache
   index".

 - In the series "mm/memory: cleanly support zeropage in
   vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David
   Hildenbrand has reworked the rather sketchy handling of the use of
   the zeropage in MAP_SHARED mappings. I don't see any runtime effects
   here - more a cleanup/understandability/maintainablity thing.

 - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling
   of higher addresses, for aarch64. The (poorly named) series is
   "Restructure va_high_addr_switch".

 - The core TLB handling code gets some cleanups and possible slight
   optimizations in Bang Li's series "Add update_mmu_tlb_range() to
   simplify code".

 - Jane Chu has improved the handling of our
   fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in
   the series "Enhance soft hwpoison handling and injection".

 - Jeff Johnson has sent a billion patches everywhere to add
   MODULE_DESCRIPTION() to everything. Some landed in this pull.

 - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang
   has simplified migration's use of hardware-offload memory copying.

 - Yosry Ahmed performs more folio API conversions in his series "mm:
   zswap: trivial folio conversions".

 - In the series "large folios swap-in: handle refault cases first",
   Chuanhua Han inches us forward in the handling of large pages in the
   swap code. This is a cleanup and optimization, working toward the end
   objective of full support of large folio swapin/out.

 - In the series "mm,swap: cleanup VMA based swap readahead window
   calculation", Huang Ying has contributed some cleanups and a possible
   fixlet to his VMA based swap readahead code.

 - In the series "add mTHP support for anonymous shmem" Baolin Wang has
   taught anonymous shmem mappings to use multisize THP. By default this
   is a no-op - users must opt in vis sysfs controls. Dramatic
   improvements in pagefault latency are realized.

 - David Hildenbrand has some cleanups to our remaining use of
   page_mapcount() in the series "fs/proc: move page_mapcount() to
   fs/proc/internal.h".

 - David also has some highmem accounting cleanups in the series
   "mm/highmem: don't track highmem pages manually".

 - Build-time fixes and cleanups from John Hubbard in the series
   "cleanups, fixes, and progress towards avoiding "make headers"".

 - Cleanups and consolidation of the core pagemap handling from Barry
   Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers
   and utilize them".

 - Lance Yang's series "Reclaim lazyfree THP without splitting" has
   reduced the latency of the reclaim of pmd-mapped THPs under fairly
   common circumstances. A 10x speedup is seen in a microbenchmark.

   It does this by punting to aother CPU but I guess that's a win unless
   all CPUs are pegged.

 - hugetlb_cgroup cleanups from Xiu Jianfeng in the series
   "mm/hugetlb_cgroup: rework on cftypes".

 - Miaohe Lin's series "Some cleanups for memory-failure" does just that
   thing.

 - Someone other than SeongJae has developed a DAMON feature in Honggyu
   Kim's series "DAMON based tiered memory management for CXL memory".
   This adds DAMON features which may be used to help determine the
   efficiency of our placement of CXL/PCIe attached DRAM.

 - DAMON user API centralization and simplificatio work in SeongJae
   Park's series "mm/damon: introduce DAMON parameters online commit
   function".

 - In the series "mm: page_type, zsmalloc and page_mapcount_reset()"
   David Hildenbrand does some maintenance work on zsmalloc - partially
   modernizing its use of pageframe fields.

 - Kefeng Wang provides more folio conversions in the series "mm: remove
   page_maybe_dma_pinned() and page_mkclean()".

 - More cleanup from David Hildenbrand, this time in the series
   "mm/memory_hotplug: use PageOffline() instead of PageReserved() for
   !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline()
   pages" and permits the removal of some virtio-mem hacks.

 - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and
   __folio_add_anon_rmap()" is a cleanup to the anon folio handling in
   preparation for mTHP (multisize THP) swapin.

 - Kefeng Wang's series "mm: improve clear and copy user folio"
   implements more folio conversions, this time in the area of large
   folio userspace copying.

 - The series "Docs/mm/damon/maintaier-profile: document a mailing tool
   and community meetup series" tells people how to get better involved
   with other DAMON developers. From SeongJae Park.

 - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does
   that.

 - David Hildenbrand sends along more cleanups, this time against the
   migration code. The series is "mm/migrate: move NUMA hinting fault
   folio isolation + checks under PTL".

 - Jan Kara has found quite a lot of strangenesses and minor errors in
   the readahead code. He addresses this in the series "mm: Fix various
   readahead quirks".

 - SeongJae Park's series "selftests/damon: test DAMOS tried regions and
   {min,max}_nr_regions" adds features and addresses errors in DAMON's
   self testing code.

 - Gavin Shan has found a userspace-triggerable WARN in the pagecache
   code. The series "mm/filemap: Limit page cache size to that supported
   by xarray" addresses this. The series is marked cc:stable.

 - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations
   and cleanup" cleans up and slightly optimizes KSM.

 - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of
   code motion. The series (which also makes the memcg-v1 code
   Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put
   under config option" and "mm: memcg: put cgroup v1-specific memcg
   data under CONFIG_MEMCG_V1"

 - Dan Schatzberg's series "Add swappiness argument to memory.reclaim"
   adds an additional feature to this cgroup-v2 control file.

 - The series "Userspace controls soft-offline pages" from Jiaqi Yan
   permits userspace to stop the kernel's automatic treatment of
   excessive correctable memory errors. In order to permit userspace to
   monitor and handle this situation.

 - Kefeng Wang's series "mm: migrate: support poison recover from
   migrate folio" teaches the kernel to appropriately handle migration
   from poisoned source folios rather than simply panicing.

 - SeongJae Park's series "Docs/damon: minor fixups and improvements"
   does those things.

 - In the series "mm/zsmalloc: change back to per-size_class lock"
   Chengming Zhou improves zsmalloc's scalability and memory
   utilization.

 - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for
   pinning memfd folios" makes the GUP code use FOLL_PIN rather than
   bare refcount increments. So these paes can first be moved aside if
   they reside in the movable zone or a CMA block.

 - Andrii Nakryiko has added a binary ioctl()-based API to
   /proc/pid/maps for much faster reading of vma information. The series
   is "query VMAs from /proc/<pid>/maps".

 - In the series "mm: introduce per-order mTHP split counters" Lance
   Yang improves the kernel's presentation of developer information
   related to multisize THP splitting.

 - Michael Ellerman has developed the series "Reimplement huge pages
   without hugepd on powerpc (8xx, e500, book3s/64)". This permits
   userspace to use all available huge page sizes.

 - In the series "revert unconditional slab and page allocator fault
   injection calls" Vlastimil Babka removes a performance-affecting and
   not very useful feature from slab fault injection.

* tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits)
  mm/mglru: fix ineffective protection calculation
  mm/zswap: fix a white space issue
  mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
  mm/hugetlb: fix possible recursive locking detected warning
  mm/gup: clear the LRU flag of a page before adding to LRU batch
  mm/numa_balancing: teach mpol_to_str about the balancing mode
  mm: memcg1: convert charge move flags to unsigned long long
  alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting
  lib: reuse page_ext_data() to obtain codetag_ref
  lib: add missing newline character in the warning message
  mm/mglru: fix overshooting shrinker memory
  mm/mglru: fix div-by-zero in vmpressure_calc_level()
  mm/kmemleak: replace strncpy() with strscpy()
  mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC
  mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB
  mm: ignore data-race in __swap_writepage
  hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr
  mm: shmem: rename mTHP shmem counters
  mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async()
  mm/migrate: putback split folios when numa hint migration fails
  ...
2024-07-21 17:15:46 -07:00
Linus Torvalds
2c9b351240 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
 "ARM:

   - Initial infrastructure for shadow stage-2 MMUs, as part of nested
     virtualization enablement

   - Support for userspace changes to the guest CTR_EL0 value, enabling
     (in part) migration of VMs between heterogenous hardware

   - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
     of the protocol

   - FPSIMD/SVE support for nested, including merged trap configuration
     and exception routing

   - New command-line parameter to control the WFx trap behavior under
     KVM

   - Introduce kCFI hardening in the EL2 hypervisor

   - Fixes + cleanups for handling presence/absence of FEAT_TCRX

   - Miscellaneous fixes + documentation updates

  LoongArch:

   - Add paravirt steal time support

   - Add support for KVM_DIRTY_LOG_INITIALLY_SET

   - Add perf kvm-stat support for loongarch

  RISC-V:

   - Redirect AMO load/store access fault traps to guest

   - perf kvm stat support

   - Use guest files for IMSIC virtualization, when available

  s390:

   - Assortment of tiny fixes which are not time critical

  x86:

   - Fixes for Xen emulation

   - Add a global struct to consolidate tracking of host values, e.g.
     EFER

   - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
     effective APIC bus frequency, because TDX

   - Print the name of the APICv/AVIC inhibits in the relevant
     tracepoint

   - Clean up KVM's handling of vendor specific emulation to
     consistently act on "compatible with Intel/AMD", versus checking
     for a specific vendor

   - Drop MTRR virtualization, and instead always honor guest PAT on
     CPUs that support self-snoop

   - Update to the newfangled Intel CPU FMS infrastructure

   - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
     it reads '0' and writes from userspace are ignored

   - Misc cleanups

  x86 - MMU:

   - Small cleanups, renames and refactoring extracted from the upcoming
     Intel TDX support

   - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
     that can't hold leafs SPTEs

   - Unconditionally drop mmu_lock when allocating TDP MMU page tables
     for eager page splitting, to avoid stalling vCPUs when splitting
     huge pages

   - Bug the VM instead of simply warning if KVM tries to split a SPTE
     that is non-present or not-huge. KVM is guaranteed to end up in a
     broken state because the callers fully expect a valid SPTE, it's
     all but dangerous to let more MMU changes happen afterwards

  x86 - AMD:

   - Make per-CPU save_area allocations NUMA-aware

   - Force sev_es_host_save_area() to be inlined to avoid calling into
     an instrumentable function from noinstr code

   - Base support for running SEV-SNP guests. API-wise, this includes a
     new KVM_X86_SNP_VM type, encrypting/measure the initial image into
     guest memory, and finalizing it before launching it. Internally,
     there are some gmem/mmu hooks needed to prepare gmem-allocated
     pages before mapping them into guest private memory ranges

     This includes basic support for attestation guest requests, enough
     to say that KVM supports the GHCB 2.0 specification

     There is no support yet for loading into the firmware those signing
     keys to be used for attestation requests, and therefore no need yet
     for the host to provide certificate data for those keys.

     To support fetching certificate data from userspace, a new KVM exit
     type will be needed to handle fetching the certificate from
     userspace.

     An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
     exit type to handle this was introduced in v1 of this patchset, but
     is still being discussed by community, so for now this patchset
     only implements a stub version of SNP Extended Guest Requests that
     does not provide certificate data

  x86 - Intel:

   - Remove an unnecessary EPT TLB flush when enabling hardware

   - Fix a series of bugs that cause KVM to fail to detect nested
     pending posted interrupts as valid wake eents for a vCPU executing
     HLT in L2 (with HLT-exiting disable by L1)

   - KVM: x86: Suppress MMIO that is triggered during task switch
     emulation

     Explicitly suppress userspace emulated MMIO exits that are
     triggered when emulating a task switch as KVM doesn't support
     userspace MMIO during complex (multi-step) emulation

     Silently ignoring the exit request can result in the
     WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
     for some other reason prior to purging mmio_needed

     See commit 0dc902267c ("KVM: x86: Suppress pending MMIO write
     exits if emulator detects exception") for more details on KVM's
     limitations with respect to emulated MMIO during complex emulator
     flows

  Generic:

   - Rename the AS_UNMOVABLE flag that was introduced for KVM to
     AS_INACCESSIBLE, because the special casing needed by these pages
     is not due to just unmovability (and in fact they are only
     unmovable because the CPU cannot access them)

   - New ioctl to populate the KVM page tables in advance, which is
     useful to mitigate KVM page faults during guest boot or after live
     migration. The code will also be used by TDX, but (probably) not
     through the ioctl

   - Enable halt poll shrinking by default, as Intel found it to be a
     clear win

   - Setup empty IRQ routing when creating a VM to avoid having to
     synchronize SRCU when creating a split IRQCHIP on x86

   - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
     a flag that arch code can use for hooking both sched_in() and
     sched_out()

   - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
     truncating a bogus value from userspace, e.g. to help userspace
     detect bugs

   - Mark a vCPU as preempted if and only if it's scheduled out while in
     the KVM_RUN loop, e.g. to avoid marking it preempted and thus
     writing guest memory when retrieving guest state during live
     migration blackout

  Selftests:

   - Remove dead code in the memslot modification stress test

   - Treat "branch instructions retired" as supported on all AMD Family
     17h+ CPUs

   - Print the guest pseudo-RNG seed only when it changes, to avoid
     spamming the log for tests that create lots of VMs

   - Make the PMU counters test less flaky when counting LLC cache
     misses by doing CLFLUSH{OPT} in every loop iteration"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  crypto: ccp: Add the SNP_VLEK_LOAD command
  KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
  KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
  KVM: x86: Replace static_call_cond() with static_call()
  KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
  x86/sev: Move sev_guest.h into common SEV header
  KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
  KVM: x86: Suppress MMIO that is triggered during task switch emulation
  KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
  KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
  KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
  KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
  KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
  KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
  KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
  KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
  KVM: Document KVM_PRE_FAULT_MEMORY ioctl
  mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
  perf kvm: Add kvm-stat for loongarch64
  LoongArch: KVM: Add PV steal time support in guest side
  ...
2024-07-20 12:41:03 -07:00
Linus Torvalds
9fa23750c6 Merge tag 'landlock-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
 "This simplifies code and improves documentation"

* tag 'landlock-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
  landlock: Various documentation improvements
  landlock: Clarify documentation for struct landlock_ruleset_attr
  landlock: Use bit-fields for storing handled layer access masks
2024-07-20 11:41:52 -07:00
Linus Torvalds
acc5965b9f Merge tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc and other driver updates from Greg KH:
 "Here is the "big" set of char/misc and other driver subsystem changes
  for 6.11-rc1. Nothing major in here, just loads of new drivers and
  updates. Included in here are:

   - IIO api updates and new drivers added

   - wait_interruptable_timeout() api cleanups for some drivers

   - MODULE_DESCRIPTION() additions for loads of drivers

   - parport out-of-bounds fix

   - interconnect driver updates and additions

   - mhi driver updates and additions

   - w1 driver fixes

   - binder speedups and fixes

   - eeprom driver updates

   - coresight driver updates

   - counter driver update

   - new misc driver additions

   - other minor api updates

  All of these, EXCEPT for the final Kconfig build fix for 32bit
  systems, have been in linux-next for a while with no reported issues.
  The Kconfig fixup went in 29 hours ago, so might have missed the
  latest linux-next, but was acked by everyone involved"

* tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits)
  misc: Kconfig: exclude mrvl-cn10k-dpi compilation for 32-bit systems
  misc: delete Makefile.rej
  binder: fix hang of unregistered readers
  misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI
  virtio: add missing MODULE_DESCRIPTION() macro
  agp: uninorth: add missing MODULE_DESCRIPTION() macro
  spmi: add missing MODULE_DESCRIPTION() macros
  dev/parport: fix the array out-of-bounds risk
  samples: configfs: add missing MODULE_DESCRIPTION() macro
  misc: mrvl-cn10k-dpi: add Octeon CN10K DPI administrative driver
  misc: keba: Fix missing AUXILIARY_BUS dependency
  slimbus: Fix struct and documentation alignment in stream.c
  MAINTAINERS: CC dri-devel list on Qualcomm FastRPC patches
  misc: fastrpc: use coherent pool for untranslated Compute Banks
  misc: fastrpc: support complete DMA pool access to the DSP
  misc: fastrpc: add missing MODULE_DESCRIPTION() macro
  misc: fastrpc: Add missing dev_err newlines
  misc: fastrpc: Use memdup_user()
  nvmem: core: Implement force_ro sysfs attribute
  nvmem: Use sysfs_emit() for type attribute
  ...
2024-07-19 15:55:08 -07:00
Linus Torvalds
12cc3d5389 Merge tag 'sound-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
 "Lots of changes in this cycle, but mostly for cleanups and
  refactoring.

  Significant amount of changes are about DT schema conversions for ASoC
  at this time while we see other usual suspects, too.

  Some highlights below:

  Core:
   - Re-introduction of PCM sync ID support API
   - MIDI2 time-base extension in ALSA sequencer API

  ASoC:
   - Syncing of features between simple-audio-card and the two
     audio-graph cards
   - Support for specifying the order of operations for components
     within cards to allow quirking for unusual systems
   - Lots of DT schema conversions
   - Continued SOF/Intel updates for topology, SoundWire, IPC3/4
   - New support for Asahi Kasei AK4619, Cirrus Logic CS530x, Everest
     Semiconductors ES8311, NXP i.MX95 and LPC32xx, Qualcomm LPASS v2.5
     and WCD937x, Realtek RT1318 and RT1320 and Texas Instruments
     PCM5242

  HD-audio:
   - More quirks, Intel PantherLake support, senarytech codec support
   - Refactoring of Cirrus codec component-binding

  Others:
   - ALSA control kselftest improvements, and fixes for input value
     checks in various drivers"

* tag 'sound-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (349 commits)
  kselftest/alsa: Log the PCM ID in pcm-test
  kselftest/alsa: Use card name rather than number in test names
  ALSA: hda/realtek: Fix the speaker output on Samsung Galaxy Book Pro 360
  ALSA: hda/tas2781: Add new quirk for Lenovo Hera2 Laptop
  ALSA: seq: ump: Skip useless ports for static blocks
  ALSA: pcm_dmaengine: Don't synchronize DMA channel when DMA is paused
  ALSA: usb: Use BIT() for bit values
  ALSA: usb: Fix UBSAN warning in parse_audio_unit()
  ALSA: hda/realtek: Enable headset mic on Positivo SU C1400
  ASoC: tas2781: Add new Kontrol to set tas2563 digital Volume
  ASoC: codecs: wcd937x: Remove separate handling for vdd-buck supply
  ASoC: codecs: wcd937x: Remove the string compare in MIC BIAS widget settings
  ASoC: codecs: wcd937x-sdw: Fix Unbalanced pm_runtime_enable
  ASoC: dt-bindings: cirrus,cs42xx8: Convert to dtschema
  ASoC: cs530x: Remove bclk from private structure
  ASoC: cs530x: Calculate proper bclk rate using TDM
  ASoC: dt-bindings: cirrus,cs4270: Convert to dtschema
  firmware: cs_dsp: Rename fw_ver to wmfw_ver
  firmware: cs_dsp: Clarify wmfw format version log message
  firmware: cs_dsp: Make wmfw and bin filename arguments const char *
  ...
2024-07-19 12:39:34 -07:00
Jason A. Donenfeld
13f75d9ecf random: note that RNDGETPOOL was removed in 2.6.9-rc2
RNDGETPOOL was thankfully removed twenty years ago, but it's stuck
around in headers. Probably removing it from uapi headers isn't great in
case there are some weird users out there, but we should at least mark
this as having been removed, to save future readers the same goose chase
I just went on.

Link: https://lore.kernel.org/all/E1By1St-0001TS-Qj@thunk.org/
Link: https://lore.kernel.org/all/Pine.LNX.4.58.0409130937050.4094@ppc970.osdl.org/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:48 +02:00
Jason A. Donenfeld
4ad10a5f5f random: introduce generic vDSO getrandom() implementation
Provide a generic C vDSO getrandom() implementation, which operates on
an opaque state returned by vgetrandom_alloc() and produces random bytes
the same way as getrandom(). This has the following API signature:

  ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags,
                     void *opaque_state, size_t opaque_len);

The return value and the first three arguments are the same as ordinary
getrandom(), while the last two arguments are a pointer to the opaque
allocated state and its size. Were all five arguments passed to the
getrandom() syscall, nothing different would happen, and the functions
would have the exact same behavior.

The actual vDSO RNG algorithm implemented is the same one implemented by
drivers/char/random.c, using the same fast-erasure techniques as that.
Should the in-kernel implementation change, so too will the vDSO one.

It requires an implementation of ChaCha20 that does not use any stack,
in order to maintain forward secrecy if a multi-threaded program forks
(though this does not account for a similar issue with SA_SIGINFO
copying registers to the stack), so this is left as an
architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm
to implement on a variety of architectures, so this shouldn't be too
onerous.

Initially, the state is keyless, and so the first call makes a
getrandom() syscall to generate that key, and then uses it for
subsequent calls. By keeping track of a generation counter, it knows
when its key is invalidated and it should fetch a new one using the
syscall. Later, more than just a generation counter might be used.

Since MADV_WIPEONFORK is set on the opaque state, the key and related
state is wiped during a fork(), so secrets don't roll over into new
processes, and the same state doesn't accidentally generate the same
random stream. The generation counter, as well, is always >0, so that
the 0 counter is a useful indication of a fork() or otherwise
uninitialized state.

If the kernel RNG is not yet initialized, then the vDSO always calls the
syscall, because that behavior cannot be emulated in userspace, but
fortunately that state is short lived and only during early boot. If it
has been initialized, then there is no need to inspect the `flags`
argument, because the behavior does not change post-initialization
regardless of the `flags` value.

Since the opaque state passed to it is mutated, vDSO getrandom() is not
reentrant, when used with the same opaque state, which libc should be
mindful of.

The function works over an opaque per-thread state of a particular size,
which must be marked VM_WIPEONFORK, VM_DONTDUMP, VM_NORESERVE, and
VM_DROPPABLE for proper operation. Over time, the nuances of these
allocations may change or grow or even differ based on architectural
features.

The opaque state passed to vDSO getrandom() must be allocated using the
mmap_flags and mmap_prot parameters provided by the vgetrandom_opaque_params
struct, which also contains the size of each state. That struct can be
obtained with a call to vgetrandom(NULL, 0, 0, &params, ~0UL). Then,
libc can call mmap(2) and slice up the returned array into a state per
each thread, while ensuring that no single state straddles a page
boundary. Libc is expected to allocate a chunk of these on first use,
and then dole them out to threads as they're created, allocating more
when needed.

vDSO getrandom() provides the ability for userspace to generate random
bytes quickly and safely, and is intended to be integrated into libc's
thread management. As an illustrative example, the introduced code in
the vdso_test_getrandom self test later in this series might be used to
do the same outside of libc. In a libc the various pthread-isms are
expected to be elided into libc internals.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:12 +02:00
Jason A. Donenfeld
9651fcedf7 mm: add MAP_DROPPABLE for designating always lazily freeable mappings
The vDSO getrandom() implementation works with a buffer allocated with a
new system call that has certain requirements:

- It shouldn't be written to core dumps.
  * Easy: VM_DONTDUMP.
- It should be zeroed on fork.
  * Easy: VM_WIPEONFORK.

- It shouldn't be written to swap.
  * Uh-oh: mlock is rlimited.
  * Uh-oh: mlock isn't inherited by forks.

- It shouldn't reserve actual memory, but it also shouldn't crash when
  page faulting in memory if none is available
  * Uh-oh: VM_NORESERVE means segfaults.

It turns out that the vDSO getrandom() function has three really nice
characteristics that we can exploit to solve this problem:

1) Due to being wiped during fork(), the vDSO code is already robust to
   having the contents of the pages it reads zeroed out midway through
   the function's execution.

2) In the absolute worst case of whatever contingency we're coding for,
   we have the option to fallback to the getrandom() syscall, and
   everything is fine.

3) The buffers the function uses are only ever useful for a maximum of
   60 seconds -- a sort of cache, rather than a long term allocation.

These characteristics mean that we can introduce VM_DROPPABLE, which
has the following semantics:

a) It never is written out to swap.
b) Under memory pressure, mm can just drop the pages (so that they're
   zero when read back again).
c) It is inherited by fork.
d) It doesn't count against the mlock budget, since nothing is locked.
e) If there's not enough memory to service a page fault, it's not fatal,
   and no signal is sent.

This way, allocations used by vDSO getrandom() can use:

    VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE

And there will be no problem with OOMing, crashing on overcommitment,
using memory when not in use, not wiping on fork(), coredumps, or
writing out to swap.

In order to let vDSO getrandom() use this, expose these via mmap(2) as
MAP_DROPPABLE.

Note that this involves removing the MADV_FREE special case from
sort_folio(), which according to Yu Zhao is unnecessary and will simply
result in an extra call to shrink_folio_list() in the worst case. The
chunk removed reenables the swapbacked flag, which we don't want for
VM_DROPPABLE, and we can't conditionalize it here because there isn't a
vma reference available.

Finally, the provided self test ensures that this is working as desired.

Cc: linux-mm@kvack.org
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19 20:22:12 +02:00
Linus Torvalds
4305ca0087 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr) plus some
  misc small fixes.

  The only core changes are to both bsg and scsi to pass in the device
  instead of setting it afterwards as q->queuedata, so no functional
  change"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (69 commits)
  scsi: aha152x: Use DECLARE_COMPLETION_ONSTACK for non-constant completion
  scsi: qla2xxx: Convert comma to semicolon
  scsi: qla2xxx: Update version to 10.02.09.300-k
  scsi: qla2xxx: Use QP lock to search for bsg
  scsi: qla2xxx: Reduce fabric scan duplicate code
  scsi: qla2xxx: Fix optrom version displayed in FDMI
  scsi: qla2xxx: During vport delete send async logout explicitly
  scsi: qla2xxx: Complete command early within lock
  scsi: qla2xxx: Fix flash read failure
  scsi: qla2xxx: Return ENOBUFS if sg_cnt is more than one for ELS cmds
  scsi: qla2xxx: Fix for possible memory corruption
  scsi: qla2xxx: validate nvme_local_port correctly
  scsi: qla2xxx: Unable to act on RSCN for port online
  scsi: ufs: exynos: Add support for Flash Memory Protector (FMP)
  scsi: ufs: core: Add UFSHCD_QUIRK_KEYS_IN_PRDT
  scsi: ufs: core: Add fill_crypto_prdt variant op
  scsi: ufs: core: Add UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE
  scsi: ufs: core: fold ufshcd_clear_keyslot() into its caller
  scsi: ufs: core: Add UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE
  scsi: ufs: mcq: Make .get_hba_mac() optional
  ...
2024-07-19 10:56:58 -07:00
Linus Torvalds
3d51520954 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "Usual collection of small improvements and fixes:

   - Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe,
     hf1, qib, ocrdma

   - bnxt_re support for MSN, which is a new retransmit logic

   - Initial mana support for RC qps

   - Use after free bug and cleanups in iwcm

   - Reduce resource usage in mlx5 when RDMA verbs features are not used

   - New verb to drain shared recieve queues, similar to normal recieve
     queues. This is necessary to allow ULPs a clean shutdown. Used in
     the iscsi rdma target

   - mlx5 support for more than 16 bits of doorbell indexes

   - Doorbell moderation support for bnxt_re

   - IB multi-plane support for mlx5

   - New EFA adaptor PCI IDs

   - RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't
     rename the device

   - A collection of hns bugs

   - Fix long standing bug in bnxt_re with incorrect endian handling of
     immediate data"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits)
  IB/hfi1: Constify struct flag_table
  RDMA/mana_ib: Set correct device into ib
  bnxt_re: Fix imm_data endianness
  RDMA: Fix netdev tracker in ib_device_set_netdev
  RDMA/hns: Fix mbx timing out before CMD execution is completed
  RDMA/hns: Fix insufficient extend DB for VFs.
  RDMA/hns: Fix undifined behavior caused by invalid max_sge
  RDMA/hns: Fix shift-out-bounds when max_inline_data is 0
  RDMA/hns: Fix missing pagesize and alignment check in FRMR
  RDMA/hns: Fix unmatch exception handling when init eq table fails
  RDMA/hns: Fix soft lockup under heavy CEQE load
  RDMA/hns: Check atomic wr length
  RDMA/ocrdma: Don't inline statistics functions
  RDMA/core: Introduce "name_assign_type" for an IB device
  RDMA/qib: Fix truncation compilation warnings in qib_verbs.c
  RDMA/qib: Fix truncation compilation warnings in qib_init.c
  RDMA/efa: Add EFA 0xefa3 PCI ID
  RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register
  net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports
  RDMA/mlx5: Add plane index support when querying PTYS registers
  ...
2024-07-19 09:51:33 -07:00