Commit Graph

19122 Commits

Author SHA1 Message Date
Linus Torvalds
cd557bc0a2 Merge tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:

 - Ignore invalid x2APIC entries in order to not waste per-CPU data

 - Fix a back-to-back signals handling scenario when shadow stack is in
   use

 - A documentation fix

 - Add Kirill as TDX maintainer

* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/acpi: Ignore invalid x2APIC entries
  x86/shstk: Delay signal entry SSP write until after user accesses
  x86/Documentation: Indent 'note::' directive for protocol version number note
  MAINTAINERS: Add Intel TDX entry
2023-11-19 13:46:17 -08:00
Zhang Rui
ec9aedb2aa x86/acpi: Ignore invalid x2APIC entries
Currently, the kernel enumerates the possible CPUs by parsing both ACPI
MADT Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs,
even if they have duplicated APIC IDs in Local APIC and x2APIC, are always
enumerated.

Below is what ACPI MADT Local APIC and x2APIC describes on an
Ivebridge-EP system,

[02Ch 0044   1]                Subtable Type : 00 [Processor Local APIC]
[02Fh 0047   1]                Local Apic ID : 00
...
[164h 0356   1]                Subtable Type : 00 [Processor Local APIC]
[167h 0359   1]                Local Apic ID : 39
[16Ch 0364   1]                Subtable Type : 00 [Processor Local APIC]
[16Fh 0367   1]                Local Apic ID : FF
...
[3ECh 1004   1]                Subtable Type : 09 [Processor Local x2APIC]
[3F0h 1008   4]                Processor x2Apic ID : 00000000
...
[B5Ch 2908   1]                Subtable Type : 09 [Processor Local x2APIC]
[B60h 2912   4]                Processor x2Apic ID : 00000077

As a result, kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs".
And this wastes significant amount of memory for the per-cpu data.
Plus this also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/,
because __max_logical_packages is over-estimated by the APIC IDs in
the x2APIC entries.

According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure:

  "[Compatibility note] On some legacy OSes, Logical processors with APIC
   ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
   Processor Local APIC structure to convey their APIC information to OSPM,
   and those processors must be declared in the DSDT using the Processor()
   keyword. Logical processors with APIC ID values 255 and greater must use
   the Processor Local x2APIC structure and be declared using the Device()
   keyword."

Therefore prevent the registration of x2APIC entries with an APIC ID less
than 255 if the local APIC table enumerates valid APIC IDs.

[ tglx: Simplify the logic ]

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230702162802.344176-1-rui.zhang@intel.com
2023-11-09 14:33:30 +01:00
Rick Edgecombe
31255e072b x86/shstk: Delay signal entry SSP write until after user accesses
When a signal is being delivered, the kernel needs to make accesses to
userspace. These accesses could encounter an access error, in which case
the signal delivery itself will trigger a segfault. Usually this would
result in the kernel killing the process. But in the case of a SEGV signal
handler being configured, the failure of the first signal delivery will
result in *another* signal getting delivered. The second signal may
succeed if another thread has resolved the issue that triggered the
segfault (i.e. a well timed mprotect()/mmap()), or the second signal is
being delivered to another stack (i.e. an alt stack).

On x86, in the non-shadow stack case, all the accesses to userspace are
done before changes to the registers (in pt_regs). The operation is
aborted when an access error occurs, so although there may be writes done
for the first signal, control flow changes for the signal (regs->ip,
regs->sp, etc) are not committed until all the accesses have already
completed successfully. This means that the second signal will be
delivered as if it happened at the time of the first signal. It will
effectively replace the first aborted signal, overwriting the half-written
frame of the aborted signal. So on sigreturn from the second signal,
control flow will resume happily from the point of control flow where the
original signal was delivered.

The problem is, when shadow stack is active, the shadow stack SSP
register/MSR is updated *before* some of the userspace accesses. This
means if the earlier accesses succeed and the later ones fail, the second
signal will not be delivered at the same spot on the shadow stack as the
first one. So on sigreturn from the second signal, the SSP will be
pointing to the wrong location on the shadow stack (off by a frame).

Pengfei privately reported that while using a shadow stack enabled glibc,
the “signal06” test in the LTP test-suite hung. It turns out it is
testing the above described double signal scenario. When this test was
compiled with shadow stack, the first signal pushed a shadow stack
sigframe, then the second pushed another. When the second signal was
handled, the SSP was at the first shadow stack signal frame instead of
the original location. The test then got stuck as the #CP from the twice
incremented SSP was incorrect and generated segfaults in a loop.

Fix this by adjusting the SSP register only after any userspace accesses,
such that there can be no failures after the SSP is adjusted. Do this by
moving the shadow stack sigframe push logic to happen after all other
userspace accesses.

Note, sigreturn (as opposed to the signal delivery dealt with in this
patch) has ordering behavior that could lead to similar failures. The
ordering issues there extend beyond shadow stack to include the alt stack
restoration. Fixing that would require cross-arch changes, and the
ordering today does not cause any known test or apps breakages. So leave
it as is, for now.

[ dhansen: minor changelog/subject tweak ]

Fixes: 05e36022c0 ("x86/shstk: Handle signals for shadow stack")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20231107182251.91276-1-rick.p.edgecombe%40intel.com
Link: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/signal/signal06.c
2023-11-08 08:55:37 -08:00
Linus Torvalds
0a23fb262d Merge tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislac Petkov:
 "Major microcode loader restructuring, cleanup and improvements by
  Thomas Gleixner:

   - Restructure the code needed for it and add a temporary initrd
     mapping on 32-bit so that the loader can access the microcode
     blobs. This in itself is a preparation for the next major
     improvement:

   - Do not load microcode on 32-bit before paging has been enabled.

     Handling this has caused an endless stream of headaches, issues,
     ugly code and unnecessary hacks in the past. And there really
     wasn't any sensible reason to do that in the first place. So switch
     the 32-bit loading to happen after paging has been enabled and turn
     the loader code "real purrty" again

   - Drop mixed microcode steppings loading on Intel - there, a single
     patch loaded on the whole system is sufficient

   - Rework late loading to track which CPUs have updated microcode
     successfully and which haven't, act accordingly

   - Move late microcode loading on Intel in NMI context in order to
     guarantee concurrent loading on all threads

   - Make the late loading CPU-hotplug-safe and have the offlined
     threads be woken up for the purpose of the update

   - Add support for a minimum revision which determines whether late
     microcode loading is safe on a machine and the microcode does not
     change software visible features which the machine cannot use
     anyway since feature detection has happened already. Roughly, the
     minimum revision is the smallest revision number which must be
     loaded currently on the system so that late updates can be allowed

   - Other nice leanups, fixess, etc all over the place"

* tag 'x86_microcode_for_v6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
  x86/microcode/intel: Add a minimum required revision for late loading
  x86/microcode: Prepare for minimal revision check
  x86/microcode: Handle "offline" CPUs correctly
  x86/apic: Provide apic_force_nmi_on_cpu()
  x86/microcode: Protect against instrumentation
  x86/microcode: Rendezvous and load in NMI
  x86/microcode: Replace the all-in-one rendevous handler
  x86/microcode: Provide new control functions
  x86/microcode: Add per CPU control field
  x86/microcode: Add per CPU result state
  x86/microcode: Sanitize __wait_for_cpus()
  x86/microcode: Clarify the late load logic
  x86/microcode: Handle "nosmt" correctly
  x86/microcode: Clean up mc_cpu_down_prep()
  x86/microcode: Get rid of the schedule work indirection
  x86/microcode: Mop up early loading leftovers
  x86/microcode/amd: Use cached microcode for AP load
  x86/microcode/amd: Cache builtin/initrd microcode early
  x86/microcode/amd: Cache builtin microcode too
  x86/microcode/amd: Use correct per CPU ucode_cpu_info
  ...
2023-11-04 08:46:37 -10:00
Linus Torvalds
1f24458a10 Merge tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial updates from Greg KH:
 "Here is the big set of tty/serial driver changes for 6.7-rc1. Included
  in here are:

   - console/vgacon cleanups and removals from Arnd

   - tty core and n_tty cleanups from Jiri

   - lots of 8250 driver updates and cleanups

   - sc16is7xx serial driver updates

   - dt binding updates

   - first set of port lock wrapers from Thomas for the printk fixes
     coming in future releases

   - other small serial and tty core cleanups and updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (193 commits)
  serdev: Replace custom code with device_match_acpi_handle()
  serdev: Simplify devm_serdev_device_open() function
  serdev: Make use of device_set_node()
  tty: n_gsm: add copyright Siemens Mobility GmbH
  tty: n_gsm: fix race condition in status line change on dead connections
  serial: core: Fix runtime PM handling for pending tx
  vgacon: fix mips/sibyte build regression
  dt-bindings: serial: drop unsupported samsung bindings
  tty: serial: samsung: drop earlycon support for unsupported platforms
  tty: 8250: Add note for PX-835
  tty: 8250: Fix IS-200 PCI ID comment
  tty: 8250: Add Brainboxes Oxford Semiconductor-based quirks
  tty: 8250: Add support for Intashield IX cards
  tty: 8250: Add support for additional Brainboxes PX cards
  tty: 8250: Fix up PX-803/PX-857
  tty: 8250: Fix port count of PX-257
  tty: 8250: Add support for Intashield IS-100
  tty: 8250: Add support for Brainboxes UP cards
  tty: 8250: Add support for additional Brainboxes UC cards
  tty: 8250: Remove UC-257 and UC-431
  ...
2023-11-03 15:44:25 -10:00
Linus Torvalds
8f6f76a6a2 Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "As usual, lots of singleton and doubleton patches all over the tree
  and there's little I can say which isn't in the individual changelogs.

  The lengthier patch series are

   - 'kdump: use generic functions to simplify crashkernel reservation
     in arch', from Baoquan He. This is mainly cleanups and
     consolidation of the 'crashkernel=' kernel parameter handling

   - After much discussion, David Laight's 'minmax: Relax type checks in
     min() and max()' is here. Hopefully reduces some typecasting and
     the use of min_t() and max_t()

   - A group of patches from Oleg Nesterov which clean up and slightly
     fix our handling of reads from /proc/PID/task/... and which remove
     task_struct.thread_group"

* tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits)
  scripts/gdb/vmalloc: disable on no-MMU
  scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n
  .mailmap: add address mapping for Tomeu Vizoso
  mailmap: update email address for Claudiu Beznea
  tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions
  .mailmap: map Benjamin Poirier's address
  scripts/gdb: add lx_current support for riscv
  ocfs2: fix a spelling typo in comment
  proc: test ProtectionKey in proc-empty-vm test
  proc: fix proc-empty-vm test with vsyscall
  fs/proc/base.c: remove unneeded semicolon
  do_io_accounting: use sig->stats_lock
  do_io_accounting: use __for_each_thread()
  ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error()
  ocfs2: fix a typo in a comment
  scripts/show_delta: add __main__ judgement before main code
  treewide: mark stuff as __ro_after_init
  fs: ocfs2: check status values
  proc: test /proc/${pid}/statm
  compiler.h: move __is_constexpr() to compiler.h
  ...
2023-11-02 20:53:31 -10:00
Linus Torvalds
426ee5196d Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
 "To help make the move of sysctls out of kernel/sysctl.c not incur a
  size penalty sysctl has been changed to allow us to not require the
  sentinel, the final empty element on the sysctl array. Joel Granados
  has been doing all this work. On the v6.6 kernel we got the major
  infrastructure changes required to support this. For v6.7-rc1 we have
  all arch/ and drivers/ modified to remove the sentinel. Both arch and
  driver changes have been on linux-next for a bit less than a month. It
  is worth re-iterating the value:

   - this helps reduce the overall build time size of the kernel and run
     time memory consumed by the kernel by about ~64 bytes per array

   - the extra 64-byte penalty is no longer inncurred now when we move
     sysctls out from kernel/sysctl.c to their own files

  For v6.8-rc1 expect removal of all the sentinels and also then the
  unneeded check for procname == NULL.

  The last two patches are fixes recently merged by Krister Johansen
  which allow us again to use softlockup_panic early on boot. This used
  to work but the alias work broke it. This is useful for folks who want
  to detect softlockups super early rather than wait and spend money on
  cloud solutions with nothing but an eventual hung kernel. Although
  this hadn't gone through linux-next it's also a stable fix, so we
  might as well roll through the fixes now"

* tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits)
  watchdog: move softlockup_panic back to early_param
  proc: sysctl: prevent aliased sysctls from getting passed to init
  intel drm: Remove now superfluous sentinel element from ctl_table array
  Drivers: hv: Remove now superfluous sentinel element from ctl_table array
  raid: Remove now superfluous sentinel element from ctl_table array
  fw loader: Remove the now superfluous sentinel element from ctl_table array
  sgi-xp: Remove the now superfluous sentinel element from ctl_table array
  vrf: Remove the now superfluous sentinel element from ctl_table array
  char-misc: Remove the now superfluous sentinel element from ctl_table array
  infiniband: Remove the now superfluous sentinel element from ctl_table array
  macintosh: Remove the now superfluous sentinel element from ctl_table array
  parport: Remove the now superfluous sentinel element from ctl_table array
  scsi: Remove now superfluous sentinel element from ctl_table array
  tty: Remove now superfluous sentinel element from ctl_table array
  xen: Remove now superfluous sentinel element from ctl_table array
  hpet: Remove now superfluous sentinel element from ctl_table array
  c-sky: Remove now superfluous sentinel element from ctl_talbe array
  powerpc: Remove now superfluous sentinel element from ctl_table arrays
  riscv: Remove now superfluous sentinel element from ctl_table array
  x86/vdso: Remove now superfluous sentinel element from ctl_table array
  ...
2023-11-01 20:51:41 -10:00
Linus Torvalds
8999ad99f4 Merge tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
 "The majority of this is a rework of the assembly and C wrappers that
  are used to talk to the TDX module and VMM. This is a nice cleanup in
  general but is also clearing the way for using this code when Linux is
  the TDX VMM.

  There are also some tidbits to make TDX guests play nicer with Hyper-V
  and to take advantage the hardware TSC.

  Summary:

   - Refactor and clean up TDX hypercall/module call infrastructure

   - Handle retrying/resuming page conversion hypercalls

   - Make sure to use the (shockingly) reliable TSC in TDX guests"

[ TLA reminder: TDX is "Trust Domain Extensions", Intel's guest VM
  confidentiality technology ]

* tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tdx: Mark TSC reliable
  x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()
  x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP
  x86/virt/tdx: Wire up basic SEAMCALL functions
  x86/tdx: Remove 'struct tdx_hypercall_args'
  x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm
  x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL
  x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs
  x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
  x86/tdx: Rename __tdx_module_call() to __tdcall()
  x86/tdx: Make macros of TDCALLs consistent with the spec
  x86/tdx: Skip saving output regs when SEAMCALL fails with VMFailInvalid
  x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro
  x86/tdx: Retry partially-completed page conversion hypercalls
2023-11-01 10:28:32 -10:00
Linus Torvalds
2656821f1f Merge tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks
Pull RCU updates from Frederic Weisbecker:

 - RCU torture, locktorture and generic torture infrastructure updates
   that include various fixes, cleanups and consolidations.

   Among the user visible things, ftrace dumps can now be found into
   their own file, and module parameters get better documented and
   reported on dumps.

 - Generic and misc fixes all over the place. Some highlights:

     * Hotplug handling has seen some light cleanups and comments

     * An RCU barrier can now be triggered through sysfs to serialize
       memory stress testing and avoid OOM

     * Object information is now dumped in case of invalid callback
       invocation

     * Also various SRCU issues, too hard to trigger to deserve urgent
       pull requests, have been fixed

 - RCU documentation updates

 - RCU reference scalability test minor fixes and doc improvements.

 - RCU tasks minor fixes

 - Stall detection updates. Introduce RCU CPU Stall notifiers that
   allows a subsystem to provide informations to help debugging. Also
   cure some false positive stalls.

* tag 'rcu-next-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks: (56 commits)
  srcu: Only accelerate on enqueue time
  locktorture: Check the correct variable for allocation failure
  srcu: Fix callbacks acceleration mishandling
  rcu: Comment why callbacks migration can't wait for CPUHP_RCUTREE_PREP
  rcu: Standardize explicit CPU-hotplug calls
  rcu: Conditionally build CPU-hotplug teardown callbacks
  rcu: Remove references to rcu_migrate_callbacks() from diagrams
  rcu: Assume rcu_report_dead() is always called locally
  rcu: Assume IRQS disabled from rcu_report_dead()
  rcu: Use rcu_segcblist_segempty() instead of open coding it
  rcu: kmemleak: Ignore kmemleak false positives when RCU-freeing objects
  srcu: Fix srcu_struct node grpmask overflow on 64-bit systems
  torture: Convert parse-console.sh to mktemp
  rcutorture: Traverse possible cpu to set maxcpu in rcu_nocb_toggle()
  rcutorture: Replace schedule_timeout*() 1-jiffy waits with HZ/20
  torture: Add kvm.sh --debug-info argument
  locktorture: Rename readers_bind/writers_bind to bind_readers/bind_writers
  doc: Catch-up update for locktorture module parameters
  locktorture: Add call_rcu_chains module parameter
  locktorture: Add new module parameters to lock_torture_print_module_parms()
  ...
2023-10-30 18:01:41 -10:00
Linus Torvalds
eb55307e67 Merge tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Thomas Gleixner:

 - Limit the hardcoded topology quirk for Hygon CPUs to those which have
   a model ID less than 4.

   The newer models have the topology CPUID leaf 0xB correctly
   implemented and are not affected.

 - Make SMT control more robust against enumeration failures

   SMT control was added to allow controlling SMT at boottime or
   runtime. The primary purpose was to provide a simple mechanism to
   disable SMT in the light of speculation attack vectors.

   It turned out that the code is sensible to enumeration failures and
   worked only by chance for XEN/PV. XEN/PV has no real APIC enumeration
   which means the primary thread mask is not set up correctly. By
   chance a XEN/PV boot ends up with smp_num_siblings == 2, which makes
   the hotplug control stay at its default value "enabled". So the mask
   is never evaluated.

   The ongoing rework of the topology evaluation caused XEN/PV to end up
   with smp_num_siblings == 1, which sets the SMT control to "not
   supported" and the empty primary thread mask causes the hotplug core
   to deny the bringup of the APS.

   Make the decision logic more robust and take 'not supported' and 'not
   implemented' into account for the decision whether a CPU should be
   booted or not.

 - Fake primary thread mask for XEN/PV

   Pretend that all XEN/PV vCPUs are primary threads, which makes the
   usage of the primary thread mask valid on XEN/PV. That is consistent
   with because all of the topology information on XEN/PV is fake or
   even non-existent.

 - Encapsulate topology information in cpuinfo_x86

   Move the randomly scattered topology data into a separate data
   structure for readability and as a preparatory step for the topology
   evaluation overhaul.

 - Consolidate APIC ID data type to u32

   It's fixed width hardware data and not randomly u16, int, unsigned
   long or whatever developers decided to use.

 - Cure the abuse of cpuinfo for persisting logical IDs.

   Per CPU cpuinfo is used to persist the logical package and die IDs.
   That's really not the right place simply because cpuinfo is subject
   to be reinitialized when a CPU goes through an offline/online cycle.

   Use separate per CPU data for the persisting to enable the further
   topology management rework. It will be removed once the new topology
   management is in place.

 - Provide a debug interface for inspecting topology information

   Useful in general and extremly helpful for validating the topology
   management rework in terms of correctness or "bug" compatibility.

* tag 'x86-core-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/apic, x86/hyperv: Use u32 in hv_snp_boot_ap() too
  x86/cpu: Provide debug interface
  x86/cpu/topology: Cure the abuse of cpuinfo for persisting logical ids
  x86/apic: Use u32 for wakeup_secondary_cpu[_64]()
  x86/apic: Use u32 for [gs]et_apic_id()
  x86/apic: Use u32 for phys_pkg_id()
  x86/apic: Use u32 for cpu_present_to_apicid()
  x86/apic: Use u32 for check_apicid_used()
  x86/apic: Use u32 for APIC IDs in global data
  x86/apic: Use BAD_APICID consistently
  x86/cpu: Move cpu_l[l2]c_id into topology info
  x86/cpu: Move logical package and die IDs into topology info
  x86/cpu: Remove pointless evaluation of x86_coreid_bits
  x86/cpu: Move cu_id into topology info
  x86/cpu: Move cpu_core_id into topology info
  hwmon: (fam15h_power) Use topology_core_id()
  scsi: lpfc: Use topology_core_id()
  x86/cpu: Move cpu_die_id into topology info
  x86/cpu: Move phys_proc_id into topology info
  x86/cpu: Encapsulate topology information in cpuinfo_x86
  ...
2023-10-30 17:37:47 -10:00
Linus Torvalds
943af0e73a Merge tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:

 - Make the quirk for non-maskable MSI interrupts in the affinity setter
   functional again.

   It was broken by a MSI core code update, which restructured the code
   in a way that the quirk flag was not longer set correctly.

   Trying to restore the core logic caused a deeper inspection and it
   turned out that the extra quirk flag is not required at all because
   it's the inverse of the reservation mode bit, which only can be set
   when the MSI interrupt is maskable.

   So the trivial fix is to use the reservation mode check in the
   affinity setter function and remove almost 40 lines of code related
   to the no-mask quirk flag.

 - Cure a Kconfig dependency issue which causes compile failures by
   correcting the conditionals in the affected header files.

 - Clean up coding style in the UV APIC driver.

* tag 'x86-apic-2023-10-29-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/msi: Fix misconfigured non-maskable MSI quirk
  x86/msi: Fix compile error caused by CONFIG_GENERIC_MSI_IRQ=y && !CONFIG_X86_LOCAL_APIC
  x86/platform/uv/apic: Clean up inconsistent indenting
2023-10-30 17:27:56 -10:00
Linus Torvalds
f0d25b5d0f Merge tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm handling updates from Ingo Molnar:

 - Add new NX-stack self-test

 - Improve NUMA partial-CFMWS handling

 - Fix #VC handler bugs resulting in SEV-SNP boot failures

 - Drop the 4MB memory size restriction on minimal NUMA nodes

 - Reorganize headers a bit, in preparation to header dependency
   reduction efforts

 - Misc cleanups & fixes

* tag 'x86-mm-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Drop the 4 MB restriction on minimal NUMA node memory size
  selftests/x86/lam: Zero out buffer for readlink()
  x86/sev: Drop unneeded #include
  x86/sev: Move sev_setup_arch() to mem_encrypt.c
  x86/tdx: Replace deprecated strncpy() with strtomem_pad()
  selftests/x86/mm: Add new test that userspace stack is in fact NX
  x86/sev: Make boot_ghcb_page[] static
  x86/boot: Move x86_cache_alignment initialization to correct spot
  x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach
  x86/sev-es: Allow copy_from_kernel_nofault() in earlier boot
  x86_64: Show CR4.PSE on auxiliaries like on BSP
  x86/iommu/docs: Update AMD IOMMU specification document URL
  x86/sev/docs: Update document URL in amd-memory-encryption.rst
  x86/mm: Move arch_memory_failure() and arch_is_platform_page() definitions from <asm/processor.h> to <asm/pgtable.h>
  ACPI/NUMA: Apply SRAT proximity domain to entire CFMWS window
  x86/numa: Introduce numa_fill_memblks()
2023-10-30 15:40:57 -10:00
Linus Torvalds
1641b9b040 Merge tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 irq fix from Ingo Molnar:
 "Fix out-of-order NMI nesting checks resulting in false positive
  warnings"

* tag 'x86-irq-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/nmi: Fix out-of-order NMI nesting checks & false positive warning
2023-10-30 15:39:38 -10:00
Linus Torvalds
ed766c2611 Merge tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 entry updates from Ingo Molnar:

 - Make IA32_EMULATION boot time configurable with
   the new ia32_emulation=<bool> boot option

 - Clean up fast syscall return validation code: convert
   it to C and refactor the code

 - As part of this, optimize the canonical RIP test code

* tag 'x86-entry-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/32: Clean up syscall fast exit tests
  x86/entry/64: Use TASK_SIZE_MAX for canonical RIP test
  x86/entry/64: Convert SYSRET validation tests to C
  x86/entry/32: Remove SEP test for SYSEXIT
  x86/entry/32: Convert do_fast_syscall_32() to bool return type
  x86/entry/compat: Combine return value test from syscall handler
  x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
  x86: Make IA32_EMULATION boot time configurable
  x86/entry: Make IA32 syscalls' availability depend on ia32_enabled()
  x86/elf: Make loading of 32bit processes depend on ia32_enabled()
  x86/entry: Compile entry_SYSCALL32_ignore() unconditionally
  x86/entry: Rename ignore_sysret()
  x86: Introduce ia32_enabled()
2023-10-30 15:27:27 -10:00
Linus Torvalds
2b95bb0526 Merge tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:

 - Rework PE header generation, primarily to generate a modern, 4k
   aligned kernel image view with narrower W^X permissions.

 - Further refine init-lifetime annotations

 - Misc cleanups & fixes

* tag 'x86-boot-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  x86/boot: efistub: Assign global boot_params variable
  x86/boot: Rename conflicting 'boot_params' pointer to 'boot_params_ptr'
  x86/head/64: Move the __head definition to <asm/init.h>
  x86/head/64: Add missing __head annotation to startup_64_load_idt()
  x86/head/64: Mark 'startup_gdt[]' and 'startup_gdt_descr' as __initdata
  x86/boot: Harmonize the style of array-type parameter for fixup_pointer() calls
  x86/boot: Fix incorrect startup_gdt_descr.size
  x86/boot: Compile boot code with -std=gnu11 too
  x86/boot: Increase section and file alignment to 4k/512
  x86/boot: Split off PE/COFF .data section
  x86/boot: Drop PE/COFF .reloc section
  x86/boot: Construct PE/COFF .text section from assembler
  x86/boot: Derive file size from _edata symbol
  x86/boot: Define setup size in linker script
  x86/boot: Set EFI handover offset directly in header asm
  x86/boot: Grab kernel_info offset from zoffset header directly
  x86/boot: Drop references to startup_64
  x86/boot: Drop redundant code setting the root device
  x86/boot: Omit compression buffer from PE/COFF image memory footprint
  x86/boot: Remove the 'bugger off' message
  ...
2023-10-30 14:11:57 -10:00
Linus Torvalds
3b8b4b4fc4 Merge tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 header file cleanup from Ingo Molnar:
 "Replace <asm/export.h> uses with <linux/export.h> and then remove
  <asm/export.h>"

* tag 'x86-headers-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers: Remove <asm/export.h>
  x86/headers: Replace #include <asm/export.h> with #include <linux/export.h>
  x86/headers: Remove unnecessary #include <asm/export.h>
2023-10-30 14:04:23 -10:00
Linus Torvalds
cd063c8b9e Merge tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
 "Misc fixes and cleanups:

   - Fix potential MAX_NAME_LEN limit related build failures

   - Fix scripts/faddr2line symbol filtering bug

   - Fix scripts/faddr2line on LLVM=1

   - Fix scripts/faddr2line to accept readelf output with mapping
     symbols

   - Minor cleanups"

* tag 'objtool-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  scripts/faddr2line: Skip over mapping symbols in output from readelf
  scripts/faddr2line: Use LLVM addr2line and readelf if LLVM=1
  scripts/faddr2line: Don't filter out non-function symbols from readelf
  objtool: Remove max symbol name length limitation
  objtool: Propagate early errors
  objtool: Use 'the fallthrough' pseudo-keyword
  x86/speculation, objtool: Use absolute relocations for annotations
  x86/unwind/orc: Remove redundant initialization of 'mid' pointer in __orc_find()
2023-10-30 13:20:02 -10:00
Linus Torvalds
63ce50fff9 Merge tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Fair scheduler (SCHED_OTHER) improvements:
   - Remove the old and now unused SIS_PROP code & option
   - Scan cluster before LLC in the wake-up path
   - Use candidate prev/recent_used CPU if scanning failed for cluster
     wakeup

  NUMA scheduling improvements:
   - Improve the VMA access-PID code to better skip/scan VMAs
   - Extend tracing to cover VMA-skipping decisions
   - Improve/fix the recently introduced sched_numa_find_nth_cpu() code
   - Generalize numa_map_to_online_node()

  Energy scheduling improvements:
   - Remove the EM_MAX_COMPLEXITY limit
   - Add tracepoints to track energy computation
   - Make the behavior of the 'sched_energy_aware' sysctl more
     consistent
   - Consolidate and clean up access to a CPU's max compute capacity
   - Fix uclamp code corner cases

  RT scheduling improvements:
   - Drive dl_rq->overloaded with dl_rq->pushable_dl_tasks updates
   - Drive the ->rto_mask with rt_rq->pushable_tasks updates

  Scheduler scalability improvements:
   - Rate-limit updates to tg->load_avg
   - On x86 disable IBRS when CPU is offline to improve single-threaded
     performance
   - Micro-optimize in_task() and in_interrupt()
   - Micro-optimize the PSI code
   - Avoid updating PSI triggers and ->rtpoll_total when there are no
     state changes

  Core scheduler infrastructure improvements:
   - Use saved_state to reduce some spurious freezer wakeups
   - Bring in a handful of fast-headers improvements to scheduler
     headers
   - Make the scheduler UAPI headers more widely usable by user-space
   - Simplify the control flow of scheduler syscalls by using lock
     guards
   - Fix sched_setaffinity() vs. CPU hotplug race

  Scheduler debuggability improvements:
   - Disallow writing invalid values to sched_rt_period_us
   - Fix a race in the rq-clock debugging code triggering warnings
   - Fix a warning in the bandwidth distribution code
   - Micro-optimize in_atomic_preempt_off() checks
   - Enforce that the tasklist_lock is held in for_each_thread()
   - Print the TGID in sched_show_task()
   - Remove the /proc/sys/kernel/sched_child_runs_first sysctl

  ... and misc cleanups & fixes"

* tag 'sched-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
  sched/fair: Remove SIS_PROP
  sched/fair: Use candidate prev/recent_used CPU if scanning failed for cluster wakeup
  sched/fair: Scan cluster before scanning LLC in wake-up path
  sched: Add cpus_share_resources API
  sched/core: Fix RQCF_ACT_SKIP leak
  sched/fair: Remove unused 'curr' argument from pick_next_entity()
  sched/nohz: Update comments about NEWILB_KICK
  sched/fair: Remove duplicate #include
  sched/psi: Update poll => rtpoll in relevant comments
  sched: Make PELT acronym definition searchable
  sched: Fix stop_one_cpu_nowait() vs hotplug
  sched/psi: Bail out early from irq time accounting
  sched/topology: Rename 'DIE' domain to 'PKG'
  sched/psi: Delete the 'update_total' function parameter from update_triggers()
  sched/psi: Avoid updating PSI triggers and ->rtpoll_total when there are no state changes
  sched/headers: Remove comment referring to rq::cpu_load, since this has been removed
  sched/numa: Complete scanning of inactive VMAs when there is no alternative
  sched/numa: Complete scanning of partial VMAs regardless of PID activity
  sched/numa: Move up the access pid reset logic
  sched/numa: Trace decisions related to skipping VMAs
  ...
2023-10-30 13:12:15 -10:00
Linus Torvalds
9cda4eb04a Merge tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu fixlet from Borislav Petkov:

 - kernel-doc fix

* tag 'x86_fpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu/xstate: Address kernel-doc warning
2023-10-30 12:36:41 -10:00
Linus Torvalds
f155f3b3ed Merge tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Borislav Petkov:

 - Make sure PCI function 4 IDs of AMD family 0x19, models 0x60-0x7f are
   actually used in the amd_nb.c enumeration

 - Add support for extracting NUMA information from devicetree for
   Hyper-V usages

 - Add PCI device IDs for the new AMD MI300 AI accelerators

 - Annotate an array in struct uv_rtc_timer_head with the new
   __counted_by attribute

 - Rework UV's NMI action parameter handling

* tag 'x86_platform_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/amd_nb: Use Family 19h Models 60h-7Fh Function 4 IDs
  x86/numa: Add Devicetree support
  x86/of: Move the x86_flattree_get_config() call out of x86_dtb_init()
  x86/amd_nb: Add AMD Family MI300 PCI IDs
  x86/platform/uv: Annotate struct uv_rtc_timer_head with __counted_by
  x86/platform/uv: Rework NMI "action" modparam handling
2023-10-30 12:32:48 -10:00
Linus Torvalds
ca2e9c3bee Merge tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:

 - Make sure the "svm" feature flag is cleared from /proc/cpuinfo when
   virtualization support is disabled in the BIOS on AMD and Hygon
   platforms

 - A minor cleanup

* tag 'x86_cpu_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu/amd: Remove redundant 'break' statement
  x86/cpu: Clear SVM feature if disabled by BIOS
2023-10-30 12:10:24 -10:00
Linus Torvalds
9ab021a1b5 Merge tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:

 - Add support for non-contiguous capacity bitmasks being added to
   Intel's CAT implementation

 - Other improvements to resctrl code: better configuration,
   simplifications, debugging support, fixes

* tag 'x86_cache_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Display RMID of resource group
  x86/resctrl: Add support for the files of MON groups only
  x86/resctrl: Display CLOSID for resource group
  x86/resctrl: Introduce "-o debug" mount option
  x86/resctrl: Move default group file creation to mount
  x86/resctrl: Unwind properly from rdt_enable_ctx()
  x86/resctrl: Rename rftype flags for consistency
  x86/resctrl: Simplify rftype flag definitions
  x86/resctrl: Add multiple tasks to the resctrl group at once
  Documentation/x86: Document resctrl's new sparse_masks
  x86/resctrl: Add sparse_masks file in info
  x86/resctrl: Enable non-contiguous CBMs in Intel CAT
  x86/resctrl: Rename arch_has_sparse_bitmaps
  x86/resctrl: Fix remaining kernel-doc warnings
2023-10-30 12:07:29 -10:00
Linus Torvalds
f84a52eef5 Merge tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 hw mitigation updates from Borislav Petkov:

 - A bunch of improvements, cleanups and fixlets to the SRSO mitigation
   machinery and other, general cleanups to the hw mitigations code, by
   Josh Poimboeuf

 - Improve the return thunk detection by objtool as it is absolutely
   important that the default return thunk is not used after returns
   have been patched. Future work to detect and report this better is
   pending

 - Other misc cleanups and fixes

* tag 'x86_bugs_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  x86/retpoline: Document some thunk handling aspects
  x86/retpoline: Make sure there are no unconverted return thunks due to KCSAN
  x86/callthunks: Delete unused "struct thunk_desc"
  x86/vdso: Run objtool on vdso32-setup.o
  objtool: Fix return thunk patching in retpolines
  x86/srso: Remove unnecessary semicolon
  x86/pti: Fix kernel warnings for pti= and nopti cmdline options
  x86/calldepth: Rename __x86_return_skl() to call_depth_return_thunk()
  x86/nospec: Refactor UNTRAIN_RET[_*]
  x86/rethunk: Use SYM_CODE_START[_LOCAL]_NOALIGN macros
  x86/srso: Disentangle rethunk-dependent options
  x86/srso: Move retbleed IBPB check into existing 'has_microcode' code block
  x86/bugs: Remove default case for fully switched enums
  x86/srso: Remove 'pred_cmd' label
  x86/srso: Unexport untraining functions
  x86/srso: Improve i-cache locality for alias mitigation
  x86/srso: Fix unret validation dependencies
  x86/srso: Fix vulnerability reporting for missing microcode
  x86/srso: Print mitigation for retbleed IBPB case
  x86/srso: Print actual mitigation if requested mitigation isn't possible
  ...
2023-10-30 11:48:49 -10:00
Linus Torvalds
01ae815c50 Merge tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:

 - Specify what error addresses reported on AMD are actually usable
   memory error addresses for further decoding

* tag 'ras_core_for_6.7_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Cleanup mce_usable_address()
  x86/mce: Define amd_mce_usable_address()
  x86/MCE/AMD: Split amd_mce_is_memory_error()
2023-10-30 11:47:03 -10:00
Linus Torvalds
2af9b20dbb Merge tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar:

 - Fix a possible CPU hotplug deadlock bug caused by the new TSC
   synchronization code

 - Fix a legacy PIC discovery bug that results in device troubles on
   affected systems, such as non-working keybards, etc

 - Add a new Intel CPU model number to <asm/intel-family.h>

* tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/tsc: Defer marking TSC unstable to a worker
  x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
  x86/cpu: Add model number for Intel Arrow Lake mobile processor
2023-10-28 08:15:07 -10:00