mirror of
https://github.com/ukui/kernel.git
synced 2026-03-09 10:07:04 -07:00
visionfive
516 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
27bc50fc90 |
Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton: - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in linux-next for a couple of months without, to my knowledge, any negative reports (or any positive ones, come to that). - Also the Maple Tree from Liam Howlett. An overlapping range-based tree for vmas. It it apparently slightly more efficient in its own right, but is mainly targeted at enabling work to reduce mmap_lock contention. Liam has identified a number of other tree users in the kernel which could be beneficially onverted to mapletrees. Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat at [1]. This has yet to be addressed due to Liam's unfortunately timed vacation. He is now back and we'll get this fixed up. - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses clang-generated instrumentation to detect used-unintialized bugs down to the single bit level. KMSAN keeps finding bugs. New ones, as well as the legacy ones. - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of memory into THPs. - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to support file/shmem-backed pages. - userfaultfd updates from Axel Rasmussen - zsmalloc cleanups from Alexey Romanov - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and memory-failure - Huang Ying adds enhancements to NUMA balancing memory tiering mode's page promotion, with a new way of detecting hot pages. - memcg updates from Shakeel Butt: charging optimizations and reduced memory consumption. - memcg cleanups from Kairui Song. - memcg fixes and cleanups from Johannes Weiner. - Vishal Moola provides more folio conversions - Zhang Yi removed ll_rw_block() :( - migration enhancements from Peter Xu - migration error-path bugfixes from Huang Ying - Aneesh Kumar added ability for a device driver to alter the memory tiering promotion paths. For optimizations by PMEM drivers, DRM drivers, etc. - vma merging improvements from Jakub Matěn. - NUMA hinting cleanups from David Hildenbrand. - xu xin added aditional userspace visibility into KSM merging activity. - THP & KSM code consolidation from Qi Zheng. - more folio work from Matthew Wilcox. - KASAN updates from Andrey Konovalov. - DAMON cleanups from Kaixu Xia. - DAMON work from SeongJae Park: fixes, cleanups. - hugetlb sysfs cleanups from Muchun Song. - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core. Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1] * tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits) hugetlb: allocate vma lock for all sharable vmas hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer hugetlb: fix vma lock handling during split vma and range unmapping mglru: mm/vmscan.c: fix imprecise comments mm/mglru: don't sync disk for each aging cycle mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol mm: memcontrol: use do_memsw_account() in a few more places mm: memcontrol: deprecate swapaccounting=0 mode mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled mm/secretmem: remove reduntant return value mm/hugetlb: add available_huge_pages() func mm: remove unused inline functions from include/linux/mm_inline.h selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd selftests/vm: add thp collapse shmem testing selftests/vm: add thp collapse file and tmpfs testing selftests/vm: modularize thp collapse memory operations selftests/vm: dedup THP helpers mm/khugepaged: add tracepoint to hpage_collapse_scan_file() mm/madvise: add file and shmem support to MADV_COLLAPSE ... |
||
|
|
65f109e199 |
Merge tag 'objtool-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar: - Remove the "ANNOTATE_NOENDBR on ENDBR" warning: it's not really useful and only found a non-bug false positive so far. - Properly decode LOOP/LOOPE/LOOPNE, which were missing from the x86 decoder. Because these instructions are rather ineffective, they never showed up in compiler output, but they are simple enough to support, so add them for completeness. - A bit more cross-arch preparatory work. * tag 'objtool-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool,x86: Teach decode about LOOP* instructions objtool: Remove "ANNOTATE_NOENDBR on ENDBR" warning objtool: Use arch_jump_destination() in read_intra_function_calls() |
||
|
|
0326074ff4 |
Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core:
- Introduce and use a single page frag cache for allocating small skb
heads, clawing back the 10-20% performance regression in UDP flood
test from previous fixes.
- Run packets which already went thru HW coalescing thru SW GRO. This
significantly improves TCP segment coalescing and simplifies
deployments as different workloads benefit from HW or SW GRO.
- Shrink the size of the base zero-copy send structure.
- Move TCP init under a new slow / sleepable version of DO_ONCE().
BPF:
- Add BPF-specific, any-context-safe memory allocator.
- Add helpers/kfuncs for PKCS#7 signature verification from BPF
programs.
- Define a new map type and related helpers for user space -> kernel
communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF).
- Allow targeting BPF iterators to loop through resources of one
task/thread.
- Add ability to call selected destructive functions. Expose
crash_kexec() to allow BPF to trigger a kernel dump. Use
CAP_SYS_BOOT check on the loading process to judge permissions.
- Enable BPF to collect custom hierarchical cgroup stats efficiently
by integrating with the rstat framework.
- Support struct arguments for trampoline based programs. Only
structs with size <= 16B and x86 are supported.
- Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping
sockets (instead of just TCP and UDP sockets).
- Add a helper for accessing CLOCK_TAI for time sensitive network
related programs.
- Support accessing network tunnel metadata's flags.
- Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open.
- Add support for writing to Netfilter's nf_conn:mark.
Protocols:
- WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation
(MLO) work (802.11be, WiFi 7).
- vsock: improve support for SO_RCVLOWAT.
- SMC: support SO_REUSEPORT.
- Netlink: define and document how to use netlink in a "modern" way.
Support reporting missing attributes via extended ACK.
- IPSec: support collect metadata mode for xfrm interfaces.
- TCPv6: send consistent autoflowlabel in SYN_RECV state and RST
packets.
- TCP: introduce optional per-netns connection hash table to allow
better isolation between namespaces (opt-in, at the cost of memory
and cache pressure).
- MPTCP: support TCP_FASTOPEN_CONNECT.
- Add NEXT-C-SID support in Segment Routing (SRv6) End behavior.
- Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets.
- Open vSwitch:
- Allow specifying ifindex of new interfaces.
- Allow conntrack and metering in non-initial user namespace.
- TLS: support the Korean ARIA-GCM crypto algorithm.
- Remove DECnet support.
Driver API:
- Allow selecting the conduit interface used by each port in DSA
switches, at runtime.
- Ethernet Power Sourcing Equipment and Power Device support.
- Add tc-taprio support for queueMaxSDU parameter, i.e. setting per
traffic class max frame size for time-based packet schedules.
- Support PHY rate matching - adapting between differing host-side
and link-side speeds.
- Introduce QUSGMII PHY mode and 1000BASE-KX interface mode.
- Validate OF (device tree) nodes for DSA shared ports; make
phylink-related properties mandatory on DSA and CPU ports.
Enforcing more uniformity should allow transitioning to phylink.
- Require that flash component name used during update matches one of
the components for which version is reported by info_get().
- Remove "weight" argument from driver-facing NAPI API as much as
possible. It's one of those magic knobs which seemed like a good
idea at the time but is too indirect to use in practice.
- Support offload of TLS connections with 256 bit keys.
New hardware / drivers:
- Ethernet:
- Microchip KSZ9896 6-port Gigabit Ethernet Switch
- Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs
- Analog Devices ADIN1110 and ADIN2111 industrial single pair
Ethernet (10BASE-T1L) MAC+PHY.
- Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP).
- Ethernet SFPs / modules:
- RollBall / Hilink / Turris 10G copper SFPs
- HALNy GPON module
- WiFi:
- CYW43439 SDIO chipset (brcmfmac)
- CYW89459 PCIe chipset (brcmfmac)
- BCM4378 on Apple platforms (brcmfmac)
Drivers:
- CAN:
- gs_usb: HW timestamp support
- Ethernet PHYs:
- lan8814: cable diagnostics
- Ethernet NICs:
- Intel (100G):
- implement control of FCS/CRC stripping
- port splitting via devlink
- L2TPv3 filtering offload
- nVidia/Mellanox:
- tunnel offload for sub-functions
- MACSec offload, w/ Extended packet number and replay window
offload
- significantly restructure, and optimize the AF_XDP support,
align the behavior with other vendors
- Huawei:
- configuring DSCP map for traffic class selection
- querying standard FEC statistics
- querying SerDes lane number via ethtool
- Marvell/Cavium:
- egress priority flow control
- MACSec offload
- AMD/SolarFlare:
- PTP over IPv6 and raw Ethernet
- small / embedded:
- ax88772: convert to phylink (to support SFP cages)
- altera: tse: convert to phylink
- ftgmac100: support fixed link
- enetc: standard Ethtool counters
- macb: ZynqMP SGMII dynamic configuration support
- tsnep: support multi-queue and use page pool
- lan743x: Rx IP & TCP checksum offload
- igc: add xdp frags support to ndo_xdp_xmit
- Ethernet high-speed switches:
- Marvell (prestera):
- support SPAN port features (traffic mirroring)
- nexthop object offloading
- Microchip (sparx5):
- multicast forwarding offload
- QoS queuing offload (tc-mqprio, tc-tbf, tc-ets)
- Ethernet embedded switches:
- Marvell (mv88e6xxx):
- support RGMII cmode
- NXP (felix):
- standardized ethtool counters
- Microchip (lan966x):
- QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets)
- traffic policing and mirroring
- link aggregation / bonding offload
- QUSGMII PHY mode support
- Qualcomm 802.11ax WiFi (ath11k):
- cold boot calibration support on WCN6750
- support to connect to a non-transmit MBSSID AP profile
- enable remain-on-channel support on WCN6750
- Wake-on-WLAN support for WCN6750
- support to provide transmit power from firmware via nl80211
- support to get power save duration for each client
- spectral scan support for 160 MHz
- MediaTek WiFi (mt76):
- WiFi-to-Ethernet bridging offload for MT7986 chips
- RealTek WiFi (rtw89):
- P2P support"
* tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits)
eth: pse: add missing static inlines
once: rename _SLOW to _SLEEPABLE
net: pse-pd: add regulator based PSE driver
dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller
ethtool: add interface to interact with Ethernet Power Equipment
net: mdiobus: search for PSE nodes by parsing PHY nodes.
net: mdiobus: fwnode_mdiobus_register_phy() rework error handling
net: add framework to support Ethernet PSE and PDs devices
dt-bindings: net: phy: add PoDL PSE property
net: marvell: prestera: Propagate nh state from hw to kernel
net: marvell: prestera: Add neighbour cache accounting
net: marvell: prestera: add stub handler neighbour events
net: marvell: prestera: Add heplers to interact with fib_notifier_info
net: marvell: prestera: Add length macros for prestera_ip_addr
net: marvell: prestera: add delayed wq and flush wq on deinit
net: marvell: prestera: Add strict cleanup of fib arbiter
net: marvell: prestera: Add cleanup of allocated fib_nodes
net: marvell: prestera: Add router nexthops ABI
eth: octeon: fix build after netif_napi_add() changes
net/mlx5: E-Switch, Return EBUSY if can't get mode lock
...
|
||
|
|
7db99f01d1 |
Merge tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Borislav Petkov: - Print the CPU number at segfault time. The number printed is not always accurate (preemption is enabled at that time) but the print string contains "likely" and after a lot of back'n'forth on this, this was the consensus that was reached. See thread at [1]. - After a *lot* of testing and polishing, finally the clear_user() improvements to inline REP; STOSB by default Link: https://lore.kernel.org/r/5d62c1d0-7425-d5bb-ecb5-1dc3b4d7d245@intel.com [1] * tag 'x86_cpu_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Print likely CPU at segfault time x86/clear_user: Make it faster |
||
|
|
40b22c9df2 |
objtool: kmsan: list KMSAN API functions as uaccess-safe
KMSAN inserts API function calls in a lot of places (function entries and exits, local variables, memory accesses), so they may get called from the uaccess regions as well. KMSAN API functions are used to update the metadata (shadow/origin pages) for kernel memory accesses. The metadata pages for kernel pointers are also located in the kernel memory, so touching them is not a problem. For userspace pointers, no metadata is allocated. If an API function is supposed to read or modify the metadata, it does so for kernel pointers and ignores userspace pointers. If an API function is supposed to return a pair of metadata pointers for the instrumentation to use (like all __msan_metadata_ptr_for_TYPE_SIZE() functions do), it returns the allocated metadata for kernel pointers and special dummy buffers residing in the kernel memory for userspace pointers. As a result, none of KMSAN API functions perform userspace accesses, but since they might be called from UACCESS regions they use user_access_save/restore(). Link: https://lkml.kernel.org/r/20220915150417.722975-32-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
||
|
|
3c68a92d17 |
objtool: Disable CFI warnings
The __cfi_ preambles contain a mov instruction that embeds the KCFI
type identifier in the following format:
; type preamble
__cfi_function:
mov <id>, %eax
function:
...
While the preamble symbols are STT_FUNC and contain valid
instructions, they are never executed and always fall through. Skip
the warning for them.
.kcfi_traps sections point to CFI traps in text sections. Also skip
the warning about them referencing !ENDBR instructions.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-18-samitolvanen@google.com
|
||
|
|
5141d3a06b |
objtool: Preserve special st_shndx indexes in elf_update_symbol
elf_update_symbol fails to preserve the special st_shndx values
between [SHN_LORESERVE, SHN_HIRESERVE], which results in it
converting SHN_ABS entries into SHN_UNDEF, for example. Explicitly
check for the special indexes and ensure these symbols are not
marked undefined.
Fixes:
|
||
|
|
9440155ccb |
ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
x86 will shortly start using -fpatchable-function-entry for purposes other than ftrace, make sure the __patchable_function_entry section isn't merged in the mcount_loc section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org |
||
|
|
7a7621dfa4 |
objtool,x86: Teach decode about LOOP* instructions
When 'discussing' control flow Masami mentioned the LOOP* instructions and I realized objtool doesn't decode them properly. As it turns out, these instructions are somewhat inefficient and as such unlikely to be emitted by the compiler (a few vmlinux.o checks can't find a single one) so this isn't critical, but still, best to decode them properly. Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Yxhd4EMKyoFoH9y4@hirez.programming.kicks-ass.net |
||
|
|
2f23a7c914 |
Merge tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes from Ingo Molnar: - Fix PAT on Xen, which caused i915 driver failures - Fix compat INT 80 entry crash on Xen PV guests - Fix 'MMIO Stale Data' mitigation status reporting on older Intel CPUs - Fix RSB stuffing regressions - Fix ORC unwinding on ftrace trampolines - Add Intel Raptor Lake CPU model number - Fix (work around) a SEV-SNP bootloader bug providing bogus values in boot_params->cc_blob_address, by ignoring the value on !SEV-SNP bootups. - Fix SEV-SNP early boot failure - Fix the objtool list of noreturn functions and annotate snp_abort(), which bug confused objtool on gcc-12. - Fix the documentation for retbleed * tag 'x86-urgent-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/ABI: Mention retbleed vulnerability info file for sysfs x86/sev: Mark snp_abort() noreturn x86/sev: Don't use cc_platform_has() for early SEV-SNP calls x86/boot: Don't propagate uninitialized boot_params->cc_blob_address x86/cpu: Add new Raptor Lake CPU model number x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry x86/nospec: Fix i386 RSB stuffing x86/nospec: Unwreck the RSB stuffing x86/bugs: Add "unknown" reporting for MMIO Stale Data x86/entry: Fix entry_INT80_compat for Xen PV guests x86/PAT: Have pat_enabled() properly reflect state when running on Xen |
||
|
|
c93c296fff |
x86/sev: Mark snp_abort() noreturn
Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: Michael Matz <matz@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de |
||
|
|
3c6f390080 |
objtool: Remove "ANNOTATE_NOENDBR on ENDBR" warning
This warning isn't very useful: why would you put ANNOTATE_NOENDBR on ENDBR, and if you did, what's the harm? And thus far it's only found one non-bug, where the '__end_entry_SYSENTER_compat' label happens to land on the ENDBR from entry_SYSCALL_compat: vmlinux.o: warning: objtool: entry_SYSCALL_compat+0x0: ANNOTATE_NOENDBR on ENDBR .. which is fine. Just remove the warning. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/142341a5dafdfc788e4c95b9e226a6eefc9b626e.1660839773.git.jpoimboe@kernel.org |
||
|
|
7b3e318690 |
objtool: Use arch_jump_destination() in read_intra_function_calls()
Use arch_jump_destiation() instead of the open-coded 'offset + len + immediate' that is x86 specific. Avoids future trouble with other architectures. Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220818014553.220261-1-chenzhongjin@huawei.com |
||
|
|
e27e5bea95 |
x86/ibt, objtool: Add IBT_NOSEAL()
Add a macro which prevents a function from getting sealed if there are no compile-time references to it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
|
|
0db7058e8e |
x86/clear_user: Make it faster
Based on a patch by Mark Hemment <markhemm@googlemail.com> and incorporating very sane suggestions from Linus. The point here is to have the default case with FSRM - which is supposed to be the majority of x86 hw out there - if not now then soon - be directly inlined into the instruction stream so that no function call overhead is taking place. Drop the early clobbers from the @size and @addr operands as those are not needed anymore since we have single instruction alternatives. The benchmarks I ran would show very small improvements and a PF benchmark would even show weird things like slowdowns with higher core counts. So for a ~6m running the git test suite, the function gets called under 700K times, all from padzero(): <...>-2536 [006] ..... 261.208801: padzero: to: 0x55b0663ed214, size: 3564, cycles: 21900 <...>-2536 [006] ..... 261.208819: padzero: to: 0x7f061adca078, size: 3976, cycles: 17160 <...>-2537 [008] ..... 261.211027: padzero: to: 0x5572d019e240, size: 3520, cycles: 23850 <...>-2537 [008] ..... 261.211049: padzero: to: 0x7f1288dc9078, size: 3976, cycles: 15900 ... which is around 1%-ish of the total time and which is consistent with the benchmark numbers. So Mel gave me the idea to simply measure how fast the function becomes. I.e.: start = rdtsc_ordered(); ret = __clear_user(to, n); end = rdtsc_ordered(); Computing the mean average of all the samples collected during the test suite run then shows some improvement: clear_user_original: Amean: 9219.71 (Sum: 6340154910, samples: 687674) fsrm: Amean: 8030.63 (Sum: 5522277720, samples: 687652) That's on Zen3. The situation looks a lot more confusing on Intel: Icelake: clear_user_original: Amean: 19679.4 (Sum: 13652560764, samples: 693750) Amean: 19743.7 (Sum: 13693470604, samples: 693562) (I ran it twice just to be sure.) ERMS: Amean: 20374.3 (Sum: 13910601024, samples: 682752) Amean: 20453.7 (Sum: 14186223606, samples: 693576) FSRM: Amean: 20458.2 (Sum: 13918381386, sample s: 680331) The original microbenchmark which people were complaining about: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=65536; done 2>&1 | grep copied 32207011840 bytes (32 GB, 30 GiB) copied, 1 s, 32.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.93069 s, 35.6 GB/s 37597741056 bytes (38 GB, 35 GiB) copied, 1 s, 37.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.78017 s, 38.6 GB/s 62020124672 bytes (62 GB, 58 GiB) copied, 2 s, 31.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 2.13716 s, 32.2 GB/s 60010004480 bytes (60 GB, 56 GiB) copied, 1 s, 60.0 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.14129 s, 60.2 GB/s 53212086272 bytes (53 GB, 50 GiB) copied, 1 s, 53.2 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.28398 s, 53.5 GB/s 55698259968 bytes (56 GB, 52 GiB) copied, 1 s, 55.7 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.22507 s, 56.1 GB/s 55306092544 bytes (55 GB, 52 GiB) copied, 1 s, 55.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.23647 s, 55.6 GB/s 54387539968 bytes (54 GB, 51 GiB) copied, 1 s, 54.4 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.25693 s, 54.7 GB/s 50566529024 bytes (51 GB, 47 GiB) copied, 1 s, 50.6 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.35096 s, 50.9 GB/s 58308165632 bytes (58 GB, 54 GiB) copied, 1 s, 58.3 GB/s 68719476736 bytes (69 GB, 64 GiB) copied, 1.17394 s, 58.5 GB/s Now the same thing with smaller buffers: for i in $(seq 1 10); do dd if=/dev/zero of=/dev/null bs=1M status=progress count=8192; done 2>&1 | grep copied 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28485 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276112 s, 31.1 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.29136 s, 29.5 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.283803 s, 30.3 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.306503 s, 28.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.349169 s, 24.6 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.276912 s, 31.0 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.265356 s, 32.4 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.28464 s, 30.2 GB/s 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 0.242998 s, 35.3 GB/s is also not conclusive because it all depends on the buffer sizes, their alignments and when the microcode detects that cachelines can be aggregated properly and copied in bigger sizes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/CAHk-=wh=Mu_EYhtOmPn6AxoQZyEh-4fo2Zx3G7rBv1g7vwoKiw@mail.gmail.com |
||
|
|
aad26f55f4 |
Merge tag 'docs-6.0' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"This was a moderately busy cycle for documentation, but nothing
all that earth-shaking:
- More Chinese translations, and an update to the Italian
translations.
The Japanese, Korean, and traditional Chinese translations
are more-or-less unmaintained at this point, instead.
- Some build-system performance improvements.
- The removal of the archaic submitting-drivers.rst document,
with the movement of what useful material that remained into
other docs.
- Improvements to sphinx-pre-install to, hopefully, give more
useful suggestions.
- A number of build-warning fixes
Plus the usual collection of typo fixes, updates, and more"
* tag 'docs-6.0' of git://git.lwn.net/linux: (92 commits)
docs: efi-stub: Fix paths for x86 / arm stubs
Docs/zh_CN: Update the translation of sched-stats to 5.19-rc8
Docs/zh_CN: Update the translation of pci to 5.19-rc8
Docs/zh_CN: Update the translation of pci-iov-howto to 5.19-rc8
Docs/zh_CN: Update the translation of usage to 5.19-rc8
Docs/zh_CN: Update the translation of testing-overview to 5.19-rc8
Docs/zh_CN: Update the translation of sparse to 5.19-rc8
Docs/zh_CN: Update the translation of kasan to 5.19-rc8
Docs/zh_CN: Update the translation of iio_configfs to 5.19-rc8
doc:it_IT: align Italian documentation
docs: Remove spurious tag from admin-guide/mm/overcommit-accounting.rst
Documentation: process: Update email client instructions for Thunderbird
docs: ABI: correct QEMU fw_cfg spec path
doc/zh_CN: remove submitting-driver reference from docs
docs: zh_TW: align to submitting-drivers removal
docs: zh_CN: align to submitting-drivers removal
docs: ko_KR: howto: remove reference to removed submitting-drivers
docs: ja_JP: howto: remove reference to removed submitting-drivers
docs: it_IT: align to submitting-drivers removal
docs: process: remove outdated submitting-drivers.rst
...
|
||
|
|
ce114c8668 |
Merge tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 retbleed fixes from Borislav Petkov:
"Just when you thought that all the speculation bugs were addressed and
solved and the nightmare is complete, here's the next one: speculating
after RET instructions and leaking privileged information using the
now pretty much classical covert channels.
It is called RETBleed and the mitigation effort and controlling
functionality has been modelled similar to what already existing
mitigations provide"
* tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
x86/speculation: Disable RRSBA behavior
x86/kexec: Disable RET on kexec
x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported
x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry
x86/bugs: Add Cannon lake to RETBleed affected CPU list
x86/retbleed: Add fine grained Kconfig knobs
x86/cpu/amd: Enumerate BTC_NO
x86/common: Stamp out the stepping madness
KVM: VMX: Prevent RSB underflow before vmenter
x86/speculation: Fill RSB on vmexit for IBRS
KVM: VMX: Fix IBRS handling after vmexit
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
KVM: VMX: Convert launched argument to flags
KVM: VMX: Flatten __vmx_vcpu_run()
objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
x86/speculation: Remove x86_spec_ctrl_mask
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
x86/speculation: Fix SPEC_CTRL write on SMT state change
x86/speculation: Fix firmware entry SPEC_CTRL handling
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
...
|
||
|
|
d6a21f2d73 |
objtool: update objtool.txt references
Changeset |
||
|
|
38e0e4d04d |
x86/ibt, objtool: Don't discard text references from tracepoint section
On Tue, Jun 28, 2022 at 04:28:58PM +0800, Pengfei Xu wrote:
> # ./ftracetest
> === Ftrace unit tests ===
> [1] Basic trace file check [PASS]
> [2] Basic test for tracers [PASS]
> [3] Basic trace clock test [PASS]
> [4] Basic event tracing check [PASS]
> [5] Change the ringbuffer size [PASS]
> [6] Snapshot and tracing setting [PASS]
> [7] trace_pipe and trace_marker [PASS]
> [8] Test ftrace direct functions against tracers [UNRESOLVED]
> [9] Test ftrace direct functions against kprobes [UNRESOLVED]
> [10] Generic dynamic event - add/remove eprobe events [FAIL]
> [11] Generic dynamic event - add/remove kprobe events
>
> It 100% reproduced in step 11 and then missing ENDBR BUG generated:
> "
> [ 9332.752836] mmiotrace: enabled CPU7.
> [ 9332.788612] mmiotrace: disabled.
> [ 9337.103426] traps: Missing ENDBR: syscall_regfunc+0x0/0xb0
It turns out that while syscall_regfunc() does have an ENDBR when
generated, it gets sealed by objtool's .ibt_endbr_seal list.
Since the only text references to this function:
$ git grep syscall_regfunc
include/linux/tracepoint.h:extern int syscall_regfunc(void);
include/trace/events/syscalls.h: syscall_regfunc, syscall_unregfunc
include/trace/events/syscalls.h: syscall_regfunc, syscall_unregfunc
kernel/tracepoint.c:int syscall_regfunc(void)
appear in the __tracepoint section which is excluded by objtool.
Fixes:
|
||
|
|
f43b9876e8 |
x86/retbleed: Add fine grained Kconfig knobs
Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
|
|
8faea26e61 |
objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}
Commit
|
||
|
|
a09a6e2399 |
objtool: Add entry UNRET validation
Since entry asm is tricky, add a validation pass that ensures the retbleed mitigation has been done before the first actual RET instruction. Entry points are those that either have UNWIND_HINT_ENTRY, which acts as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or those that have UWIND_HINT_IRET_REGS at +0. This is basically a variant of validate_branch() that is intra-function and it will simply follow all branches from marked entry points and ensures that all paths lead to ANNOTATE_UNRET_END. If a path hits RET or an indirection the path is a fail and will be reported. There are 3 ANNOTATE_UNRET_END instances: - UNTRAIN_RET itself - exception from-kernel; this path doesn't need UNTRAIN_RET - all early exceptions; these also don't need UNTRAIN_RET Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
|
|
9bb2ec608a |
objtool: Update Retpoline validation
Update retpoline validation with the new CONFIG_RETPOLINE requirement of not having bare naked RET instructions. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
|
|
a149180fbc |
x86: Add magic AMD return-thunk
Note: needs to be in a section distinct from Retpolines such that the Retpoline RET substitution cannot possibly use immediate jumps. ORC unwinding for zen_untrain_ret() and __x86_return_thunk() is a little tricky but works due to the fact that zen_untrain_ret() doesn't have any stack ops and as such will emit a single ORC entry at the start (+0x3f). Meanwhile, unwinding an IP, including the __x86_return_thunk() one (+0x40) will search for the largest ORC entry smaller or equal to the IP, these will find the one ORC entry (+0x3f) and all works. [ Alexandre: SVM part. ] [ bp: Build fix, massages. ] Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> |
||
|
|
951ddecf43 |
objtool: Treat .text.__x86.* as noinstr
Needed because zen_untrain_ret() will be called from noinstr code. Also makes sense since the thunks MUST NOT contain instrumentation nor be poked with dynamic instrumentation. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> |