Numerous production kernel configs (see [1, 2]) are choosing to enable
CONFIG_DEBUG_LIST, which is also being recommended by KSPP for hardened
configs [3]. The motivation behind this is that the option can be used
as a security hardening feature (e.g. CVE-2019-2215 and CVE-2019-2025
are mitigated by the option [4]).
The feature has never been designed with performance in mind, yet common
list manipulation is happening across hot paths all over the kernel.
Introduce CONFIG_LIST_HARDENED, which performs list pointer checking
inline, and only upon list corruption calls the reporting slow path.
To generate optimal machine code with CONFIG_LIST_HARDENED:
1. Elide checking for pointer values which upon dereference would
result in an immediate access fault (i.e. minimal hardening
checks). The trade-off is lower-quality error reports.
2. Use the __preserve_most function attribute (available with Clang,
but not yet with GCC) to minimize the code footprint for calling
the reporting slow path. As a result, function size of callers is
reduced by avoiding saving registers before calling the rarely
called reporting slow path.
Note that all TUs in lib/Makefile already disable function tracing,
including list_debug.c, and __preserve_most's implied notrace has
no effect in this case.
3. Because the inline checks are a subset of the full set of checks in
__list_*_valid_or_report(), always return false if the inline
checks failed. This avoids redundant compare and conditional
branch right after return from the slow path.
As a side-effect of the checks being inline, if the compiler can prove
some condition to always be true, it can completely elide some checks.
Since DEBUG_LIST is functionally a superset of LIST_HARDENED, the
Kconfig variables are changed to reflect that: DEBUG_LIST selects
LIST_HARDENED, whereas LIST_HARDENED itself has no dependency on
DEBUG_LIST.
Running netperf with CONFIG_LIST_HARDENED (using a Clang compiler with
"preserve_most") shows throughput improvements, in my case of ~7% on
average (up to 20-30% on some test cases).
Link: https://r.android.com/1266735 [1]
Link: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config [2]
Link: https://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project/Recommended_Settings [3]
Link: https://googleprojectzero.blogspot.com/2019/11/bad-binder-android-in-wild-exploit.html [4]
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20230811151847.1594958-3-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Turn the list debug checking functions __list_*_valid() into inline
functions that wrap the out-of-line functions. Care is taken to ensure
the inline wrappers are always inlined, so that additional compiler
instrumentation (such as sanitizers) does not result in redundant
outlining.
This change is preparation for performing checks in the inline wrappers.
No functional change intended.
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20230811151847.1594958-2-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
[1]: "On X86-64 and AArch64 targets, this attribute changes the calling
convention of a function. The preserve_most calling convention attempts
to make the code in the caller as unintrusive as possible. This
convention behaves identically to the C calling convention on how
arguments and return values are passed, but it uses a different set of
caller/callee-saved registers. This alleviates the burden of saving and
recovering a large register set before and after the call in the caller.
If the arguments are passed in callee-saved registers, then they will be
preserved by the callee across the call. This doesn't apply for values
returned in callee-saved registers.
* On X86-64 the callee preserves all general purpose registers, except
for R11. R11 can be used as a scratch register. Floating-point
registers (XMMs/YMMs) are not preserved and need to be saved by the
caller.
* On AArch64 the callee preserve all general purpose registers, except
x0-X8 and X16-X18."
[1] https://clang.llvm.org/docs/AttributeReference.html#preserve-most
Introduce the attribute to compiler_types.h as __preserve_most.
Use of this attribute results in better code generation for calls to
very rarely called functions, such as error-reporting functions, or
rarely executed slow paths.
Beware that the attribute conflicts with instrumentation calls inserted
on function entry which do not use __preserve_most themselves. Notably,
function tracing which assumes the normal C calling convention for the
given architecture. Where the attribute is supported, __preserve_most
will imply notrace. It is recommended to restrict use of the attribute
to functions that should or already disable tracing.
Note: The additional preprocessor check against architecture should not
be necessary if __has_attribute() only returns true where supported;
also see https://github.com/ClangBuiltLinux/linux/issues/1908. But until
__has_attribute() does the right thing, we also guard by known-supported
architectures to avoid build warnings on other architectures.
The attribute may be supported by a future GCC version (see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110899).
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230811151847.1594958-1-elver@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89
Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230706175804.2249018-1-azeemshaikh38@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
# diff --git a/drivers/eisa/eisa-bus.c b/drivers/eisa/eisa-bus.c
# index 713582cc27d1..33f0ba11c6ad 100644
# --- a/drivers/eisa/eisa-bus.c
# +++ b/drivers/eisa/eisa-bus.c
# @@ -60,7 +60,7 @@ static void __init eisa_name_device(struct eisa_device *edev)
# int i;
# for (i = 0; i < EISA_INFOS; i++) {
# if (!strcmp(edev->id.sig, eisa_table[i].id.sig)) {
# - strlcpy(edev->pretty_name,
# + strscpy(edev->pretty_name,
# eisa_table[i].name,
# sizeof(edev->pretty_name));
# return;
Make it clearer in the one-line description and the verbose description
text that CONFIG_UBSAN_TRAP as currently implemented involves a tradeoff of
much less helpful oops messages in exchange for a smaller kernel image.
(With the additional effect of turning UBSAN warnings into crashes, which
may or may not be desired.)
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20230705215128.486054-1-jannh@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Pull xtensa fixes from Max Filippov:
- fix interaction between unaligned exception handler and load/store
exception handler
- fix parsing ISS network interface specification string
- add comment about etherdev freeing to ISS network driver
* tag 'xtensa-20230716' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: fix unaligned and load/store configuration interaction
xtensa: ISS: fix call to split_if_spec
xtensa: ISS: add comment about etherdev freeing
Pull perf fix from Borislav Petkov:
- Fix a lockdep warning when the event given is the first one, no event
group exists yet but the code still goes and iterates over event
siblings
* tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
Pull objtool fixes from Borislav Petkov:
- Mark copy_iovec_from_user() __noclone in order to prevent gcc from
doing an inter-procedural optimization and confuse objtool
- Initialize struct elf fully to avoid build failures
* tag 'objtool_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
iov_iter: Mark copy_iovec_from_user() noclone
objtool: initialize all of struct elf
Pull scheduler fixes from Borislav Petkov:
- Remove a cgroup from under a polling process properly
- Fix the idle sibling selection
* tag 'sched_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/psi: use kernfs polling functions for PSI trigger polling
sched/fair: Use recent_used_cpu to test p->cpus_ptr
Pull pin control fixes from Linus Walleij:
"I'm mostly on vacation but what would vacation be without a few
critical fixes so people can use their gaming laptops when hiding away
from the sun (or rain)?
- Fix a really annoying interrupt storm in the AMD driver affecting
Asus TUF gaming notebooks
- Fix device tree parsing in the Renesas driver"
* tag 'pinctrl-v6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: amd: Unify debounce handling into amd_pinconf_set()
pinctrl: amd: Drop pull up select configuration
pinctrl: amd: Use amd_pinconf_set() for all config options
pinctrl: amd: Only use special debounce behavior for GPIO 0
pinctrl: renesas: rzg2l: Handle non-unique subnode names
pinctrl: renesas: rzv2m: Handle non-unique subnode names
Pull smb client fixes from Steve French:
- Two reconnect fixes: important fix to address inFlight count to leak
(which can leak credits), and fix for better handling a deleted share
- DFS fix
- SMB1 cleanup fix
- deferred close fix
* tag '6.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix mid leak during reconnection after timeout threshold
cifs: is_network_name_deleted should return a bool
smb: client: fix missed ses refcounting
smb: client: Fix -Wstringop-overflow issues
cifs: if deferred close is disabled then close files immediately
Pull powerpc fixes from Michael Ellerman:
- Fix Speculation_Store_Bypass reporting in /proc/self/status on
Power10
- Fix HPT with 4K pages since recent changes by implementing pmd_same()
- Fix 64-bit native_hpte_remove() to be irq-safe
Thanks to Aneesh Kumar K.V, Nageswara R Sastry, and Russell Currey.
* tag 'powerpc-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/book3s64/hash/4k: Add pmd_same callback for 4K page size
powerpc/64e: Fix obtool warnings in exceptions-64e.S
powerpc/security: Fix Speculation_Store_Bypass reporting on Power10
powerpc/64s: Fix native_hpte_remove() to be irq-safe
Pull hardening fixes from Kees Cook:
- Remove LTO-only suffixes from promoted global function symbols
(Yonghong Song)
- Remove unused .text..refcount section from vmlinux.lds.h (Petr Pavlu)
- Add missing __always_inline to sparc __arch_xchg() (Arnd Bergmann)
- Claim maintainership of string routines
* tag 'hardening-v6.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
sparc: mark __arch_xchg() as __always_inline
MAINTAINERS: Foolishly claim maintainership of string routines
kallsyms: strip LTO-only suffixes from promoted global functions
vmlinux.lds.h: Remove a reference to no longer used sections .text..refcount
Pull probe fixes from Masami Hiramatsu:
- fprobe: Add a comment why fprobe will be skipped if another kprobe is
running in fprobe_kprobe_handler().
- probe-events: Fix some issues related to fetch-arguments:
- Fix double counting of the string length for user-string and
symstr. This will require longer buffer in the array case.
- Fix not to count error code (minus value) for the total used
length in array argument. This makes the total used length
shorter.
- Fix to update dynamic used data size counter only if fetcharg uses
the dynamic size data. This may mis-count the used dynamic data
size and corrupt data.
- Revert "tracing: Add "(fault)" name injection to kernel probes"
because that did not work correctly with a bug, and we agreed the
current '(fault)' output (instead of '"(fault)"' like a string)
explains what happened more clearly.
- Fix to record 0-length (means fault access) data_loc data in fetch
function itself, instead of store_trace_args(). If we record an
array of string, this will fix to save fault access data on each
entry of the array correctly.
* tag 'probes-fixes-v6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails
Revert "tracing: Add "(fault)" name injection to kernel probes"
tracing/probes: Fix to update dynamic data counter if fetcharg uses it
tracing/probes: Fix not to count error code to total length
tracing/probes: Fix to avoid double count of the string length on the array
fprobes: Add a comment why fprobe_kprobe_handler exits if kprobe is running
Pull spi fixes from Mark Brown:
"A couple of fairly minor driver specific fixes here, plus a bunch of
maintainership and admin updates. Nothing too remarkable"
* tag 'spi-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
mailmap: add entry for Jonas Gorski
MAINTAINERS: add myself for spi-bcm63xx
spi: s3c64xx: clear loopback bit after loopback test
spi: bcm63xx: fix max prepend length
MAINTAINERS: Add myself as a maintainer for Microchip SPI
Pull regmap fix from Mark Brown:
"One fix for an out of bounds access in the interupt code here"
* tag 'regmap-fix-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap-irq: Fix out-of-bounds access when allocating config buffers
Pull iommu fixes from Joerg Roedel:
- Fix a regression causing a crash on sysfs access of iommu-group
specific files
- Fix signedness bug in SVA code
* tag 'iommu-fixes-v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/sva: Fix signedness bug in iommu_sva_alloc_pasid()
iommu: Fix crash during syfs iommu_groups/N/type