126 Commits

Author SHA1 Message Date
Chunyan Zhang
2dfb75cd56 raid6: riscv: replace one load with a move to speed up the caculation
Since wp$$==wq$$, it doesn't need to load the same data twice, use move
instruction to replace one of the loads to let the program run faster.

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Link: https://lore.kernel.org/r/20250718072711.3865118-3-zhangchunyan@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 16:43:27 -06:00
Chunyan Zhang
f8a03516a5 raid6: riscv: Clean up unused header file inclusion
These two C files don't reference things defined in simd.h or types.h
so remove these redundant #inclusions.

Fixes: 6093faaf95 ("raid6: Add RISC-V SIMD syndrome and recovery calculations")
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Reviewed-by: Nutty Liu <liujingqi@lanxincomputing.com>
Link: https://lore.kernel.org/r/20250718072711.3865118-2-zhangchunyan@iscas.ac.cn
Signed-off-by: Paul Walmsley <pjw@kernel.org>
2025-09-16 16:29:55 -06:00
Linus Torvalds
e991acf1bc Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
Linus Torvalds
bc46b7cbc5 Merge tag 's390-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:

 - Standardize on the __ASSEMBLER__ macro that is provided by GCC and
   Clang compilers and replace __ASSEMBLY__ with __ASSEMBLER__ in both
   uapi and non-uapi headers

 - Explicitly include <linux/export.h> in architecture and driver files
   which contain an EXPORT_SYMBOL() and remove the include from the
   files which do not contain the EXPORT_SYMBOL()

 - Use the full title of "z/Architecture Principles of Operation" manual
   and the name of a section where facility bits are listed

 - Use -D__DISABLE_EXPORTS for files in arch/s390/boot to avoid
   unnecessary slowing down of the build and confusing external kABI
   tools that process symtypes data

 - Print additional unrecoverable machine check information to make the
   root cause analysis easier

 - Move cmpxchg_user_key() handling to uaccess library code, since the
   generated code is large anyway and there is no benefit if it is
   inlined

 - Fix a problem when cmpxchg_user_key() is executing a code with a
   non-default key: if a system is IPL-ed with "LOAD NORMAL", and the
   previous system used storage keys where the fetch-protection bit was
   set for some pages, and the cmpxchg_user_key() is located within such
   page, a protection exception happens

 - Either the external call or emergency signal order is used to send an
   IPI to a remote CPU. Use the external order only, since it is at
   least as good and sometimes even better, than the emergency signal

 - In case of an early crash the early program check handler prints more
   or less random value of the last breaking event address, since it is
   not initialized properly. Copy the last breaking event address from
   the lowcore to pt_regs to address this

 - During STP synchronization check udelay() can not be used, since the
   first CPU modifies tod_clock_base and get_tod_clock_monotonic() might
   return a non-monotonic time. Instead, busy-loop on other CPUs, while
   the the first CPU actually handles the synchronization operation

 - When debugging the early kernel boot using QEMU with the -S flag and
   GDB attached, skip the decompressor and start directly in kernel

 - Rename PAI Crypto event 4210 according to z16 and z17 "z/Architecture
   Principles of Operation" manual

 - Remove the in-kernel time steering support in favour of the new s390
   PTP driver, which allows the kernel clock steered more precisely

 - Remove a possible false-positive warning in pte_free_defer(), which
   could be triggered in a valid case KVM guest process is initializing

* tag 's390-6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
  s390/mm: Remove possible false-positive warning in pte_free_defer()
  s390/stp: Default to enabled
  s390/stp: Remove leap second support
  s390/time: Remove in-kernel time steering
  s390/sclp: Use monotonic clock in sclp_sync_wait()
  s390/smp: Use monotonic clock in smp_emergency_stop()
  s390/time: Use monotonic clock in get_cycles()
  s390/pai_crypto: Rename PAI Crypto event 4210
  scripts/gdb/symbols: make lx-symbols skip the s390 decompressor
  s390/boot: Introduce jump_to_kernel() function
  s390/stp: Remove udelay from stp_sync_clock()
  s390/early: Copy last breaking event address to pt_regs
  s390/smp: Remove conditional emergency signal order code usage
  s390/uaccess: Merge cmpxchg_user_key() inline assemblies
  s390/uaccess: Prevent kprobes on cmpxchg_user_key() functions
  s390/uaccess: Initialize code pages executed with non-default access key
  s390/skey: Provide infrastructure for executing with non-default access key
  s390/uaccess: Make cmpxchg_user_key() library code
  s390/page: Add memory clobber to page_set_storage_key()
  s390/page: Cleanup page_set_storage_key() inline assemblies
  ...
2025-07-29 20:17:08 -07:00
Herbert Xu
a9ed4422ad lib/raid6: update recov_rvv.c zero page usage
Update lib/raid6/recov_rvv.c, for 1857fcc847 ("lib/raid6: replace custom
zero page with ZERO_PAGE"), per Klara.

Link: https://lkml.kernel.org/r/aFkUnXWtxcgOTVkw@gondor.apana.org.au
Fixes: 1857fcc847 ("lib/raid6: replace custom zero page with ZERO_PAGE")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Song Liu <song@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:29 -07:00
Herbert Xu
1857fcc847 lib/raid6: replace custom zero page with ZERO_PAGE
Use the system-wide zero page instead of a custom zero page.

[herbert@gondor.apana.org.au: update lib/raid6/recov_rvv.c, per Klara]
  Link: https://lkml.kernel.org/r/aFkUnXWtxcgOTVkw@gondor.apana.org.au
Link: https://lkml.kernel.org/r/Z9flJNkWQICx0PXk@gondor.apana.org.au
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Song Liu <song@kernel.org>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:57:54 -07:00
Heiko Carstens
7ecc694883 s390/drivers: Remove unnecessary include <linux/export.h>
Remove include <linux/export.h> from all files which do not contain an
EXPORT_SYMBOL().

See commit 7d95680d64 ("scripts/misc-check: check unnecessary #include
<linux/export.h> when W=1") for more details.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-06-17 18:18:03 +02:00
Chunyan Zhang
bc75552b80 raid6: riscv: Fix NULL pointer dereference caused by a missing clobber
When running the raid6 user-space test program on RISC-V QEMU, there's a
segmentation fault which seems caused by accessing a NULL pointer,
which is the pointer variable p/q in raid6_rvv*_gen/xor_syndrome_real(),
p/q should have been equal to dptr[x], but when I use GDB command to
see its value, which was 0x10 like below:

"
Program received signal SIGSEGV, Segmentation fault.
0x0000000000011062 in raid6_rvv2_xor_syndrome_real (disks=<optimized out>, start=0, stop=<optimized out>, bytes=4096, ptrs=<optimized out>) at rvv.c:386
(gdb) p p
$1 = (u8 *) 0x10 <error: Cannot access memory at address 0x10>
"

The issue was found to be related with:
1) Compile optimization
   There's no segmentation fault if compiling the raid6test program with
   the optimization flag -O0.
2) The RISC-V vector command vsetvli
   If not used t0 as the first parameter in vsetvli, there's no
   segmentation fault either.

This patch selects the 2nd solution to fix the issue.

[Palmer: The actual issue here is a missing clobber in the vsetvli code.
It's a little tricky: we've already probed for VLENB so we don't need to
look at the output register, we just need to have an X register in the
instruction as that's the form required to actually set VL.  Thus we
clobber a register, and without describing that we end up breaking
compilers.]

Fixes: 6093faaf95 ("raid6: Add RISC-V SIMD syndrome and recovery calculations")
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250610101234.1100660-3-zhangchunyan@iscas.ac.cn
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-12 12:21:48 -07:00
Linus Torvalds
119b1e61a7 Merge tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:

 - Support for the FWFT SBI extension, which is part of SBI 3.0 and a
   dependency for many new SBI and ISA extensions

 - Support for getrandom() in the VDSO

 - Support for mseal

 - Optimized routines for raid6 syndrome and recovery calculations

 - kexec_file() supports loading Image-formatted kernel binaries

 - Improvements to the instruction patching framework to allow for
   atomic instruction patching, along with rules as to how systems need
   to behave in order to function correctly

 - Support for a handful of new ISA extensions: Svinval, Zicbop, Zabha,
   some SiFive vendor extensions

 - Various fixes and cleanups, including: misaligned access handling,
   perf symbol mangling, module loading, PUD THPs, and improved uaccess
   routines

* tag 'riscv-for-linus-6.16-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (69 commits)
  riscv: uaccess: Only restore the CSR_STATUS SUM bit
  RISC-V: vDSO: Wire up getrandom() vDSO implementation
  riscv: enable mseal sysmap for RV64
  raid6: Add RISC-V SIMD syndrome and recovery calculations
  riscv: mm: Add support for Svinval extension
  RISC-V: Documentation: Add enough title underlines to CMODX
  riscv: Improve Kconfig help for RISCV_ISA_V_PREEMPTIVE
  MAINTAINERS: Update Atish's email address
  riscv: uaccess: do not do misaligned accesses in get/put_user()
  riscv: process: use unsigned int instead of unsigned long for put_user()
  riscv: make unsafe user copy routines use existing assembly routines
  riscv: hwprobe: export Zabha extension
  riscv: Make regs_irqs_disabled() more clear
  perf symbols: Ignore mapping symbols on riscv
  RISC-V: Kconfig: Fix help text of CMDLINE_EXTEND
  riscv: module: Optimize PLT/GOT entry counting
  riscv: Add support for PUD THP
  riscv: xchg: Prefetch the destination word for sc.w
  riscv: Add ARCH_HAS_PREFETCH[W] support with Zicbop
  riscv: Add support for Zicbop
  ...
2025-06-06 18:05:18 -07:00
Chunyan Zhang
6093faaf95 raid6: Add RISC-V SIMD syndrome and recovery calculations
The assembly is originally based on the ARM NEON and int.uc, but uses
RISC-V vector instructions to implement the RAID6 syndrome and
recovery calculations.

The functions are tested on QEMU running with the option "-icount shift=0":

  raid6: rvvx1    gen()  1008 MB/s
  raid6: rvvx2    gen()  1395 MB/s
  raid6: rvvx4    gen()  1584 MB/s
  raid6: rvvx8    gen()  1694 MB/s
  raid6: int64x8  gen()   113 MB/s
  raid6: int64x4  gen()   116 MB/s
  raid6: int64x2  gen()   272 MB/s
  raid6: int64x1  gen()   229 MB/s
  raid6: using algorithm rvvx8 gen() 1694 MB/s
  raid6: .... xor() 1000 MB/s, rmw enabled
  raid6: using rvv recovery algorithm

[Charlie: - Fixup vector options]

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20250305083707.74218-1-zhangchunyan@iscas.ac.cn
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@dabbelt.com>
2025-06-05 14:03:07 -07:00
Arnd Bergmann
5f5305dea0 raid6: skip avx512 checks
It is no longer necessary to check for CONFIG_AS_AVX512, since the minimum
assembler version is now from binutils-2.30 and this always supports it.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-04-30 21:53:48 +02:00
Heiko Carstens
db14f78ecb s390/vx: Convert cpu_has_vx() to cpu feature function
Instead of having a private cpu_has_vx() implementation use the new common
cpu feature method. Move the facility detection to the decompressor so it
matches all other cpu features.

Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-03-04 17:18:07 +01:00
Samuel Holland
4be073931c lib/raid6: use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Link: https://lkml.kernel.org/r/20240329072441.591471-7-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com> 
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Xuerui <git@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-19 14:36:18 -07:00
Masahiro Yamada
b1992c3772 kbuild: use $(src) instead of $(srctree)/$(src) for source directory
Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for
checked-in source files. It is merely a convention without any functional
difference. In fact, $(obj) and $(src) are exactly the same, as defined
in scripts/Makefile.build:

    src := $(obj)

When the kernel is built in a separate output directory, $(src) does
not accurately reflect the source directory location. While Kbuild
resolves this discrepancy by specifying VPATH=$(srctree) to search for
source files, it does not cover all cases. For example, when adding a
header search path for local headers, -I$(srctree)/$(src) is typically
passed to the compiler.

This introduces inconsistency between upstream and downstream Makefiles
because $(src) is used instead of $(srctree)/$(src) for the latter.

To address this inconsistency, this commit changes the semantics of
$(src) so that it always points to the directory in the source tree.

Going forward, the variables used in Makefiles will have the following
meanings:

  $(obj)     - directory in the object tree
  $(src)     - directory in the source tree  (changed by this commit)
  $(objtree) - the top of the kernel object tree
  $(srctree) - the top of the kernel source tree

Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced
with $(src).

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
2024-05-10 04:34:52 +09:00
Linus Torvalds
e5eb28f6d1 Merge tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
   heap optimizations".

 - Kuan-Wei Chiu has also sped up the library sorting code in the series
   "lib/sort: Optimize the number of swaps and comparisons".

 - Alexey Gladkov has added the ability for code running within an IPC
   namespace to alter its IPC and MQ limits. The series is "Allow to
   change ipc/mq sysctls inside ipc namespace".

 - Geert Uytterhoeven has contributed some dhrystone maintenance work in
   the series "lib: dhry: miscellaneous cleanups".

 - Ryusuke Konishi continues nilfs2 maintenance work in the series

	"nilfs2: eliminate kmap and kmap_atomic calls"
	"nilfs2: fix kernel bug at submit_bh_wbc()"

 - Nathan Chancellor has updated our build tools requirements in the
   series "Bump the minimum supported version of LLVM to 13.0.1".

 - Muhammad Usama Anjum continues with the selftests maintenance work in
   the series "selftests/mm: Improve run_vmtests.sh".

 - Oleg Nesterov has done some maintenance work against the signal code
   in the series "get_signal: minor cleanups and fix".

Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.

* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  nilfs2: prevent kernel bug at submit_bh_wbc()
  nilfs2: fix failure to detect DAT corruption in btree and direct mappings
  ocfs2: enable ocfs2_listxattr for special files
  ocfs2: remove SLAB_MEM_SPREAD flag usage
  assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
  buildid: use kmap_local_page()
  watchdog/core: remove sysctl handlers from public header
  nilfs2: use div64_ul() instead of do_div()
  mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
  kexec: copy only happens before uchunk goes to zero
  get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
  get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
  get_signal: don't abuse ksig->info.si_signo and ksig->sig
  const_structs.checkpatch: add device_type
  Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
  dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
  list: leverage list_is_head() for list_entry_is_head()
  nilfs2: MAINTAINERS: drop unreachable project mirror site
  smp: make __smp_processor_id() 0-argument macro
  fat: fix uninitialized field in nostale filehandles
  ...
2024-03-14 18:03:09 -07:00
Nathan Chancellor
2947a4567f treewide: update LLVM Bugzilla links
LLVM moved their issue tracker from their own Bugzilla instance to GitHub
issues.  While all of the links are still valid, they may not necessarily
show the most up to date information around the issues, as all updates
will occur on GitHub, not Bugzilla.

Another complication is that the Bugzilla issue number is not always the
same as the GitHub issue number.  Thankfully, LLVM maintains this mapping
through two shortlinks:

  https://llvm.org/bz<num> -> https://bugs.llvm.org/show_bug.cgi?id=<num>
  https://llvm.org/pr<num> -> https://github.com/llvm/llvm-project/issues/<mapped_num>

Switch all "https://bugs.llvm.org/show_bug.cgi?id=<num>" links to the
"https://llvm.org/pr<num>" shortlink so that the links show the most up to
date information.  Each migrated issue links back to the Bugzilla entry,
so there should be no loss of fidelity of information here.

Link: https://lkml.kernel.org/r/20240109-update-llvm-links-v1-3-eb09b59db071@kernel.org
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Fangrui Song <maskray@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22 15:38:51 -08:00
Heiko Carstens
c8dde11df1 s390/raid6: convert to use standard fpu_*() inline assemblies
Move the s390 specific raid6 inline assemblies, make them generic, and
reuse them to implement the raid6 gen/xor implementation.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
066c40918b s390/fpu: decrease stack usage for some cases
The kernel_fpu structure has a quite large size of 520 bytes. In order to
reduce stack footprint introduce several kernel fpu structures with
different and also smaller sizes. This way every kernel fpu user must use
the correct variant. A compile time check verifies that the correct variant
is used.

There are several users which use only 16 instead of all 32 vector
registers. For those users the new kernel_fpu_16 structure with a size of
only 266 bytes can be used.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
fd2527f209 s390/fpu: move, rename, and merge header files
Move, rename, and merge the fpu and vx header files. This way fpu header
files have a consistent naming scheme (fpu*.h).

Also get rid of the fpu subdirectory and move header files to asm
directory, so that all fpu and vx header files can be found at the same
location.

Merge internal.h header file into other header files, since the internal
helpers are used at many locations. so those helper functions are really
not internal.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:14 +01:00
Heiko Carstens
8d16ce1488 s390/fpu: make use of __uninitialized macro
Code sections in s390 specific kernel code which use floating point or
vector registers all come with a 520 byte stack variable to save already in
use registers, if required.

With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled this variable
will always be initialized on function entry in addition to saving register
contents, which contradicts the intention (performance improvement) of such
code sections.

Therefore provide a DECLARE_KERNEL_FPU_ONSTACK() macro which provides
struct kernel_fpu variables with an __uninitialized attribute, and convert
all existing code to use this.

This way only this specific type of stack variable will not be initialized,
regardless of config options.

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240205154844.3757121-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:16 +01:00
Heiko Carstens
18564756ab s390/fpu: get rid of MACHINE_HAS_VX
Get rid of MACHINE_HAS_VX and replace it with cpu_has_vx() which is a
short readable wrapper for "test_facility(129)".

Facility bit 129 is set if the vector facility is present. test_facility()
returns also true for all bits which are set in the architecture level set
of the cpu that the kernel is compiled for. This means that
test_facility(129) is a compile time constant which returns true for z13
and later, since the vector facility bit is part of the z13 kernel ALS.

In result the compiled code will have less runtime checks, and less code.

Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2023-12-11 14:33:07 +01:00
Ard Biesheuvel
b089ea3cc3 lib/raid6: Drop IA64 support
Drop Itanium support from the RAID6 code, and along with it, the 16x and
32x unrolled versions, which were only used by IA64.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2023-09-11 08:13:18 +00:00
WANG Xuerui
f209132104 raid6: Add LoongArch SIMD recovery implementation
Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). Which means
unrolled-by-4 LSX and unrolled-by-2 LASX code.

The assembly is originally based on the x86 SSSE3/AVX2 ports, but
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and instruction sequence has been optimized to suit
(e.g. LoongArch can perform per-byte srl and andi on vectors, but x86
cannot).

Performance numbers measured by instrumenting the raid6test code, on a
3A5000 system clocked at 2.5GHz:

> lasx  2data: 354.987 MiB/s
> lasx  datap: 350.430 MiB/s
> lsx   2data: 340.026 MiB/s
> lsx   datap: 337.318 MiB/s
> intx1 2data: 164.280 MiB/s
> intx1 datap: 187.966 MiB/s

Because recovery algorithms are chosen solely based on priority and
availability, lasx is marked as priority 2 and lsx priority 1. At least
for the current generation of LoongArch micro-architectures, LASX should
always be faster than LSX whenever supported, and have similar power
consumption characteristics (because the only known LASX-capable uarch,
the LA464, always compute the full 256-bit result for vector ops).

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
WANG Xuerui
8f3f06dfd6 raid6: Add LoongArch SIMD syndrome calculation
The algorithms work on 64 bytes at a time, which is the L1 cache line
size of all current and future LoongArch cores (that we care about), as
confirmed by Huacai. The code is based on the generic int.uc algorithm,
unrolled 4 times for LSX and 2 times for LASX. Further unrolling does
not meaningfully improve the performance according to experiments.

Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:

> raid6: lasx     gen() 12726 MB/s
> raid6: lsx      gen() 10001 MB/s
> raid6: int64x8  gen()  2876 MB/s
> raid6: int64x4  gen()  3867 MB/s
> raid6: int64x2  gen()  2531 MB/s
> raid6: int64x1  gen()  1945 MB/s

Comparison of xor() speeds (from different boots but meaningful anyway):

> lasx:    11226 MB/s
> lsx:     6395 MB/s
> int64x4: 2147 MB/s

Performance as measured by raid6test:

> raid6: lasx     gen() 25109 MB/s
> raid6: lsx      gen() 13233 MB/s
> raid6: int64x8  gen()  4164 MB/s
> raid6: int64x4  gen()  6005 MB/s
> raid6: int64x2  gen()  5781 MB/s
> raid6: int64x1  gen()  4119 MB/s
> raid6: using algorithm lasx gen() 25109 MB/s
> raid6: .... xor() 14439 MB/s, rmw enabled

Acked-by: Song Liu <song@kernel.org>
Signed-off-by: WANG Xuerui <git@xen0n.name>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2023-09-06 22:53:55 +08:00
WANG Xuerui
7b3c70c43c raid6: test: only check for Altivec if building on powerpc hosts
Altivec is only available for powerpc hosts, so only check for its
availability when the host is powerpc, to avoid error messages being
shown on architectures other than x86, arm or powerpc.

Signed-off-by: WANG Xuerui <git@xen0n.name>
Link: https://lore.kernel.org/r/20230731104911.411964-6-kernel@xen0n.name
Signed-off-by: Song Liu <song@kernel.org>
2023-08-15 09:40:27 -07:00