asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
Pull more s390 updates from Vasily Gorbik:
- Clean up and improve vdso code: use SYM_* macros for function and
data annotations, add CFI annotations to fix GDB unwinding, optimize
the chacha20 implementation
- Add vfio-ap driver feature advertisement for use by libvirt and
mdevctl
* tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vfio-ap: Driver feature advertisement
s390/vdso: Use one large alternative instead of an alternative branch
s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects
tools: Add additional SYM_*() stubs to linkage.h
s390/vdso: Use macros for annotation of asm functions
s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack()
s390/vdso: Fix comment within __arch_chacha20_blocks_nostack()
s390/vdso: Get rid of permutation constants
Pull memblock updates from Mike Rapoport:
- new memblock_estimated_nr_free_pages() helper to replace
totalram_pages() which is less accurate when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
- fixes for memblock tests
* tag 'memblock-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
s390/mm: get estimated free pages by memblock api
kernel/fork.c: get estimated free pages by memblock api
mm/memblock: introduce a new helper memblock_estimated_nr_free_pages()
memblock test: fix implicit declaration of function 'strscpy'
memblock test: fix implicit declaration of function 'isspace'
memblock test: fix implicit declaration of function 'memparse'
memblock test: add the definition of __setup()
memblock test: fix implicit declaration of function 'virt_to_phys'
tools/testing: abstract two init.h into common include directory
memblock tests: include export.h in linkage.h as kernel dose
memblock tests: include memory_hotplug.h in mmzone.h as kernel dose
Pull RISC-V updates from Palmer Dabbelt:
- Support using Zkr to seed KASLR
- Support IPI-triggered CPU backtracing
- Support for generic CPU vulnerabilities reporting to userspace
- A few cleanups for missing licenses
- The size limit on the XIP kernel has been removed
- Support for tracing userspace stacks
- Support for the Svvptc extension
- Various cleanups and fixes throughout the tree
* tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits)
crash: Fix riscv64 crash memory reserve dead loop
perf/riscv-sbi: Add platform specific firmware event handling
tools: Optimize ring buffer for riscv
tools: Add riscv barrier implementation
RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t
ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
riscv: Enable bitops instrumentation
riscv: Omit optimized string routines when using KASAN
ACPI: RISCV: Make acpi_numa_get_nid() to be static
riscv: Randomize lower bits of stack address
selftests: riscv: Allow mmap test to compile on 32-bit
riscv: Make riscv_isa_vendor_ext_andes array static
riscv: Use LIST_HEAD() to simplify code
riscv: defconfig: Disable RZ/Five peripheral support
RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup
riscv: avoid Imbalance in RAS
riscv: cacheinfo: Add back init_cache_level() function
riscv: Remove unused _TIF_WORK_MASK
drivers/perf: riscv: Remove redundant macro check
riscv: define ILLEGAL_POINTER_VALUE for 64bit
...
Similar to commit f8d92fc527 ("selftests: vDSO: fix include order in
build of test_vdso_chacha") add SYM_DATA_START, SYM_DATA_START_LOCAL, and
SYM_DATA_END stubs to tools/include/linux/linkage.h so that the proper
macros can be used within the kernel's vdso code as well as in the
vdso_test_chacha selftest.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Pull perf tools updates from Arnaldo Carvalho de Melo:
- Use BPF + BTF to collect and pretty print syscall and tracepoint
arguments in 'perf trace', done as an GSoC activity
- Data-type profiling improvements:
- Cache debuginfo to speed up data type resolution
- Add the 'typecln' sort order, to show which cacheline in a target
is hot or cold. The following shows members in the cfs_rq's first
cache line:
$ perf report -s type,typecln,typeoff -H
...
- 2.67% struct cfs_rq
+ 1.23% struct cfs_rq: cache-line 2
+ 0.57% struct cfs_rq: cache-line 4
+ 0.46% struct cfs_rq: cache-line 6
- 0.41% struct cfs_rq: cache-line 0
0.39% struct cfs_rq +0x14 (h_nr_running)
0.02% struct cfs_rq +0x38 (tasks_timeline.rb_leftmost)
- When a typedef resolves to a unnamed struct, use the typedef name
- When a struct has just one basic type field (int, etc), resolve
the type sort order to the name of the struct, not the type of
the field
- Support type folding/unfolding in the data-type annotation TUI
- Fix bitfields offsets and sizes
- Initial support for PowerPC, using libcapstone and the usual
objdump disassembly parsing routines
- Add support for disassembling and addr2line using the LLVM libraries,
speeding up those operations
- Support --addr2line option in 'perf script' as with other tools
- Intel branch counters (LBR event logging) support, only available in
recent Intel processors, for instance, the new "brcntr" field can be
asked from 'perf script' to print the information collected from this
feature:
$ perf script -F +brstackinsn,+brcntr
# Branch counter abbr list:
# branch-instructions:ppp = A
# branch-misses = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (home/sdp/test/tchain_edit)
f3+31:
0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5]
000000000040177a insn: 81 7d fc 0f 27 00 00
0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC
0000000000401766 insn: 8b 45 fc
0000000000401769 insn: 83 e0 01
000000000040176c insn: 85 c0
000000000040176e insn: 74 06 br_cntr: A # PRED 1 cycles [7] 4.00 IPC
0000000000401776 insn: 83 45 fc 01
000000000040177a insn: 81 7d fc 0f 27 00 00
0000000000401781 insn: 7e e3 br_cntr: A # PRED 7 cycles [14] 0.43 IPC
- Support Timed PEBS (Precise Event-Based Sampling), a recent hardware
feature in Intel processors
- Add 'perf ftrace profile' subcommand, using ftrace's function-graph
tracer so that users can see the total, average, max execution time
as well as the number of invocations easily, for instance:
$ sudo perf ftrace profile -G __x64_sys_perf_event_open -- \
perf stat -e cycles -C1 true 2> /dev/null | head
# Total (us) Avg (us) Max (us) Count Function
65.611 65.611 65.611 1 __x64_sys_perf_event_open
30.527 30.527 30.527 1 anon_inode_getfile
30.260 30.260 30.260 1 __anon_inode_getfile
29.700 29.700 29.700 1 alloc_file_pseudo
17.578 17.578 17.578 1 d_alloc_pseudo
17.382 17.382 17.382 1 __d_alloc
16.738 16.738 16.738 1 kmem_cache_alloc_lru
15.686 15.686 15.686 1 perf_event_alloc
14.012 7.006 11.264 2 obj_cgroup_charge
- 'perf sched timehist' improvements, including the addition of
priority showing/filtering command line options
- Varios improvements to the 'perf probe', including 'perf test'
regression testings
- Introduce the 'perf check', initially to check if some feature is
in place, using it in 'perf test'
- Various fixes for 32-bit systems
- Address more leak sanitizer failures
- Fix memory leaks (LBR, disasm lock ops, etc)
- More reference counting fixes (branch_info, etc)
- Constify 'struct perf_tool' parameters to improve code generation
and reduce the chances of having its internals changed, which isn't
expected
- More constifications in various other places
- Add more build tests, including for JEVENTS
- Add more 'perf test' entries ('perf record LBR', pipe/inject,
--setup-filter, 'perf ftrace', 'cgroup sampling', etc)
- Inject build ids for all entries in a call chain in 'perf inject',
not just for the main sample
- Improve the BPF based sample filter, allowing root to setup filters
in bpffs that then can be used by non-root users
- Allow filtering by cgroups with the BPF based sample filter
- Allow a more compact way for 'perf mem report' using the
-T/--type-profile and also provide a --sort option similar to the one
in 'perf report', 'perf top', to setup the sort order manually
- Fix --group behavior in 'perf annotate' when leader has no samples,
where it was not showing anything even when other events in the group
had samples
- Fix spinlock and rwlock accounting in 'perf lock contention'
- Fix libsubcmd fixdep Makefile dependencies
- Improve 'perf ftrace' error message when ftrace isn't available
- Update various Intel JSON vendor event files
- ARM64 CoreSight hardware tracing infrastructure improvements, mostly
not visible to users
- Update power10 JSON events
* tag 'perf-tools-for-v6.12-1-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (310 commits)
perf trace: Mark the 'head' arg in the set_robust_list syscall as coming from user space
perf trace: Mark the 'rseq' arg in the rseq syscall as coming from user space
perf env: Find correct branch counter info on hybrid
perf evlist: Print hint for group
tools: Drop nonsensical -O6
perf pmu: To info add event_type_desc
perf evsel: Add accessor for tool_event
perf pmus: Fake PMU clean up
perf list: Avoid potential out of bounds memory read
perf help: Fix a typo ("bellow")
perf ftrace: Detect whether ftrace is enabled on system
perf test shell probe_vfs_getname: Remove extraneous '=' from probe line number regex
perf build: Require at least clang 16.0.6 to build BPF skeletons
perf trace: If a syscall arg is marked as 'const', assume it is coming _from_ userspace
perf parse-events: Remove duplicated include in parse-events.c
perf callchain: Allow symbols to be optional when resolving a callchain
perf inject: Lazy build-id mmap2 event insertion
perf inject: Add new mmap2-buildid-all option
perf inject: Fix build ID injection
perf annotate-data: Add pr_debug_scope()
...
Hook up the generic vDSO implementation to the aarch64 vDSO data page.
The _vdso_rng_data required data is placed within the _vdso_data vvar
page, by using a offset larger than the vdso_data.
The vDSO function requires a ChaCha20 implementation that does not write
to the stack, and that can do an entire ChaCha20 permutation. The one
provided uses NEON on the permute operation, with a fallback to the
syscall for chips that do not support AdvSIMD.
This also passes the vdso_test_chacha test along with
vdso_test_getrandom. The vdso_test_getrandom bench-single result on
Neoverse-N1 shows:
vdso: 25000000 times in 0.783884250 seconds
libc: 25000000 times in 8.780275399 seconds
syscall: 25000000 times in 8.786581518 seconds
A small fixup to arch/arm64/include/asm/mman.h was required to avoid
pulling kernel code into the vDSO, similar to what's already done in
arch/arm64/include/asm/rwonce.h.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Building test_vdso_chacha currently leads to following issue:
In file included from /home/chleroy/linux-powerpc/include/linux/limits.h:7,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/local_lim.h:38,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/posix1_lim.h:161,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/limits.h:195,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/limits.h:203,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/syslimits.h:7,
from /opt/powerpc64-e5500--glibc--stable-2024.02-1/lib/gcc/powerpc64-buildroot-linux-gnu/12.3.0/include-fixed/limits.h:34,
from /tmp/sodium/usr/local/include/sodium/export.h:7,
from /tmp/sodium/usr/local/include/sodium/crypto_stream_chacha20.h:14,
from vdso_test_chacha.c:6:
/opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:99:6: error: missing binary operator before token "("
99 | # if INT_MAX == 32767
| ^~~~~~~
/opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:102:7: error: missing binary operator before token "("
102 | # if INT_MAX == 2147483647
| ^~~~~~~
/opt/powerpc64-e5500--glibc--stable-2024.02-1/powerpc64-buildroot-linux-gnu/sysroot/usr/include/bits/xopen_lim.h:126:6: error: missing binary operator before token "("
126 | # if LONG_MAX == 2147483647
| ^~~~~~~~
This is due to kernel include/linux/limits.h being included instead of
libc's limits.h.
This is because directory include/ is added through option -isystem so
it goes prior to glibc's include directory.
Replace -isystem by -idirafter.
But this implies that now tools/include/linux/linkage.h is included
instead of include/linux/linkage.h, so define a stub for
SYM_FUNC_START() and SYM_FUNC_END().
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
During bootup, system may need the number of free pages in the whole system
to do some calculation before all pages are freed to buddy system. Usually
this number is get from totalram_pages(). Since we plan to move the free
pages accounting in __free_pages_core(), this value may not represent
total free pages at the early stage, especially when
CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled.
Instead of using raw memblock api, let's introduce a new helper for user
to get the estimated number of free pages from memblock point of view.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20240808001415.6298-1-richard.weiyang@gmail.com
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Currently, the perf tool infrastructure uses the disasm_line__parse
function to parse disassembled line.
Example snippet from objdump:
objdump --start-address=<address> --stop-address=<address> -d --no-show-raw-insn -C <vmlinux>
c0000000010224b4: lwz r10,0(r9)
This line "lwz r10,0(r9)" is parsed to extract instruction name,
registers names and offset.
In powerpc, the approach for data type profiling uses raw instruction
instead of result from objdump to identify the instruction category and
extract the source/target registers.
Example: 38 01 81 e8 ld r4,312(r1)
Here "38 01 81 e8" is the raw instruction representation. Add function
"disasm_line__parse_powerpc" to handle parsing of raw instruction.
Also update "struct disasm_line" to save the binary code/
With the change, function captures:
line -> "38 01 81 e8 ld r4,312(r1)"
raw instruction "38 01 81 e8"
Raw instruction is used later to extract the reg/offset fields. Macros
are added to extract opcode and register fields. "struct disasm_line"
is updated to carry union of "bytes" and "raw_insn" of 32 bit to carry raw
code (raw).
Function "disasm_line__parse_powerpc fills the raw instruction hex value
and can use macros to get opcode. There is no changes in existing code
paths, which parses the disassembled code. The size of raw instruction
depends on architecture.
In case of powerpc, the parsing the disasm line needs to handle cases
for reading binary code directly from DSO as well as parsing the objdump
result. Hence adding the logic into separate function instead of
updating "disasm_line__parse". The architecture using the instruction
name and present approach is not altered. Since this approach targets
powerpc, the macro implementation is added for powerpc as of now.
Since the disasm_line__parse is used in other cases (perf annotate) and
not only data tye profiling, the powerpc callback includes changes to
work with binary code as well as mnemonic representation.
Also in case if the DSO read fails and libcapstone is not supported, the
approach fallback to use objdump as option. Hence as option, patch has
changes to ensure objdump option also works well.
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Kajol Jain <kjain@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Akanksha J N <akanksha@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Link: https://lore.kernel.org/lkml/20240718084358.72242-5-atrajeev@linux.vnet.ibm.com
[ Add check for strndup() result ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull bitmap updates from Yury Norov:
"Random fixes"
* tag 'bitmap-6.11-rc1' of https://github.com:/norov/linux:
riscv: Remove unnecessary int cast in variable_fls()
radix tree test suite: put definition of bitmap_clear() into lib/bitmap.c
bitops: Add a comment explaining the double underscore macros
lib: bitmap: add missing MODULE_DESCRIPTION() macros
cpumask: introduce assign_cpu() macro
Pull slab updates from Vlastimil Babka:
"The most prominent change this time is the kmem_buckets based
hardening of kmalloc() allocations from Kees Cook.
We have also extended the kmalloc() alignment guarantees for
non-power-of-two sizes in a way that benefits rust.
The rest are various cleanups and non-critical fixups.
- Dedicated bucket allocator (Kees Cook)
This series [1] enhances the probabilistic defense against heap
spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year.
kmalloc() users that are known to be useful for exploits can get
completely separate set of kmalloc caches that can't be shared with
other users. The first converted users are alloc_msg() and
memdup_user().
The hardening is enabled by CONFIG_SLAB_BUCKETS.
- Extended kmalloc() alignment guarantees (Vlastimil Babka)
For years now we have guaranteed natural alignment for power-of-two
allocations, but nothing was defined for other sizes (in practice,
we have two such buckets, kmalloc-96 and kmalloc-192).
To avoid unnecessary padding in the rust layer due to its alignment
rules, extend the guarantee so that the alignment is at least the
largest power-of-two divisor of the requested size.
This fits what rust needs, is a superset of the existing
power-of-two guarantee, and does not in practice change the layout
(and thus does not add overhead due to padding) of the kmalloc-96
and kmalloc-192 caches, unless slab debugging is enabled for them.
- Cleanups and non-critical fixups (Chengming Zhou, Suren
Baghdasaryan, Matthew Willcox, Alex Shi, and Vlastimil Babka)
Various tweaks related to the new alloc profiling code, folio
conversion, debugging and more leftovers after SLAB"
Link: https://lore.kernel.org/all/20240701190152.it.631-kees@kernel.org/ [1]
* tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/memcg: alignment memcg_data define condition
mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING
mm, slab: move allocation tagging code in the alloc path into a hook
mm/util: Use dedicated slab buckets for memdup_user()
ipc, msg: Use dedicated slab buckets for alloc_msg()
mm/slab: Introduce kmem_buckets_create() and family
mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
mm/slab: Introduce kmem_buckets typedef
slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
slab: delete useless RED_INACTIVE and RED_ACTIVE
slab: don't put freepointer outside of object if only orig_size
slab: make check_object() more consistent
mm: Reduce the number of slab->folio casts
mm, slab: don't wrap internal functions with alloc_hooks()
Pull memblock updates from Mike Rapoport:
- 'reserve_mem' command line parameter to allow creation of named
memory reservation at boot time.
The driving use-case is to improve the ability of pstore to retain
ramoops data across reboots.
- cleanups and small improvements in memblock and mm_init
- new tests cases in memblock test suite
* tag 'memblock-v6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: fix implicit declaration of function 'numa_valid_node'
memblock: Move late alloc warning down to phys alloc
pstore/ramoops: Add ramoops.mem_name= command line option
mm/memblock: Add "reserve_mem" to reserved named memory at boot up
mm/mm_init.c: don't initialize page->lru again
mm/mm_init.c: not always search next deferred_init_pfn from very beginning
mm/mm_init.c: use deferred_init_mem_pfn_range_in_zone() to decide loop condition
mm/mm_init.c: get the highest zone directly
mm/mm_init.c: move nr_initialised reset down a bit
mm/memblock: fix a typo in description of for_each_mem_region()
mm/mm_init.c: use memblock_region_memory_base_pfn() to get startpfn
mm/memblock: use PAGE_ALIGN_DOWN to get pgend in free_memmap
mm/memblock: return true directly on finding overlap region
memblock tests: add memblock_overlaps_region_checks
mm/memblock: fix comment for memblock_isolate_range()
memblock tests: add memblock_reserve_many_may_conflict_check()
memblock tests: add memblock_reserve_all_locations_check()
mm/memblock: remove empty dummy entry
In tools/ directory, function bitmap_clear() is currently only used in
object file tools/testing/radix-tree/xarray.o.
But instead of keeping a bitmap.c with only bitmap_clear() definition in
radix-tree's own directory, it would be more proper to put it in common
directory lib/.
Sync the kernel definition and link some related libs, no functional
change is expected.
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Matthew Wilcox <willy@infradead.org>
CC: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
These seem useless since we use the SLUB_RED_INACTIVE and SLUB_RED_ACTIVE,
so just delete them, no functional change.
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
Reviewed-by: Christoph Lameter (Ampere) <cl@linux.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>