linux

mirror of https://github.com/AtlasLinux/linux.git synced 2026-02-02 15:22:09 -08:00

Author	SHA1	Message	Date
Linus Torvalds	67da125e30	Merge tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Paul McKenney: "Documentation updates: - Update whatisRCU.rst and checklist.rst for recent RCU API additions - Fix RCU documentation formatting and typos - Replace dead Ottawa Linux Symposium links in RTFP.txt Miscellaneous RCU updates: - Document that rcu_barrier() hurries RCU_LAZY callbacks - Remove redundant interrupt disabling from rcu_preempt_deferred_qs_handler() - Move list_for_each_rcu from list.h to rculist.h, and adjust the include directive in kernel/cgroup/dmem.c accordingly - Make initial set of changes to accommodate upcoming system_percpu_wq changes SRCU updates: - Create an srcu_read_lock_fast_notrace() for eventual use in tracing, including adding guards - Document the reliance on per-CPU operations as implicit RCU readers in __srcu_read_{,un}lock_fast() - Document the srcu_flip() function's memory-barrier D's relationship to SRCU-fast readers - Remove a redundant preempt_disable() and preempt_enable() pair from srcu_gp_start_if_needed() Torture-test updates: - Fix jitter.sh spin time so that it actually varies as advertised. It is still quite coarse-grained, but at least it does now vary - Update torture.sh help text to include the not-so-new --do-normal parameter, which permits (for example) testing KCSAN kernels without doing non-debug kernels - Fix a number of false-positive diagnostics that were being triggered by rcutorture starting before boot completed. 
Running multiple near-CPU-bound rcutorture processes when there is only the boot CPU is after all a bit excessive - Substitute kcalloc() for kzalloc() - Remove a redundant kfree() and NULL out kfree()ed objects" * tag 'rcu.2025.09.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits) rcu: WQ_UNBOUND added to sync_wq workqueue rcu: WQ_PERCPU added to alloc_workqueue users rcu: replace use of system_wq with system_percpu_wq refperf: Set reader_tasks to NULL after kfree() refperf: Remove redundant kfree() after torture_stop_kthread() srcu/tiny: Remove preempt_disable/enable() in srcu_gp_start_if_needed() srcu: Document srcu_flip() memory-barrier D relation to SRCU-fast srcu: Document __srcu_read_{,un}lock_fast() implicit RCU readers rculist: move list_for_each_rcu() to where it belongs refscale: Use kcalloc() instead of kzalloc() rcutorture: Use kcalloc() instead of kzalloc() docs: rcu: Replace multiple dead OLS links in RTFP.txt doc: Fix typo in RCU's torture.rst documentation Documentation: RCU: Retitle toctree index Documentation: RCU: Reduce toctree depth Documentation: RCU: Wrap kvm-remote.sh rerun snippet in literal code block rcu: docs: Requirements.rst: Abide by conventions of kernel documentation doc: Add RCU guards to checklist.rst doc: Update whatisRCU.rst for recent RCU API additions rcutorture: Delay forward-progress testing until boot completes ...	2025-10-04 11:28:45 -07:00
Linus Torvalds	ae28ed4578	Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc (Amery Hung) Applied as a stable branch in bpf-next and net-next trees. - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki) Also a stable branch in bpf-next and net-next trees. - Enforce expected_attach_type for tailcall compatibility (Daniel Borkmann) - Replace path-sensitive with path-insensitive live stack analysis in the verifier (Eduard Zingerman) This is a significant change in the verification logic. More details, motivation, long term plans are in the cover letter/merge commit. - Support signed BPF programs (KP Singh) This is another major feature that took years to materialize. Algorithm details are in the cover letter/merge commit - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich) - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan) - Fix USDT SIB argument handling in libbpf (Jiawei Zhao) - Allow uprobe-bpf program to change context registers (Jiri Olsa) - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and Puranjay Mohan) - Allow access to union arguments in tracing programs (Leon Hwang) - Optimize rcu_read_lock() + migrate_disable() combination where it's used in BPF subsystem (Menglong Dong) - Introduce bpf_task_work_schedule() kfuncs to schedule deferred execution of BPF callback in the context of a specific task using the kernel’s task_work infrastructure (Mykyta Yatsenko) - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya Dwivedi) - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi) - Improve the precision of tnum multiplier verifier operation (Nandakumar Edamana) - Use tnums to improve is_branch_taken() logic (Paul Chaignon) - Add support for atomic operations in arena in riscv JIT (Pu Lehui) - Report arena faults to BPF error stream 
(Puranjay Mohan) - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin Monnet) - Add bpf_strcasecmp() kfunc (Rong Tao) - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao Chen) tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits) libbpf: Replace AF_ALG with open coded SHA-256 selftests/bpf: Add stress test for rqspinlock in NMI selftests/bpf: Add test case for different expected_attach_type bpf: Enforce expected_attach_type for tailcall compatibility bpftool: Remove duplicate string.h header bpf: Remove duplicate crypto/sha2.h header libbpf: Fix error when st-prefix_ops and ops from differ btf selftests/bpf: Test changing packet data from kfunc selftests/bpf: Add stacktrace map lookup_and_delete_elem test case selftests/bpf: Refactor stacktrace_map case with skeleton bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE selftests/bpf: Fix flaky bpf_cookie selftest selftests/bpf: Test changing packet data from global functions with a kfunc bpf: Emit struct bpf_xdp_sock type in vmlinux BTF selftests/bpf: Task_work selftest cleanup fixes MAINTAINERS: Delete inactive maintainers from AF_XDP bpf: Mark kfuncs as __noclone selftests/bpf: Add kprobe multi write ctx attach test selftests/bpf: Add kprobe write ctx attach test selftests/bpf: Add uprobe context ip register change test ...	2025-09-30 17:58:11 -07:00
Linus Torvalds	755fa5b4fb	Merge tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Extensive cpuset code cleanup and refactoring work with no functional changes: CPU mask computation logic refactoring, introducing new helpers, removing redundant code paths, and improving error handling for better maintainability. - A few bug fixes to cpuset including fixes for partition creation failures when isolcpus is in use, missing error returns, and null pointer access prevention in free_tmpmasks(). - Core cgroup changes include replacing the global percpu_rwsem with per-threadgroup rwsem when writing to cgroup.procs for better scalability, workqueue conversions to use WQ_PERCPU and system_percpu_wq to prepare for workqueue default switching from percpu to unbound, and removal of unused code including the post_attach callback. - New cgroup.stat.local time accounting feature that tracks frozen time duration. - Misc changes including selftests updates (new freezer time tests and backward compatibility fixes), documentation sync, string function safety improvements, and 64-bit division fixes. 
* tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (39 commits) cpuset: remove is_prs_invalid helper cpuset: remove impossible warning in update_parent_effective_cpumask cpuset: remove redundant special case for null input in node mask update cpuset: fix missing error return in update_cpumask cpuset: Use new excpus for nocpu error check when enabling root partition cpuset: fix failure to enable isolated partition when containing isolcpus Documentation: cgroup-v2: Sync manual toctree cpuset: use partition_cpus_change for setting exclusive cpus cpuset: use parse_cpulist for setting cpus.exclusive cpuset: introduce partition_cpus_change cpuset: refactor cpus_allowed_validate_change cpuset: refactor out validate_partition cpuset: introduce cpus_excl_conflict and mems_excl_conflict helpers cpuset: refactor CPU mask buffer parsing logic cpuset: Refactor exclusive CPU mask computation logic cpuset: change return type of is_partition_[in]valid to bool cpuset: remove unused assignment to trialcs->partition_root_state cpuset: move the root cpuset write check earlier cgroup/cpuset: Remove redundant rcu_read_lock/unlock() in spin_lock cgroup: Remove redundant rcu_read_lock/unlock() in spin_lock ...	2025-09-30 09:55:41 -07:00
Linus Torvalds	18b19abc37	Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains a larger set of changes around the generic namespace infrastructure of the kernel. Each specific namespace type (net, cgroup, mnt, ...) embeds a struct ns_common which carries the reference count of the namespace and so on. We open-coded and cargo-culted so many quirks for each namespace type that it just wasn't scalable anymore. So given there's a bunch of new changes coming in that area I've started cleaning all of this up. The core change is to make it possible to correctly initialize every namespace uniformly and derive the correct initialization settings from the type of the namespace such as namespace operations, namespace type and so on. This leaves the new ns_common_init() function with a single parameter which is the specific namespace type which derives the correct parameters statically. This also means the compiler will yell as soon as someone does something remotely fishy. The ns_common_init() addition also allows us to remove ns_alloc_inum() and drops any special-casing of the initial network namespace in the network namespace initialization code that Linus complained about. Another part is reworking the reference counting. The reference counting was open-coded and copy-pasted for each namespace type even though they all followed the same rules. This also removes all open accesses to the reference count and makes it private and only uses a very small set of dedicated helpers to manipulate them just like we do for e.g., files. In addition this generalizes the mount namespace iteration infrastructure introduced a few cycles ago. As reminder, the vfs makes it possible to iterate sequentially and bidirectionally through all mount namespaces on the system or all mount namespaces that the caller holds privilege over. 
This allows userspace to iterate over all mounts in all mount namespaces using the listmount() and statmount() system calls. Each mount namespace has a unique identifier for the lifetime of the system that is exposed to userspace. The network namespace also has a unique identifier working exactly the same way. This extends the concept to all other namespace types. The new nstree type makes it possible to lookup namespaces purely by their identifier and to walk the namespace list sequentially and bidirectionally for all namespace types, allowing userspace to iterate through all namespaces. Looking up namespaces in the namespace tree works completely locklessly. This also means we can move the mount namespace onto the generic infrastructure and remove a bunch of code and members from struct mnt_namespace itself. There's a bunch of stuff coming on top of this in the future but for now this uses the generic namespace tree to extend a concept introduced first for pidfs a few cycles ago. For a while now we have supported pidfs file handles for pidfds. This has proven to be very useful. This extends the concept to cover namespaces as well. It is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() apis. As with pidfs file handles, namespace file handles are exhaustive, meaning it is not required to actually hold a reference to nsfs in order to decode aka open_by_handle_at() a namespace file handle. Instead the FD_NSFS_ROOT constant can be passed which will let the kernel grab a reference to the root of nsfs internally and thus decode the file handle. Namespace file descriptors can already be derived from pidfds which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already. It has the same advantage as pidfds. 
It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them trivially. Permission checking is kept simple. If the caller is located in the namespace the file handle refers to they are able to open it otherwise they must hold privilege over the owning namespace of the relevant namespace. The namespace file handle layout is exposed as uapi and has a stable and extensible format. For now it simply contains the namespace identifier, the namespace type, and the inode number. The stable format means that userspace may construct its own namespace file handles without going through name_to_handle_at() as they are already allowed for pidfs and cgroup file handles" * tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits) ns: drop assert ns: move ns type into struct ns_common nstree: make struct ns_tree private ns: add ns_debug() ns: simplify ns_common_init() further cgroup: add missing ns_common include ns: use inode initializer for initial namespaces selftests/namespaces: verify initial namespace inode numbers ns: rename to __ns_ref nsfs: port to ns_ref_() helpers net: port to ns_ref_() helpers uts: port to ns_ref_() helpers ipv4: use check_net() net: use check_net() net-sysfs: use check_net() user: port to ns_ref_() helpers time: port to ns_ref_() helpers pid: port to ns_ref_() helpers ipc: port to ns_ref_() helpers cgroup: port to ns_ref_() helpers ...	2025-09-29 11:20:29 -07:00
Linus Torvalds	722df25ddf	Merge tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull copy_process updates from Christian Brauner: "This contains the changes to enable support for clone3() on nios2 which apparently is still a thing. The more exciting part of this is that it cleans up the inconsistency in how the 64-bit flag argument is passed from copy_process() into the various other copy_() helpers" [ Fixed up rv ltl_monitor 32-bit support as per Sasha Levin in the merge ] tag 'kernel-6.18-rc1.clone3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nios2: implement architecture-specific portion of sys_clone3 arch: copy_thread: pass clone_flags as u64 copy_process: pass clone_flags as u64 across calltree copy_sighand: Handle architectures where sizeof(unsigned long) < sizeof(u64)	2025-09-29 10:36:50 -07:00
Christian Brauner	4055526d35	ns: move ns type into struct ns_common It's misplaced in struct proc_ns_operations and ns->ops might be NULL if the namespace is compiled out but we still want to know the type of the namespace for the initial namespace struct. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-25 09:23:54 +02:00
Chen Ridong	8f0fdbd4a0	cpuset: remove is_prs_invalid helper The is_prs_invalid helper function is redundant as it serves a similar purpose to is_partition_invalid. It can be fully replaced by the existing is_partition_invalid function, so this patch removes the is_prs_invalid helper. Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-22 12:57:46 -10:00
Chen Ridong	39431592e9	cpuset: remove impossible warning in update_parent_effective_cpumask If the parent is not a valid partition, an error will be returned before any partition update command is processed. This means the WARN_ON_ONCE(!is_partition_valid(parent)) can never be triggered, so it is safe to remove. Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-22 12:57:46 -10:00
Chen Ridong	b72af996b6	cpuset: remove redundant special case for null input in node mask update The nodelist_parse function already handles empty nodemask input appropriately, making it unnecessary to handle this case separately during the node mask update process. Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-22 12:57:46 -10:00
Christian Brauner	d7610cb745	ns: simplify ns_common_init() further Simply derive the ns operations from the namespace type. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-22 14:47:10 +02:00
Chen Ridong	51840f7ba3	cpuset: fix missing error return in update_cpumask The commit `c636673980` ("cpuset: refactor cpus_allowed_validate_change") inadvertently removed the error return when cpus_allowed_validate_change() fails. This patch restores the proper error handling by returning retval when the validation check fails. Fixes: `c636673980` ("cpuset: refactor cpus_allowed_validate_change") Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-19 06:43:27 -10:00
Chen Ridong	59d5de3655	cpuset: Use new excpus for nocpu error check when enabling root partition A previous patch fixed a bug where new_prs should be assigned before checking housekeeping conflicts. This patch addresses another potential issue: the nocpu error check currently uses the xcpus which is not updated. Although no issue has been observed so far, the check should be performed using the new effective exclusive cpus. The comment has been removed because the function returns an error if nocpu checking fails, which is unrelated to the parent. Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-19 06:41:26 -10:00
Chen Ridong	216217ebee	cpuset: fix failure to enable isolated partition when containing isolcpus The 'isolcpus' parameter specified at boot time can be assigned to an isolated partition. While it is valid to put the 'isolcpus' in an isolated partition, attempting to change a member cpuset to an isolated partition will fail if the cpuset contains any 'isolcpus'. For example, the system boots with 'isolcpus=9', and the following configuration works correctly: # cd /sys/fs/cgroup/ # mkdir test # echo 1 > test/cpuset.cpus # echo isolated > test/cpuset.cpus.partition # cat test/cpuset.cpus.partition isolated # echo 9 > test/cpuset.cpus # cat test/cpuset.cpus.partition isolated # cat test/cpuset.cpus 9 However, the following steps to convert a member cpuset to an isolated partition will fail: # cd /sys/fs/cgroup/ # mkdir test # echo 9 > test/cpuset.cpus # echo isolated > test/cpuset.cpus.partition # cat test/cpuset.cpus.partition isolated invalid (partition config conflicts with housekeeping setup) The issue occurs because the new partition state (new_prs) is used for validation against housekeeping constraints before it has been properly updated. To resolve this, move the assignment of new_prs before the housekeeping validation check when enabling a root partition. Fixes: `4a74e41888` ("cgroup/cpuset: Check partition conflict with housekeeping setup") Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-19 06:40:59 -10:00
Christian Brauner	7cf7303211	ns: use inode initializer for initial namespaces Just use the common helper we have. Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 16:22:38 +02:00
Christian Brauner	024596a4e2	ns: rename to __ns_ref Make it easier to grep and rename to ns_count. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 16:22:38 +02:00
Christian Brauner	be5f21d398	ns: add ns_common_free() And drop ns_free_inum(). Anything common that can be wasted centrally should be wasted in the new common helper. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 16:22:36 +02:00
Christian Brauner	5612ff3ec5	nscommon: simplify initialization There's a lot of information that namespace implementers don't need to know about at all. Encapsulate this all in the initialization helper. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 14:26:19 +02:00
Christian Brauner	d7afdf8895	ns: add to_<type>_ns() to respective headers Every namespace type has a container_of(ns, <ns_type>, ns) static inline function that is currently not exposed in the header. So we have a bunch of places that open-code it via container_of(). Move it to the headers so we can use it directly. Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 14:26:16 +02:00
Christian Brauner	7c60593985	cgroup: support ns lookup Support the generic ns lookup infrastructure to support file handles for namespaces. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 14:26:15 +02:00
Christian Brauner	7914f15c5e	Merge branch 'no-rebase-mnt_ns_tree_remove' Bring in the fix for removing a mount namespace from the mount namespace rbtree and list. Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 14:26:14 +02:00
Christian Brauner	0b40774ef0	cgroup: use ns_common_init() Don't cargo-cult the same thing over and over. Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-09-19 14:26:13 +02:00
Chen Ridong	c49b5e89c4	cpuset: use partition_cpus_change for setting exclusive cpus A previous patch has introduced a new helper function partition_cpus_change(). Now replace the exclusive cpus setting logic with this helper function. Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-17 08:37:31 -10:00
Chen Ridong	de9f15e21c	cpuset: use parse_cpulist for setting cpus.exclusive Previous patches made parse_cpulist handle empty cpu mask input. Now use this helper for exclusive cpus setting. Also, compute_trialcs_xcpus can be called with empty cpus and handles it correctly. Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-17 08:37:31 -10:00
Chen Ridong	27db824600	cpuset: introduce partition_cpus_change Introduce the partition_cpus_change function to handle both regular CPU set updates and exclusive CPU modifications, either of which may trigger partition state changes. This generalized function will also be utilized for exclusive CPU updates in subsequent patches. With the introduction of compute_trialcs_excpus in a previous patch, the trialcs->effective_xcpus field is now consistently computed and maintained. Consequently, the legacy logic which assigned trialcs->allowed_cpus to a local 'xcpus' variable when trialcs->effective_xcpus was empty has been removed. This removal is safe because when trialcs is not a partition member, trialcs->effective_xcpus is now correctly populated with the intersection of user_xcpus and the parent's effective_xcpus. This calculation inherently covers the scenario previously handled by the removed code. Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-17 08:37:31 -10:00
Chen Ridong	c636673980	cpuset: refactor cpus_allowed_validate_change Refactor cpus_allowed_validate_change to handle the special case where cpuset.cpus can be set even when violating partition sibling CPU exclusivity rules. This differs from the general validation logic in validate_change. Add a wrapper function to properly handle this exceptional case. The trialcs->prs_err field is cleared before performing validation checks for both CPU changes and partition errors. If cpus_allowed_validate_change fails its validation, trialcs->prs_err is set to PERR_NOTEXCL. If partition validation fails, the specific error code returned by validate_partition is assigned to trialcs->prs_err. With the partition validation status now directly available through trialcs->prs_err, the local boolean variable 'invalidate' becomes redundant and can be safely removed. Signed-off-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-09-17 08:37:31 -10:00
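
Commit d7afdf8895 in the log above moves each namespace type's to_<type>_ns() helper into its header, so callers no longer open-code container_of(). The pattern can be sketched in plain userspace C as follows. This is an illustration only, not the kernel's code: struct demo_namespace, its fields, the trimmed-down struct ns_common, and to_demo_ns() are hypothetical stand-ins, and container_of() is redefined locally with offsetof().

```c
#include <stddef.h>  /* offsetof */

/*
 * Userspace stand-in for the kernel's container_of(): recover a pointer to
 * the enclosing struct from a pointer to one of its members.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-in for struct ns_common. */
struct ns_common {
	unsigned int inum;
};

/* Hypothetical namespace type that embeds the common part. */
struct demo_namespace {
	int level;
	struct ns_common ns;
};

/*
 * The to_<type>_ns() pattern: one named helper in the header instead of an
 * open-coded container_of() at every call site.
 */
static inline struct demo_namespace *to_demo_ns(struct ns_common *ns)
{
	return container_of(ns, struct demo_namespace, ns);
}
```

Given a struct demo_namespace d, calling to_demo_ns(&d.ns) yields &d again, because the macro subtracts the offset of the embedded member from the member's address.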

907 Commits