linux

mirror of https://github.com/Dasharo/linux.git synced 2026-03-06 15:25:10 -08:00

Author	SHA1	Message	Date
Linus Torvalds	cb90c8df91	Merge tag 'sched-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Revert a scheduler performance optimization that regressed other workloads" * tag 'sched-urgent-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "sched/core: Reduce cost of sched_move_task when config autogroup"	2025-03-21 08:48:40 -07:00
Linus Torvalds	a1cffe8cc8	Merge tag 'dma-mapping-6.14-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fix from Marek Szyprowski: - fix missing clear bdr in check_ram_in_range_map() (Baochen Qiang) * tag 'dma-mapping-6.14-2025-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: fix missing clear bdr in check_ram_in_range_map()	2025-03-20 16:55:24 -07:00
Linus Torvalds	47c7efa4f0	Merge tag 'probes-fixes-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Clean up tprobe correctly when module unload Tracepoint probes do not set TRACEPOINT_STUB on the 'tpoint' pointer when unloading a module, thus they show as a normal 'fprobe' instead of 'tprobe' and never come back - Fix leakage of tprobe module refcount When a tprobe's target module is loaded, it gets the module's refcount in the module notifier but forgot to put it after registering the probe on it. Fix it by getting the refcount only when registering tprobe. * tag 'probes-fixes-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: tprobe-events: Fix leakage of module refcount tracing: tprobe-events: Fix to clean up tprobe correctly when module unload	2025-03-17 14:30:31 -07:00
Linus Torvalds	ad87a8d0c4	Merge tag 'trace-v6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: "Fix ref count of trace_array in error path of histogram file open Tracing instances have a ref count to keep them around while files within their directories are open. This prevents them from being deleted while they are used. The histogram code had some files that needed to take the ref count and that was added, but the error paths did not decrement the ref counts. This caused the instances from ever being removed if a histogram file failed to open due to some error" * tag 'trace-v6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Correct the refcount if the hist/hist_debug file fails to open	2025-03-16 09:05:00 -10:00
Dietmar Eggemann	76f970ce51	Revert "sched/core: Reduce cost of sched_move_task when config autogroup" This reverts commit `eff6c8ce8d`. Hazem reported a 30% drop in UnixBench spawn test with commit `eff6c8ce8d` ("sched/core: Reduce cost of sched_move_task when config autogroup") on a m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB RAM (aarch64) (single level MC sched domain): https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com There is an early bail from sched_move_task() if p->sched_task_group is equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope' (Ubuntu '22.04.5 LTS'). So in: do_exit() sched_autogroup_exit_task() sched_move_task() if sched_get_task_group(p) == p->sched_task_group return /* p is enqueued */ dequeue_task() \ sched_change_group() \| task_change_group_fair() \| detach_task_cfs_rq() \| (1) set_task_rq() \| attach_task_cfs_rq() \| enqueue_task() / (1) isn't called for p anymore. Turns out that the regression is related to sgs->group_util in group_is_overloaded() and group_has_capacity(). If (1) isn't called for all the 'spawn' tasks then sgs->group_util is ~900 and sgs->group_capacity = 1024 (single CPU sched domain) and this leads to group_is_overloaded() returning true (2) and group_has_capacity() false (3) much more often compared to the case when (1) is called. I.e. there are much more cases of 'group_is_overloaded' and 'group_fully_busy' in WF_FORK wakeup sched_balance_find_dst_cpu() which then returns much more often a CPU != smp_processor_id() (5). This isn't good for these extremely short running tasks (FORK + EXIT) and also involves calling sched_balance_find_dst_group_cpu() unnecessary (single CPU sched domain). Instead if (1) is called for 'p->flags & PF_EXITING' then the path (4),(6) is taken much more often. select_task_rq_fair(..., wake_flags = WF_FORK) cpu = smp_processor_id() new_cpu = sched_balance_find_dst_cpu(..., cpu, ...) group = sched_balance_find_dst_group(..., cpu) do { update_sg_wakeup_stats() sgs->group_type = group_classify() if group_is_overloaded() (2) return group_overloaded if !group_has_capacity() (3) return group_fully_busy return group_has_spare (4) } while group if local_sgs.group_type > idlest_sgs.group_type return idlest (5) case group_has_spare: if local_sgs.idle_cpus >= idlest_sgs.idle_cpus return NULL (6) Unixbench Tests './Run -c 4 spawn' on: (a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4') and Ubuntu 22.04.5 LTS (aarch64). Shell & test run in '/user.slice/user-1000.slice/session-1.scope'. w/o patch w/ patch 21005 27120 (b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and Ubuntu 22.04.5 LTS (x86_64). Shell & test run in '/A'. w/o patch w/ patch 67675 88806 CONFIG_SCHED_AUTOGROUP=y & /sys/proc/kernel/sched_autogroup_enabled equal 0 or 1. Reported-by: Hazem Mohamed Abuelfotoh <abuehaze@amazon.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Hagar Hemdan <hagarhem@amazon.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250314151345.275739-1-dietmar.eggemann@arm.com	2025-03-15 10:34:27 +01:00
Masami Hiramatsu (Google)	ac91052f0a	tracing: tprobe-events: Fix leakage of module refcount When enabling the tracepoint at loading module, the target module refcount is incremented by find_tracepoint_in_module(). But it is unnecessary because the module is not unloaded while processing module loading callbacks. Moreover, the refcount is not decremented in that function. To be clear the module refcount handling, move the try_module_get() callsite to trace_fprobe_create_internal(), where it is actually required. Link: https://lore.kernel.org/all/174182761071.83274.18334217580449925882.stgit@devnote2/ Fixes: `57a7e6de9e` ("tracing/fprobe: Support raw tracepoints on future loaded modules") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: stable@vger.kernel.org	2025-03-15 08:37:47 +09:00
Masami Hiramatsu (Google)	0a8bb688aa	tracing: tprobe-events: Fix to clean up tprobe correctly when module unload When unloading module, the tprobe events are not correctly cleaned up. Thus it becomes `fprobe-event` and never be enabled again even if loading the same module again. For example; # cd /sys/kernel/tracing # modprobe trace_events_sample # echo 't:my_tprobe foo_bar' >> dynamic_events # cat dynamic_events t:tracepoints/my_tprobe foo_bar # rmmod trace_events_sample # cat dynamic_events f:tracepoints/my_tprobe foo_bar As you can see, the second time my_tprobe starts with 'f' instead of 't'. This unregisters the fprobe and tracepoint callback when module is unloaded but marks the fprobe-event is tprobe-event. Link: https://lore.kernel.org/all/174158724946.189309.15826571379395619524.stgit@mhiramat.tok.corp.google.com/ Fixes: `57a7e6de9e` ("tracing/fprobe: Support raw tracepoints on future loaded modules") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>	2025-03-15 08:37:12 +09:00
Linus Torvalds	a22ea738f4	Merge tag 'sched-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a sleeping-while-atomic bug caused by a recent optimization utilizing static keys that didn't consider that the static_key_disable() call could be triggered in atomic context. Revert the optimization" * tag 'sched-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/clock: Don't define sched_clock_irqtime as static key	2025-03-14 09:56:46 -10:00
Linus Torvalds	28c50999c9	Merge tag 'locking-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc locking fixes from Ingo Molnar: - Restrict the Rust runtime from unintended access to dynamically allocated LockClassKeys - KernelDoc annotation fix - Fix a lock ordering bug in semaphore::up(), related to trying to printk() and wake up the console within critical sections * tag 'locking-urgent-2025-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/semaphore: Use wake_q to wake up processes outside lock critical section locking/rtmutex: Use the 'struct' keyword in kernel-doc comment rust: lockdep: Remove support for dynamically allocated LockClassKeys	2025-03-14 09:41:36 -10:00
Tengda Wu	0b4ffbe488	tracing: Correct the refcount if the hist/hist_debug file fails to open The function event_{hist,hist_debug}_open() maintains the refcount of 'file->tr' and 'file' through tracing_open_file_tr(). However, it does not roll back these counts on subsequent failure paths, resulting in a refcount leak. A very obvious case is that if the hist/hist_debug file belongs to a specific instance, the refcount leak will prevent the deletion of that instance, as it relies on the condition 'tr->ref == 1' within __remove_instance(). Fix this by calling tracing_release_file_tr() on all failure paths in event_{hist,hist_debug}_open() to correct the refcount. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Zheng Yejian <zhengyejian1@huawei.com> Link: https://lore.kernel.org/20250314065335.1202817-1-wutengda@huaweicloud.com Fixes: `1cc111b9cd` ("tracing: Fix uaf issue when open the hist or hist_debug file") Signed-off-by: Tengda Wu <wutengda@huaweicloud.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-03-14 08:29:12 -04:00
Linus Torvalds	b7f94fcf55	Merge tag 'sched_ext-for-6.14-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fix from Tejun Heo: "BPF schedulers could trigger a crash by passing in an invalid CPU to the scx_bpf_select_cpu_dfl() helper. Fix it by verifying input validity" * tag 'sched_ext-for-6.14-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl()	2025-03-12 11:52:04 -10:00
Baochen Qiang	8324993f60	dma-mapping: fix missing clear bdr in check_ram_in_range_map() As discussed in [1], if 'bdr' is set once, it would never get cleared, hence 0 is always returned. Refactor the range check hunk into a new helper dma_find_range(), which allows 'bdr' to be cleared in each iteration. Link: https://lore.kernel.org/all/64931fac-085b-4ff3-9314-84bac2fa9bdb@quicinc.com/ # [1] Fixes: `a409d96009` ("dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM") Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com> Link: https://lore.kernel.org/r/20250307030350.69144-1-quic_bqiang@quicinc.com Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2025-03-12 13:41:44 +01:00
Yafang Shao	f3fa0e40df	sched/clock: Don't define sched_clock_irqtime as static key The sched_clock_irqtime was defined as a static key in: `8722903cbb` ("sched: Define sched_clock_irqtime as static key") However, this change introduces a 'sleeping in atomic context' warning: arch/x86/kernel/tsc.c:1214 mark_tsc_unstable() warn: sleeping in atomic context As analyzed by Dan, the affected code path is as follows: vcpu_load() <- disables preempt -> kvm_arch_vcpu_load() -> mark_tsc_unstable() <- sleeps virt/kvm/kvm_main.c 166 void vcpu_load(struct kvm_vcpu vcpu) 167 { 168 int cpu = get_cpu(); ^^^^^^^^^^ This get_cpu() disables preemption. 169 170 __this_cpu_write(kvm_running_vcpu, vcpu); 171 preempt_notifier_register(&vcpu->preempt_notifier); 172 kvm_arch_vcpu_load(vcpu, cpu); 173 put_cpu(); 174 } arch/x86/kvm/x86.c 4979 if (unlikely(vcpu->cpu != cpu) \|\| kvm_check_tsc_unstable()) { 4980 s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 : 4981 rdtsc() - vcpu->arch.last_host_tsc; 4982 if (tsc_delta < 0) 4983 mark_tsc_unstable("KVM discovered backwards TSC"); arch/x86/kernel/tsc.c 1206 void mark_tsc_unstable(char reason) 1207 { 1208 if (tsc_unstable) 1209 return; 1210 1211 tsc_unstable = 1; 1212 if (using_native_sched_clock()) 1213 clear_sched_clock_stable(); --> 1214 disable_sched_clock_irqtime(); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ kernel/jump_label.c 245 void static_key_disable(struct static_key key) 246 { 247 cpus_read_lock(); ^^^^^^^^^^^^^^^^ This lock has a might_sleep() in it which triggers the static checker warning. 248 static_key_disable_cpuslocked(key); 249 cpus_read_unlock(); 250 } Let revert this change for now as {disable,enable}_sched_clock_irqtime are used in many places, as pointed out by Sean, including the following: The code path in clocksource_watchdog(): clocksource_watchdog() \| -> spin_lock(&watchdog_lock); \| -> __clocksource_unstable() \| -> clocksource.mark_unstable() == tsc_cs_mark_unstable() \| -> disable_sched_clock_irqtime() And the code path in sched_clock_register(): / Cannot register a sched_clock with interrupts on / local_irq_save(flags); ... / Enable IRQ time accounting if we have a fast enough sched_clock() */ if (irqtime > 0 \|\| (irqtime == -1 && rate >= 1000000)) enable_sched_clock_irqtime(); local_irq_restore(flags); [ lkp@intel.com: reported a build error in the prev version ] [ mingo: cherry-picked it over into sched/urgent ] Closes: https://lore.kernel.org/kvm/37a79ba3-9ce0-479c-a5b0-2bd75d573ed3@stanley.mountain/ Fixes: `8722903cbb` ("sched: Define sched_clock_irqtime as static key") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Debugged-by: Dan Carpenter <dan.carpenter@linaro.org> Debugged-by: Sean Christopherson <seanjc@google.com> Debugged-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20250205032438.14668-1-laoar.shao@gmail.com	2025-03-10 14:22:58 +01:00
Waiman Long	85b2b9c16d	locking/semaphore: Use wake_q to wake up processes outside lock critical section A circular lock dependency splat has been seen involving down_trylock(): ====================================================== WARNING: possible circular locking dependency detected 6.12.0-41.el10.s390x+debug ------------------------------------------------------ dd/32479 is trying to acquire lock: 0015a20accd0d4f8 ((console_sem).lock){-.-.}-{2:2}, at: down_trylock+0x26/0x90 but task is already holding lock: 000000017e461698 (&zone->lock){-.-.}-{2:2}, at: rmqueue_bulk+0xac/0x8f0 the existing dependency chain (in reverse order) is: -> #4 (&zone->lock){-.-.}-{2:2}: -> #3 (hrtimer_bases.lock){-.-.}-{2:2}: -> #2 (&rq->__lock){-.-.}-{2:2}: -> #1 (&p->pi_lock){-.-.}-{2:2}: -> #0 ((console_sem).lock){-.-.}-{2:2}: The console_sem -> pi_lock dependency is due to calling try_to_wake_up() while holding the console_sem raw_spinlock. This dependency can be broken by using wake_q to do the wakeup instead of calling try_to_wake_up() under the console_sem lock. This will also make the semaphore's raw_spinlock become a terminal lock without taking any further locks underneath it. The hrtimer_bases.lock is a raw_spinlock while zone->lock is a spinlock. The hrtimer_bases.lock -> zone->lock dependency happens via the debug_objects_fill_pool() helper function in the debugobjects code. -> #4 (&zone->lock){-.-.}-{2:2}: __lock_acquire+0xe86/0x1cc0 lock_acquire.part.0+0x258/0x630 lock_acquire+0xb8/0xe0 _raw_spin_lock_irqsave+0xb4/0x120 rmqueue_bulk+0xac/0x8f0 __rmqueue_pcplist+0x580/0x830 rmqueue_pcplist+0xfc/0x470 rmqueue.isra.0+0xdec/0x11b0 get_page_from_freelist+0x2ee/0xeb0 __alloc_pages_noprof+0x2c2/0x520 alloc_pages_mpol_noprof+0x1fc/0x4d0 alloc_pages_noprof+0x8c/0xe0 allocate_slab+0x320/0x460 ___slab_alloc+0xa58/0x12b0 __slab_alloc.isra.0+0x42/0x60 kmem_cache_alloc_noprof+0x304/0x350 fill_pool+0xf6/0x450 debug_object_activate+0xfe/0x360 enqueue_hrtimer+0x34/0x190 __run_hrtimer+0x3c8/0x4c0 __hrtimer_run_queues+0x1b2/0x260 hrtimer_interrupt+0x316/0x760 do_IRQ+0x9a/0xe0 do_irq_async+0xf6/0x160 Normally a raw_spinlock to spinlock dependency is not legitimate and will be warned if CONFIG_PROVE_RAW_LOCK_NESTING is enabled, but debug_objects_fill_pool() is an exception as it explicitly allows this dependency for non-PREEMPT_RT kernel without causing PROVE_RAW_LOCK_NESTING lockdep splat. As a result, this dependency is legitimate and not a bug. Anyway, semaphore is the only locking primitive left that is still using try_to_wake_up() to do wakeup inside critical section, all the other locking primitives had been migrated to use wake_q to do wakeup outside of the critical section. It is also possible that there are other circular locking dependencies involving printk/console_sem or other existing/new semaphores lurking somewhere which may show up in the future. Let just do the migration now to wake_q to avoid headache like this. Reported-by: yzbot+ed801a886dfdbfe7136d@syzkaller.appspotmail.com Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250307232717.1759087-3-boqun.feng@gmail.com	2025-03-08 00:52:01 +01:00
Randy Dunlap	b3c5ec8b79	locking/rtmutex: Use the 'struct' keyword in kernel-doc comment Add the "struct" keyword to prevent a kernel-doc warning: rtmutex_common.h:67: warning: cannot understand function prototype: 'struct rt_wake_q_head ' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20250307232717.1759087-2-boqun.feng@gmail.com	2025-03-08 00:52:01 +01:00
Linus Torvalds	1c5183aa6e	Merge tag 'sched-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc scheduler fixes from Ingo Molnar: - Fix deadline scheduler sysctl parameter setting bug - Fix RT scheduler sysctl parameter setting bug - Fix possible memory corruption in child_cfs_rq_on_list() * tag 'sched-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Update limit of sched_rt sysctl in documentation sched/deadline: Use online cpus for validating runtime sched/fair: Fix potential memory corruption in child_cfs_rq_on_list	2025-03-07 10:58:54 -10:00
Linus Torvalds	ab60bd5731	Merge tag 'perf-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fixes from Ingo Molnar: "Fix a race between PMU registration and event creation, and fix pmus_lock vs. pmus_srcu lock ordering" * tag 'perf-urgent-2025-03-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix perf_pmu_register() vs. perf_init_event() perf/core: Fix pmus_lock vs. pmus_srcu ordering	2025-03-07 10:38:33 -10:00
Linus Torvalds	7f0e9ee5e4	Merge tag 'vfs-6.14-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix spelling mistakes in idmappings.rst - Fix RCU warnings in override_creds()/revert_creds() - Create new pid namespaces with default limit now that pid_max is namespaced * tag 'vfs-6.14-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: pid: Do not set pid_max in new pid namespaces doc: correcting two prefix errors in idmappings.rst cred: Fix RCU warnings in override/revert_creds	2025-03-06 08:04:49 -10:00
Shrikanth Hegde	14672f059d	sched/deadline: Use online cpus for validating runtime The ftrace selftest reported a failure because writing -1 to sched_rt_runtime_us returns -EBUSY. This happens when the possible CPUs are different from active CPUs. Active CPUs are part of one root domain, while remaining CPUs are part of def_root_domain. Since active cpumask is being used, this results in cpus=0 when a non active CPUs is used in the loop. Fix it by looping over the online CPUs instead for validating the bandwidth calculations. Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20250306052954.452005-2-sshegde@linux.ibm.com	2025-03-06 10:21:31 +01:00
Michal Koutný	d385c8bceb	pid: Do not set pid_max in new pid namespaces It is already difficult for users to troubleshoot which of multiple pid limits restricts their workload. The per-(hierarchical-)NS pid_max would contribute to the confusion. Also, the implementation copies the limit upon creation from parent, this pattern showed cumbersome with some attributes in legacy cgroup controllers -- it's subject to race condition between parent's limit modification and children creation and once copied it must be changed in the descendant. Let's do what other places do (ucounts or cgroup limits) -- create new pid namespaces without any limit at all. The global limit (actually any ancestor's limit) is still effectively in place, we avoid the set/unshare race and bumps of global (ancestral) limit have the desired effect on pid namespace that do not care. Link: https://lore.kernel.org/r/20240408145819.8787-1-mkoutny@suse.com/ Link: https://lore.kernel.org/r/20250221170249.890014-1-mkoutny@suse.com/ Fixes: `7863dcc72d` ("pid: allow pid_max to be set per pid namespace") Signed-off-by: Michal Koutný <mkoutny@suse.com> Link: https://lore.kernel.org/r/20250305145849.55491-1-mkoutny@suse.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-03-06 10:18:36 +01:00
Zecheng Li	3b4035ddbf	sched/fair: Fix potential memory corruption in child_cfs_rq_on_list child_cfs_rq_on_list attempts to convert a 'prev' pointer to a cfs_rq. This 'prev' pointer can originate from struct rq's leaf_cfs_rq_list, making the conversion invalid and potentially leading to memory corruption. Depending on the relative positions of leaf_cfs_rq_list and the task group (tg) pointer within the struct, this can cause a memory fault or access garbage data. The issue arises in list_add_leaf_cfs_rq, where both cfs_rq->leaf_cfs_rq_list and rq->leaf_cfs_rq_list are added to the same leaf list. Also, rq->tmp_alone_branch can be set to rq->leaf_cfs_rq_list. This adds a check `if (prev == &rq->leaf_cfs_rq_list)` after the main conditional in child_cfs_rq_on_list. This ensures that the container_of operation will convert a correct cfs_rq struct. This check is sufficient because only cfs_rqs on the same CPU are added to the list, so verifying the 'prev' pointer against the current rq's list head is enough. Fixes a potential memory corruption issue that due to current struct layout might not be manifesting as a crash but could lead to unpredictable behavior when the layout changes. Fixes: `fdaba61ef8` ("sched/fair: Ensure that the CFS parent is added after unthrottling") Signed-off-by: Zecheng Li <zecheng@google.com> Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20250304214031.2882646-1-zecheng@google.com	2025-03-05 17:30:54 +01:00
Andrea Righi	9360dfe4cb	sched_ext: Validate prev_cpu in scx_bpf_select_cpu_dfl() If a BPF scheduler provides an invalid CPU (outside the nr_cpu_ids range) as prev_cpu to scx_bpf_select_cpu_dfl() it can cause a kernel crash. To prevent this, validate prev_cpu in scx_bpf_select_cpu_dfl() and trigger an scx error if an invalid CPU is specified. Fixes: `f0e1a0643a` ("sched_ext: Implement BPF extensible scheduler class") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2025-03-03 07:55:48 -10:00
Linus Torvalds	26edad06d5	Merge tag 'probes-fixes-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probe events fixes from Masami Hiramatsu: - probe-events: Remove unused MAX_ARG_BUF_LEN macro - it is not used - fprobe-events: Log error for exceeding the number of entry args. Since the max number of entry args is limited, it should be checked and rejected when the parser detects it. - tprobe-events: Reject invalid tracepoint name If a user specifies an invalid tracepoint name (e.g. including '/') then the new event is not defined correctly in the eventfs. - tprobe-events: Fix a memory leak when tprobe defined with $retval There is a memory leak if tprobe is defined with $retval. * tag 'probes-fixes-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: probe-events: Remove unused MAX_ARG_BUF_LEN macro tracing: fprobe-events: Log error for exceeding the number of entry args tracing: tprobe-events: Reject invalid tracepoint name tracing: tprobe-events: Fix a memory leak when tprobe with $retval	2025-03-03 07:28:15 -10:00
Masami Hiramatsu (Google)	fd5ba38390	tracing: probe-events: Remove unused MAX_ARG_BUF_LEN macro Commit `18b1e870a4` ("tracing/probes: Add $arg* meta argument for all function args") introduced MAX_ARG_BUF_LEN but it is not used. Remove it. Link: https://lore.kernel.org/all/174055075876.4079315.8805416872155957588.stgit@mhiramat.tok.corp.google.com/ Fixes: `18b1e870a4` ("tracing/probes: Add $arg* meta argument for all function args") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-03-03 11:17:54 +09:00
Peter Zijlstra	003659fec9	perf/core: Fix perf_pmu_register() vs. perf_init_event() There is a fairly obvious race between perf_init_event() doing idr_find() and perf_pmu_register() doing idr_alloc() with an incompletely initialized PMU pointer. Avoid by doing idr_alloc() on a NULL pointer to register the id, and swizzling the real struct pmu pointer at the end using idr_replace(). Also making sure to not set struct pmu members after publishing the struct pmu, duh. [ introduce idr_cmpxchg() in order to better handle the idr_replace() error case -- if it were to return an unexpected pointer, it will already have replaced the value and there is no going back. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20241104135517.858805880@infradead.org	2025-03-01 19:38:42 +01:00

1 2 3 4 5 ...

47147 Commits