Pull perf fix from Borislav Petkov:
- A single data race fix on the perf event cleanup path to avoid
endless loops due to insufficient locking
* tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()
Pull printk fix from Petr Mladek:
- Make pr_flush() fast when consoles are suspended.
* tag 'printk-for-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: do not wait for consoles when suspended
Pyll sysctl fix from Luis Chamberlain:
"Only one fix for sysctl"
* tag 'sysctl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
The console_stop() and console_start() functions call pr_flush().
When suspending, these functions are called by the serial subsystem
while the serial port is suspended. In this scenario, if there are
any pending messages, a call to pr_flush() will always result in a
timeout because the serial port cannot make forward progress. This
causes longer suspend and resume times.
Add a check in pr_flush() so that it will immediately timeout if
the consoles are suspended.
Fixes: 3b604ca812 ("printk: add pr_flush()")
Reported-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20220715061042.373640-2-john.ogness@linutronix.de
"numa_stat" should not be included in the scope of CONFIG_HUGETLB_PAGE, if
CONFIG_HUGETLB_PAGE is not configured even if CONFIG_NUMA is configured,
"numa_stat" is missed form /proc. Move it out of CONFIG_HUGETLB_PAGE to
fix it.
Fixes: 4518085e12 ("mm, sysctl: make NUMA stats configurable")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bpf and wireless.
Still no major regressions, the release continues to be calm. An
uptick of fixes this time around due to trivial data race fixes and
patches flowing down from subtrees.
There has been a few driver fixes (particularly a few fixes for false
positives due to 66e4c8d950 which went into -next in May!) that make
me worry the wide testing is not exactly fully through.
So "calm" but not "let's just cut the final ASAP" vibes over here.
Current release - regressions:
- wifi: rtw88: fix write to const table of channel parameters
Current release - new code bugs:
- mac80211: add gfp_t arg to ieeee80211_obss_color_collision_notify
- mlx5:
- TC, allow offload from uplink to other PF's VF
- Lag, decouple FDB selection and shared FDB
- Lag, correct get the port select mode str
- bnxt_en: fix and simplify XDP transmit path
- r8152: fix accessing unset transport header
Previous releases - regressions:
- conntrack: fix crash due to confirmed bit load reordering (after
atomic -> refcount conversion)
- stmmac: dwc-qos: disable split header for Tegra194
Previous releases - always broken:
- mlx5e: ring the TX doorbell on DMA errors
- bpf: make sure mac_header was set before using it
- mac80211: do not wake queues on a vif that is being stopped
- mac80211: fix queue selection for mesh/OCB interfaces
- ip: fix dflt addr selection for connected nexthop
- seg6: fix skb checksums for SRH encapsulation/insertion
- xdp: fix spurious packet loss in generic XDP TX path
- bunch of sysctl data race fixes
- nf_log: incorrect offset to network header
Misc:
- bpf: add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs"
* tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
nfp: flower: configure tunnel neighbour on cmsg rx
net/tls: Check for errors in tls_device_init
MAINTAINERS: Add an additional maintainer to the AMD XGBE driver
xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue
selftests/net: test nexthop without gw
ip: fix dflt addr selection for connected nexthop
net: atlantic: remove aq_nic_deinit() when resume
net: atlantic: remove deep parameter on suspend/resume functions
sfc: fix kernel panic when creating VF
seg6: bpf: fix skb checksum in bpf_push_seg6_encap()
seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors
seg6: fix skb checksum evaluation in SRH encapsulation/insertion
sfc: fix use after free when disabling sriov
net: sunhme: output link status with a single print.
r8152: fix accessing unset transport header
net: stmmac: fix leaks in probe
net: ftgmac100: Hold reference returned by of_get_child_by_name()
nexthop: Fix data-races around nexthop_compat_mode.
ipv4: Fix data-races around sysctl_ip_dynaddr.
tcp: Fix a data-race around sysctl_tcp_ecn_fallback.
...
Pull integrity fixes from Mimi Zohar:
"Here are a number of fixes for recently found bugs.
Only 'ima: fix violation measurement list record' was introduced in
the current release. The rest address existing bugs"
* tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Fix potential memory leak in ima_init_crypto()
ima: force signature verification when CONFIG_KEXEC_SIG is configured
ima: Fix a potential integer overflow in ima_appraise_measurement
ima: fix violation measurement list record
Revert "evm: Fix memleak in init_desc"
Pull cgroup fix from Tejun Heo:
"Fix an old and subtle bug in the migration path.
css_sets are used to track tasks and migrations are tasks moving from
a group of css_sets to another group of css_sets. The migration path
pins all source and destination css_sets in the prep stage.
Unfortunately, it was overloading the same list_head entry to track
sources and destinations, which got confused for migrations which are
partially identity leading to use-after-frees.
Fixed by using dedicated list_heads for tracking sources and
destinations"
* tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Use separate src/dst nodes when preloading css_sets for migration
Currently, an unsigned kernel could be kexec'ed when IMA arch specific
policy is configured unless lockdown is enabled. Enforce kernel
signature verification check in the kexec_file_load syscall when IMA
arch specific policy is configured.
Fixes: 99d5cadfde ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE")
Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Coiby Xu <coxu@redhat.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
This patch changes proc_dointvec_ms_jiffies() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dointvec_ms_jiffies() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
This patch changes proc_dou8vec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dou8vec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.
Fixes: cb94441306 ("sysctl: add proc_dou8vec_minmax()")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Jihing reported a race between perf_event_set_output() and
perf_mmap_close():
CPU1 CPU2
perf_mmap_close(e2)
if (atomic_dec_and_test(&e2->rb->mmap_count)) // 1 - > 0
detach_rest = true
ioctl(e1, IOC_SET_OUTPUT, e2)
perf_event_set_output(e1, e2)
...
list_for_each_entry_rcu(e, &e2->rb->event_list, rb_entry)
ring_buffer_attach(e, NULL);
// e1 isn't yet added and
// therefore not detached
ring_buffer_attach(e1, e2->rb)
list_add_rcu(&e1->rb_entry,
&e2->rb->event_list)
After this; e1 is attached to an unmapped rb and a subsequent
perf_mmap() will loop forever more:
again:
mutex_lock(&e->mmap_mutex);
if (event->rb) {
...
if (!atomic_inc_not_zero(&e->rb->mmap_count)) {
...
mutex_unlock(&e->mmap_mutex);
goto again;
}
}
The loop in perf_mmap_close() holds e2->mmap_mutex, while the attach
in perf_event_set_output() holds e1->mmap_mutex. As such there is no
serialization to avoid this race.
Change perf_event_set_output() to take both e1->mmap_mutex and
e2->mmap_mutex to alleviate that problem. Additionally, have the loop
in perf_mmap() detach the rb directly, this avoids having to wait for
the concurrent perf_mmap_close() to get around to doing it to make
progress.
Fixes: 9bb5d40cd9 ("perf: Fix mmap() accounting hole")
Reported-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Yang Jihong <yangjihong1@huawei.com>
Link: https://lkml.kernel.org/r/YsQ3jm2GR38SW7uD@worktop.programming.kicks-ass.net
Pull tracing fixes from Steven Rostedt:
"Fixes and minor clean ups for tracing:
- Fix memory leak by reverting what was thought to be a double free.
A static tool had gave a false positive that a double free was
possible in the error path, but it was actually a different
location that confused the static analyzer (and those of us that
reviewed it).
- Move use of static buffers by ftrace_dump() to a location that can
be used by kgdb's ftdump(), as it needs it for the same reasons.
- Clarify in the Kconfig description that function tracing has
negligible impact on x86, but may have a bit bigger impact on other
architectures.
- Remove unnecessary extra semicolon in trace event.
- Make a local variable static that is used in the fprobes sample
- Use KSYM_NAME_LEN for length of function in kprobe sample and get
rid of unneeded macro for the same purpose"
* tag 'trace-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
samples: Use KSYM_NAME_LEN for kprobes
fprobe/samples: Make sample_probe static
blk-iocost: tracing: atomic64_read(&ioc->vtime_rate) is assigned an extra semicolon
ftrace: Be more specific about arch impact when function tracer is enabled
tracing: Fix sleeping while atomic in kdb ftdump
tracing/histograms: Fix memory leak problem
It was brought up that on ARMv7, that because the FUNCTION_TRACER does not
use nops to keep function tracing disabled because of the use of a link
register, it does have some performance impact.
The start of functions when -pg is used to compile the kernel is:
push {lr}
bl 8010e7c0 <__gnu_mcount_nc>
When function tracing is tuned off, it becomes:
push {lr}
add sp, sp, #4
Which just puts the stack back to its normal location. But these two
instructions at the start of every function does incur some overhead.
Be more honest in the Kconfig FUNCTION_TRACER description and specify that
the overhead being in the noise was x86 specific, but other architectures
may vary.
Link: https://lore.kernel.org/all/20220705105416.GE5208@pengutronix.de/
Link: https://lkml.kernel.org/r/20220706161231.085a83da@gandalf.local.home
Reported-by: Sascha Hauer <sha@pengutronix.de>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If you drop into kdb and type "ftdump" you'll get a sleeping while
atomic warning from memory allocation in trace_find_next_entry().
This appears to have been caused by commit ff895103a8 ("tracing:
Save off entry when peeking at next entry"), which added the
allocation in that path. The problematic commit was already fixed by
commit 8e99cf91b9 ("tracing: Do not allocate buffer in
trace_find_next_entry() in atomic") but that fix missed the kdb case.
The fix here is easy: just move the assignment of the static buffer to
the place where it should have been to begin with:
trace_init_global_iter(). That function is called in two places, once
is right before the assignment of the static buffer added by the
previous fix and once is in kdb.
Note that it appears that there's a second static buffer that we need
to assign that was added in commit efbbdaa22b ("tracing: Show real
address for trace event arguments"), so we'll move that too.
Link: https://lkml.kernel.org/r/20220708170919.1.I75844e5038d9425add2ad853a608cb44bb39df40@changeid
Fixes: ff895103a8 ("tracing: Save off entry when peeking at next entry")
Fixes: efbbdaa22b ("tracing: Show real address for trace event arguments")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
This reverts commit 46bbe5c671.
As commit 46bbe5c671 ("tracing: fix double free") said, the
"double free" problem reported by clang static analyzer is:
> In parse_var_defs() if there is a problem allocating
> var_defs.expr, the earlier var_defs.name is freed.
> This free is duplicated by free_var_defs() which frees
> the rest of the list.
However, if there is a problem allocating N-th var_defs.expr:
+ in parse_var_defs(), the freed 'earlier var_defs.name' is
actually the N-th var_defs.name;
+ then in free_var_defs(), the names from 0th to (N-1)-th are freed;
IF ALLOCATING PROBLEM HAPPENED HERE!!! -+
\
|
0th 1th (N-1)-th N-th V
+-------------+-------------+-----+-------------+-----------
var_defs: | name | expr | name | expr | ... | name | expr | name | ///
+-------------+-------------+-----+-------------+-----------
These two frees don't act on same name, so there was no "double free"
problem before. Conversely, after that commit, we get a "memory leak"
problem because the above "N-th var_defs.name" is not freed.
If enable CONFIG_DEBUG_KMEMLEAK and inject a fault at where the N-th
var_defs.expr allocated, then execute on shell like:
$ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
Then kmemleak reports:
unreferenced object 0xffff8fb100ef3518 (size 8):
comm "bash", pid 196, jiffies 4295681690 (age 28.538s)
hex dump (first 8 bytes):
76 31 00 00 b1 8f ff ff v1......
backtrace:
[<0000000038fe4895>] kstrdup+0x2d/0x60
[<00000000c99c049a>] event_hist_trigger_parse+0x206f/0x20e0
[<00000000ae70d2cc>] trigger_process_regex+0xc0/0x110
[<0000000066737a4c>] event_trigger_write+0x75/0xd0
[<000000007341e40c>] vfs_write+0xbb/0x2a0
[<0000000087fde4c2>] ksys_write+0x59/0xd0
[<00000000581e9cdf>] do_syscall_64+0x3a/0x80
[<00000000cf3b065c>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Link: https://lkml.kernel.org/r/20220711014731.69520-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes: 46bbe5c671 ("tracing: fix double free")
Reported-by: Hulk Robot <hulkci@huawei.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull module fixes from Luis Chamberlain:
"Although most of the move of code in in v5.19-rc1 should have not
introduced a regression patch review on one of the file changes
captured a checkpatch warning which advised to use strscpy() and it
caused a buffer overflow when an incorrect length is passed.
Another change which checkpatch complained about was an odd RCU usage,
but that was properly addressed in a separate patch to the move by
Aaron. That caused a regression with PREEMPT_RT=y due to an unbounded
latency.
This series fixes both and adjusts documentation which we forgot to do
for the move"
* tag 'modules-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: kallsyms: Ensure preemption in add_kallsyms() with PREEMPT_RT
doc: module: update file references
module: Fix "warning: variable 'exit' set but not used"
module: Fix selfAssignment cppcheck warning
modules: Fix corruption of /proc/kallsyms
The commit 08126db5ff ("module: kallsyms: Fix suspicious rcu usage")
under PREEMPT_RT=y, disabling preemption introduced an unbounded
latency since the loop is not fixed. This change caused a regression
since previously preemption was not disabled and we would dereference
RCU-protected pointers explicitly. That being said, these pointers
cannot change.
Before kallsyms-specific data is prepared/or set-up, we ensure that
the unformed module is known to be unique i.e. does not already exist
(see load_module()). Therefore, we can fix this by using the common and
more appropriate RCU flavour as this section of code can be safely
preempted.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Fixes: 08126db5ff ("module: kallsyms: Fix suspicious rcu usage")
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
As Chris explains, the comment above exit_itimers() is not correct,
we can race with proc_timers_seq_ops. Change exit_itimers() to clear
signal->posix_timers with ->siglock held.
Cc: <stable@vger.kernel.org>
Reported-by: chris@accessvector.net
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CI reported the following splat while running the strace testsuite:
WARNING: CPU: 1 PID: 3570031 at kernel/ptrace.c:272 ptrace_check_attach+0x12e/0x178
CPU: 1 PID: 3570031 Comm: strace Tainted: G OE 5.19.0-20220624.rc3.git0.ee819a77d4e7.300.fc36.s390x #1
Hardware name: IBM 3906 M04 704 (z/VM 7.1.0)
Call Trace:
[<00000000ab4b645a>] ptrace_check_attach+0x132/0x178
([<00000000ab4b6450>] ptrace_check_attach+0x128/0x178)
[<00000000ab4b6cde>] __s390x_sys_ptrace+0x86/0x160
[<00000000ac03fcec>] __do_syscall+0x1d4/0x200
[<00000000ac04e312>] system_call+0x82/0xb0
Last Breaking-Event-Address:
[<00000000ab4ea3c8>] wait_task_inactive+0x98/0x190
This is because JOBCTL_TRACED is set, but the task is not in TASK_TRACED
state. Caused by ptrace_unfreeze_traced() which does:
task->jobctl &= ~TASK_TRACED
but it should be:
task->jobctl &= ~JOBCTL_TRACED
Fixes: 31cae1eaae ("sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Borkmann says:
====================
bpf 2022-07-08
We've added 3 non-merge commits during the last 2 day(s) which contain
a total of 7 files changed, 40 insertions(+), 24 deletions(-).
The main changes are:
1) Fix cBPF splat triggered by skb not having a mac header, from Eric Dumazet.
2) Fix spurious packet loss in generic XDP when pushing packets out (note
that native XDP is not affected by the issue), from Johan Almbladh.
3) Fix bpf_dynptr_{read,write}() helper signatures with flag argument before
its set in stone as UAPI, from Joanne Koong.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs
bpf: Make sure mac_header was set before using it
xdp: Fix spurious packet loss in generic XDP TX path
====================
Link: https://lore.kernel.org/r/20220708213418.19626-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
This patch changes proc_dointvec_jiffies() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dointvec_jiffies() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
This patch changes proc_doulongvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_doulongvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
This patch changes proc_douintvec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_douintvec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.
Fixes: 61d9b56a89 ("sysctl: add unsigned int range support")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>