mirror of
https://github.com/Dasharo/linux.git
synced 2026-03-06 15:25:10 -08:00
cd46eb89dff7f6390eaeb11013bb23aae196bbfc
29413 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
212146f080 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar: "A couple of kernel side fixes: - Fix the Intel uncore driver on certain hardware configurations - Fix a CPU hotplug related memory allocation bug - Remove a spurious WARN() ... plus also a handful of perf tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf script python: Add Python3 support to tests/attr.py perf trace: Support multiple "vfs_getname" probes perf symbols: Filter out hidden symbols from labels perf symbols: Add fallback definitions for GELF_ST_VISIBILITY() tools headers uapi: Sync linux/in.h copy from the kernel sources perf clang: Do not use 'return std::move(something)' perf mem/c2c: Fix perf_mem_events to support powerpc perf tests evsel-tp-sched: Fix bitwise operator perf/core: Don't WARN() for impossible ring-buffer sizes perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() perf/x86/intel/uncore: Add Node ID mask |
||
|
|
d2a6aae99f |
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar: "An rtmutex (PI-futex) deadlock scenario fix, plus a locking documentation fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Handle early deadlock return correctly futex: Fix barrier comment |
||
|
|
6b2912cedc |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal fixes from Eric Biederman: "This contains four small fixes for signal handling. A missing range check, a regression fix, prioritizing signals we have already started a signal group exit for, and better detection of synchronous signals. The confused decision of which signals to handle failed spectacularly when a timer was pointed at SIGBUS and the stack overflowed. Resulting in an unkillable process in an infinite loop instead of a SIGSEGV and core dump" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signal: Better detection of synchronous signals signal: Always notice exiting tasks signal: Always attempt to allocate siginfo for SIGSTOP signal: Make siginmask safe when passed a signal of 0 |
||
|
|
27b4ad621e |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
"This pull request is dedicated to the upcoming snowpocalypse parts 2
and 3 in the Pacific Northwest:
1) Drop profiles are broken because some drivers use dev_kfree_skb*
instead of dev_consume_skb*, from Yang Wei.
2) Fix IWLWIFI kconfig deps, from Luca Coelho.
3) Fix percpu maps updating in bpftool, from Paolo Abeni.
4) Missing station release in batman-adv, from Felix Fietkau.
5) Fix some networking compat ioctl bugs, from Johannes Berg.
6) ucc_geth must reset the BQL queue state when stopping the device,
from Mathias Thore.
7) Several XDP bug fixes in virtio_net from Toshiaki Makita.
8) TSO packets must be sent always on queue 0 in stmmac, from Jose
Abreu.
9) Fix socket refcounting bug in RDS, from Eric Dumazet.
10) Handle sparse cpu allocations in bpf selftests, from Martynas
Pumputis.
11) Make sure mgmt frames have enough tailroom in mac80211, from Felix
Feitkau.
12) Use safe list walking in sctp_sendmsg() asoc list traversal, from
Greg Kroah-Hartman.
13) Make DCCP's ccid_hc_[rt]x_parse_options always check for NULL
ccid, from Eric Dumazet.
14) Need to reload WoL password into bcmsysport device after deep
sleeps, from Florian Fainelli.
15) Remove filter from mask before freeing in cls_flower, from Petr
Machata.
16) Missing release and use after free in error paths of s390 qeth
code, from Julian Wiedmann.
17) Fix lockdep false positive in dsa code, from Marc Zyngier.
18) Fix counting of ATU violations in mv88e6xxx, from Andrew Lunn.
19) Fix EQ firmware assert in qed driver, from Manish Chopra.
20) Don't default Caivum PTP to Y in kconfig, from Bjorn Helgaas"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net: dsa: b53: Fix for failure when irq is not defined in dt
sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()
geneve: should not call rt6_lookup() when ipv6 was disabled
net: Don't default Cavium PTP driver to 'y'
net: broadcom: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: via-velocity: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tehuti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: sun: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fsl_ucc_hdlc: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: fec_mpc52xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: smsc: epic100: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: dscc4: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: tulip: de2104x: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net: defxx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
net/mlx5e: Don't overwrite pedit action when multiple pedit used
net/mlx5e: Update hw flows when encap source mac changed
qed*: Advance drivers version to 8.37.0.20
qed: Change verbosity for coalescing message.
qede: Fix system crash on configuring channels.
qed: Consider TX tcs while deriving the max num_queues for PF.
...
|
||
|
|
8c8e62cc98 |
Merge tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH: "Here are some driver core fixes for 5.0-rc6. Well, not so much "driver core" as "debugfs". There's a lot of outstanding debugfs cleanup patches coming in through different subsystem trees, and in that process the debugfs core was found that it really should return errors when something bad happens, to prevent random files from showing up in the root of debugfs afterward. So debugfs was fixed up to handle this properly, and then two fixes for the relay and blk-mq code was needed as it was making invalid assumptions about debugfs return values. There's also a cacheinfo fix in here that resolves a tiny issue. All of these have been in linux-next for over a week with no reported problems" * tag 'driver-core-5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: blk-mq: protect debugfs_create_files() from failures relay: check return of create_buf_file() properly debugfs: debugfs_lookup() should return NULL if not found debugfs: return error values, not NULL debugfs: fix debugfs_rename parameter checking cacheinfo: Keep the old value if of_property_read_u32 fails |
||
|
|
1a1fb985f2 |
futex: Handle early deadlock return correctly
commit |
||
|
|
6f568ebe2a |
futex: Fix barrier comment
The current comment for the barrier that guarantees that waiter increment is always before taking the hb spinlock (barrier (A)) needs to be fixed as it is misplaced. This is obviously referring to hb_waiters_inc, which is a full barrier. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190206185602.949-1-dave@stgolabs.net |
||
|
|
7146db3317 |
signal: Better detection of synchronous signals
Recently syzkaller was able to create unkillablle processes by
creating a timer that is delivered as a thread local signal on SIGHUP,
and receiving SIGHUP SA_NODEFERER. Ultimately causing a loop failing
to deliver SIGHUP but always trying.
When the stack overflows delivery of SIGHUP fails and force_sigsegv is
called. Unfortunately because SIGSEGV is numerically higher than
SIGHUP next_signal tries again to deliver a SIGHUP.
From a quality of implementation standpoint attempting to deliver the
timer SIGHUP signal is wrong. We should attempt to deliver the
synchronous SIGSEGV signal we just forced.
We can make that happening in a fairly straight forward manner by
instead of just looking at the signal number we also look at the
si_code. In particular for exceptions (aka synchronous signals) the
si_code is always greater than 0.
That still has the potential to pick up a number of asynchronous
signals as in a few cases the same si_codes that are used
for synchronous signals are also used for asynchronous signals,
and SI_KERNEL is also included in the list of possible si_codes.
Still the heuristic is much better and timer signals are definitely
excluded. Which is enough to prevent all known ways for someone
sending a process signals fast enough to cause unexpected and
arguably incorrect behavior.
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
35634ffa17 |
signal: Always notice exiting tasks
Recently syzkaller was able to create unkillablle processes by creating a timer that is delivered as a thread local signal on SIGHUP, and receiving SIGHUP SA_NODEFERER. Ultimately causing a loop failing to deliver SIGHUP but always trying. Upon examination it turns out part of the problem is actually most of the solution. Since 2.5 signal delivery has found all fatal signals, marked the signal group for death, and queued SIGKILL in every threads thread queue relying on signal->group_exit_code to preserve the information of which was the actual fatal signal. The conversion of all fatal signals to SIGKILL results in the synchronous signal heuristic in next_signal kicking in and preferring SIGHUP to SIGKILL. Which is especially problematic as all fatal signals have already been transformed into SIGKILL. Instead of dequeueing signals and depending upon SIGKILL to be the first signal dequeued, first test if the signal group has already been marked for death. This guarantees that nothing in the signal queue can prevent a process that needs to exit from exiting. Cc: stable@vger.kernel.org Tested-by: Dmitry Vyukov <dvyukov@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Ref: ebf5ebe31d2c ("[PATCH] signal-fixes-2.5.59-A4") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
|
|
4879f11615 |
Merge tag 'trace-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This has two fixes for uprobe code.
- Cut and paste fix to have uprobe printks say "uprobe" and not
"kprobe"
- Add terminating '\0' byte when copying function arguments"
* tag 'trace-v5.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobes: Fix output for multiple string arguments
tracing: uprobes: Fix typo in pr_fmt string
|
||
|
|
a692933a87 |
signal: Always attempt to allocate siginfo for SIGSTOP
Since 2.5.34 the code has had the potential to not allocate siginfo
for SIGSTOP signals. Except for ptrace this is perfectly fine as only
ptrace can use PTRACE_PEEK_SIGINFO and see what the contents of
the delivered siginfo are.
Users of PTRACE_PEEK_SIGINFO that care about the contents siginfo
for SIGSTOP are rare, but they do exist. A seccomp self test
has cared and lldb cares.
Jack Andersen <jackoalan@gmail.com> writes:
> The patch titled
> `signal: Never allocate siginfo for SIGKILL or SIGSTOP`
> created a regression for users of PTRACE_GETSIGINFO needing to
> discern signals that were raised via the tgkill syscall.
>
> A notable user of this tgkill+ptrace combination is lldb while
> debugging a multithreaded program. Without the ability to detect a
> SIGSTOP originating from tgkill, lldb does not have a way to
> synchronize on a per-thread basis and falls back to SIGSTOP-ing the
> entire process.
Everyone affected by this please note. The kernel can still fail to
allocate a siginfo structure. The allocation is with GFP_KERNEL and
is best effort only. If memory is tight when the signal allocation
comes in this will fail to allocate a siginfo.
So I strongly recommend looking at more robust solutions for
synchronizing with a single thread such as PTRACE_INTERRUPT. Or if
that does not work persuading your friendly local kernel developer to
build the interface you need.
Reported-by: Tycho Andersen <tycho@tycho.ws>
Reported-by: Kees Cook <keescook@chromium.org>
Reported-by: Jack Andersen <jackoalan@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Christian Brauner <christian@brauner.io>
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
9dff0aa95a |
perf/core: Don't WARN() for impossible ring-buffer sizes
The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how large its ringbuffer mmap should be. This can be configured to arbitrary values, which can be larger than the maximum possible allocation from kmalloc. When this is configured to a suitably large value (e.g. thanks to the perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in __alloc_pages_nodemask(): WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8 Let's avoid this by checking that the requested allocation is possible before calling kzalloc. Reported-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
cc6810e36b |
Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull cpu hotplug fixes from Thomas Gleixner:
"Two fixes for the cpu hotplug machinery:
- Replace the overly clever 'SMT disabled by BIOS' detection logic as
it breaks KVM scenarios and prevents speculation control updates
when the Hyperthreads are brought online late after boot.
- Remove a redundant invocation of the speculation control update
function"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
x86/speculation: Remove redundant arch_smt_update() invocation
|
||
|
|
58f6d4287a |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A pile of perf updates:
- Fix broken sanity check in the /proc/sys/kernel/perf_cpu_time_max_percent
write handler
- Cure a perf script crash which caused by an unitinialized data
structure
- Highlight the hottest instruction in perf top and not a random one
- Cure yet another clang issue when building perf python
- Handle topology entries with no CPU correctly in the tools
- Handle perf data which contains both tracepoints and performance
counter entries correctly.
- Add a missing NULL pointer check in perf ordered_events_free()"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf script: Fix crash when processing recorded stat data
perf top: Fix wrong hottest instruction highlighted
perf tools: Handle TOPOLOGY headers with no CPU
perf python: Remove -fstack-clash-protection when building with some clang versions
perf core: Fix perf_proc_update_handler() bug
perf script: Fix crash with printing mixed trace point and other events
perf ordered_events: Fix crash in ordered_events__free
|
||
|
|
1b69ac6b40 |
psi: fix aggregation idle shut-off
psi has provisions to shut off the periodic aggregation worker when
there is a period of no task activity - and thus no data that needs
aggregating. However, while developing psi monitoring, Suren noticed
that the aggregation clock currently won't stay shut off for good.
Debugging this revealed a flaw in the idle design: an aggregation run
will see no task activity and decide to go to sleep; shortly thereafter,
the kworker thread that executed the aggregation will go idle and cause
a scheduling change, during which the psi callback will kick the
!pending worker again. This will ping-pong forever, and is equivalent
to having no shut-off logic at all (but with more code!)
Fix this by exempting aggregation workers from psi's clock waking logic
when the state change is them going to sleep. To do this, tag workers
with the last work function they executed, and if in psi we see a worker
going to sleep after aggregating psi data, we will not reschedule the
aggregation work item.
What if the worker is also executing other items before or after?
Any psi state times that were incurred by work items preceding the
aggregation work will have been collected from the per-cpu buckets
during the aggregation itself. If there are work items following the
aggregation work, the worker's last_func tag will be overwritten and the
aggregator will be kept alive to process this genuine new activity.
If the aggregation work is the last thing the worker does, and we decide
to go idle, the brief period of non-idle time incurred between the
aggregation run and the kworker's dequeue will be stranded in the
per-cpu buckets until the clock is woken by later activity. But that
should not be a problem. The buckets can hold 4s worth of time, and
future activity will wake the clock with a 2s delay, giving us 2s worth
of data we can leave behind when disabling aggregation. If it takes a
worker more than two seconds to go idle after it finishes its last work
item, we likely have bigger problems in the system, and won't notice one
sample that was averaged with a bogus per-CPU weight.
Link: http://lkml.kernel.org/r/20190116193501.1910-1-hannes@cmpxchg.org
Fixes:
|
||
|
|
8fb335e078 |
kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.
zap_pid_ns_processes() can get stuck on waiting tasks from the dead
list. In this case, we will have one unkillable process with one or
more dead children.
Thanks to Oleg for the advice to release tasks in find_child_reaper().
Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes:
|
||
|
|
e7b816415e |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says: ==================== pull-request: bpf 2019-01-31 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) disable preemption in sender side of socket filters, from Alexei. 2) fix two potential deadlocks in syscall bpf lookup and prog_register, from Martin and Alexei. 3) fix BTF to allow typedef on func_proto, from Yonghong. 4) two bpftool fixes, from Jiri and Paolo. ==================== Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
7c4cd051ad |
bpf: Fix syscall's stackmap lookup potential deadlock
The map_lookup_elem used to not acquiring spinlock in order to optimize the reader. It was true until commit |
||
|
|
e16ec34039 |
bpf: fix potential deadlock in bpf_prog_register
Lockdep found a potential deadlock between cpu_hotplug_lock, bpf_event_mutex, and cpuctx_mutex:
[ 13.007000] WARNING: possible circular locking dependency detected
[ 13.007587] 5.0.0-rc3-00018-g2fa53f892422-dirty #477 Not tainted
[ 13.008124] ------------------------------------------------------
[ 13.008624] test_progs/246 is trying to acquire lock:
[ 13.009030] 0000000094160d1d (tracepoints_mutex){+.+.}, at: tracepoint_probe_register_prio+0x2d/0x300
[ 13.009770]
[ 13.009770] but task is already holding lock:
[ 13.010239] 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.010877]
[ 13.010877] which lock already depends on the new lock.
[ 13.010877]
[ 13.011532]
[ 13.011532] the existing dependency chain (in reverse order) is:
[ 13.012129]
[ 13.012129] -> #4 (bpf_event_mutex){+.+.}:
[ 13.012582] perf_event_query_prog_array+0x9b/0x130
[ 13.013016] _perf_ioctl+0x3aa/0x830
[ 13.013354] perf_ioctl+0x2e/0x50
[ 13.013668] do_vfs_ioctl+0x8f/0x6a0
[ 13.014003] ksys_ioctl+0x70/0x80
[ 13.014320] __x64_sys_ioctl+0x16/0x20
[ 13.014668] do_syscall_64+0x4a/0x180
[ 13.015007] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.015469]
[ 13.015469] -> #3 (&cpuctx_mutex){+.+.}:
[ 13.015910] perf_event_init_cpu+0x5a/0x90
[ 13.016291] perf_event_init+0x1b2/0x1de
[ 13.016654] start_kernel+0x2b8/0x42a
[ 13.016995] secondary_startup_64+0xa4/0xb0
[ 13.017382]
[ 13.017382] -> #2 (pmus_lock){+.+.}:
[ 13.017794] perf_event_init_cpu+0x21/0x90
[ 13.018172] cpuhp_invoke_callback+0xb3/0x960
[ 13.018573] _cpu_up+0xa7/0x140
[ 13.018871] do_cpu_up+0xa4/0xc0
[ 13.019178] smp_init+0xcd/0xd2
[ 13.019483] kernel_init_freeable+0x123/0x24f
[ 13.019878] kernel_init+0xa/0x110
[ 13.020201] ret_from_fork+0x24/0x30
[ 13.020541]
[ 13.020541] -> #1 (cpu_hotplug_lock.rw_sem){++++}:
[ 13.021051] static_key_slow_inc+0xe/0x20
[ 13.021424] tracepoint_probe_register_prio+0x28c/0x300
[ 13.021891] perf_trace_event_init+0x11f/0x250
[ 13.022297] perf_trace_init+0x6b/0xa0
[ 13.022644] perf_tp_event_init+0x25/0x40
[ 13.023011] perf_try_init_event+0x6b/0x90
[ 13.023386] perf_event_alloc+0x9a8/0xc40
[ 13.023754] __do_sys_perf_event_open+0x1dd/0xd30
[ 13.024173] do_syscall_64+0x4a/0x180
[ 13.024519] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.024968]
[ 13.024968] -> #0 (tracepoints_mutex){+.+.}:
[ 13.025434] __mutex_lock+0x86/0x970
[ 13.025764] tracepoint_probe_register_prio+0x2d/0x300
[ 13.026215] bpf_probe_register+0x40/0x60
[ 13.026584] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.027042] __do_sys_bpf+0x94f/0x1a90
[ 13.027389] do_syscall_64+0x4a/0x180
[ 13.027727] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.028171]
[ 13.028171] other info that might help us debug this:
[ 13.028171]
[ 13.028807] Chain exists of:
[ 13.028807] tracepoints_mutex --> &cpuctx_mutex --> bpf_event_mutex
[ 13.028807]
[ 13.029666] Possible unsafe locking scenario:
[ 13.029666]
[ 13.030140] CPU0 CPU1
[ 13.030510] ---- ----
[ 13.030875] lock(bpf_event_mutex);
[ 13.031166] lock(&cpuctx_mutex);
[ 13.031645] lock(bpf_event_mutex);
[ 13.032135] lock(tracepoints_mutex);
[ 13.032441]
[ 13.032441] *** DEADLOCK ***
[ 13.032441]
[ 13.032911] 1 lock held by test_progs/246:
[ 13.033239] #0: 00000000d663ef86 (bpf_event_mutex){+.+.}, at: bpf_probe_register+0x1d/0x60
[ 13.033909]
[ 13.033909] stack backtrace:
[ 13.034258] CPU: 1 PID: 246 Comm: test_progs Not tainted 5.0.0-rc3-00018-g2fa53f892422-dirty #477
[ 13.034964] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 13.035657] Call Trace:
[ 13.035859] dump_stack+0x5f/0x8b
[ 13.036130] print_circular_bug.isra.37+0x1ce/0x1db
[ 13.036526] __lock_acquire+0x1158/0x1350
[ 13.036852] ? lock_acquire+0x98/0x190
[ 13.037154] lock_acquire+0x98/0x190
[ 13.037447] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.037876] __mutex_lock+0x86/0x970
[ 13.038167] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.038600] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.039028] ? __mutex_lock+0x86/0x970
[ 13.039337] ? __mutex_lock+0x24a/0x970
[ 13.039649] ? bpf_probe_register+0x1d/0x60
[ 13.039992] ? __bpf_trace_sched_wake_idle_without_ipi+0x10/0x10
[ 13.040478] ? tracepoint_probe_register_prio+0x2d/0x300
[ 13.040906] tracepoint_probe_register_prio+0x2d/0x300
[ 13.041325] bpf_probe_register+0x40/0x60
[ 13.041649] bpf_raw_tracepoint_open.isra.34+0xa4/0x130
[ 13.042068] ? __might_fault+0x3e/0x90
[ 13.042374] __do_sys_bpf+0x94f/0x1a90
[ 13.042678] do_syscall_64+0x4a/0x180
[ 13.042975] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 13.043382] RIP: 0033:0x7f23b10a07f9
[ 13.045155] RSP: 002b:00007ffdef42fdd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 13.045759] RAX: ffffffffffffffda RBX: 00007ffdef42ff70 RCX: 00007f23b10a07f9
[ 13.046326] RDX: 0000000000000070 RSI: 00007ffdef42fe10 RDI: 0000000000000011
[ 13.046893] RBP: 00007ffdef42fdf0 R08: 0000000000000038 R09: 00007ffdef42fe10
[ 13.047462] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 13.048029] R13: 0000000000000016 R14: 00007f23b1db4690 R15: 0000000000000000
Since tracepoints_mutex will be taken in tracepoint_probe_register/unregister()
there is no need to take bpf_event_mutex too.
bpf_event_mutex is protecting modifications to prog array used in kprobe/perf bpf progs.
bpf_raw_tracepoints don't need to take this mutex.
Fixes:
|
||
|
|
a89fac57b5 |
bpf: fix lockdep false positive in percpu_freelist
Lockdep warns about false positive:
[ 12.492084] 00000000e6b28347 (&head->lock){+...}, at: pcpu_freelist_push+0x2a/0x40
[ 12.492696] but this lock was taken by another, HARDIRQ-safe lock in the past:
[ 12.493275] (&rq->lock){-.-.}
[ 12.493276]
[ 12.493276]
[ 12.493276] and interrupts could create inverse lock ordering between them.
[ 12.493276]
[ 12.494435]
[ 12.494435] other info that might help us debug this:
[ 12.494979] Possible interrupt unsafe locking scenario:
[ 12.494979]
[ 12.495518] CPU0 CPU1
[ 12.495879] ---- ----
[ 12.496243] lock(&head->lock);
[ 12.496502] local_irq_disable();
[ 12.496969] lock(&rq->lock);
[ 12.497431] lock(&head->lock);
[ 12.497890] <Interrupt>
[ 12.498104] lock(&rq->lock);
[ 12.498368]
[ 12.498368] *** DEADLOCK ***
[ 12.498368]
[ 12.498837] 1 lock held by dd/276:
[ 12.499110] #0: 00000000c58cb2ee (rcu_read_lock){....}, at: trace_call_bpf+0x5e/0x240
[ 12.499747]
[ 12.499747] the shortest dependencies between 2nd lock and 1st lock:
[ 12.500389] -> (&rq->lock){-.-.} {
[ 12.500669] IN-HARDIRQ-W at:
[ 12.500934] _raw_spin_lock+0x2f/0x40
[ 12.501373] scheduler_tick+0x4c/0xf0
[ 12.501812] update_process_times+0x40/0x50
[ 12.502294] tick_periodic+0x27/0xb0
[ 12.502723] tick_handle_periodic+0x1f/0x60
[ 12.503203] timer_interrupt+0x11/0x20
[ 12.503651] __handle_irq_event_percpu+0x43/0x2c0
[ 12.504167] handle_irq_event_percpu+0x20/0x50
[ 12.504674] handle_irq_event+0x37/0x60
[ 12.505139] handle_level_irq+0xa7/0x120
[ 12.505601] handle_irq+0xa1/0x150
[ 12.506018] do_IRQ+0x77/0x140
[ 12.506411] ret_from_intr+0x0/0x1d
[ 12.506834] _raw_spin_unlock_irqrestore+0x53/0x60
[ 12.507362] __setup_irq+0x481/0x730
[ 12.507789] setup_irq+0x49/0x80
[ 12.508195] hpet_time_init+0x21/0x32
[ 12.508644] x86_late_time_init+0xb/0x16
[ 12.509106] start_kernel+0x390/0x42a
[ 12.509554] secondary_startup_64+0xa4/0xb0
[ 12.510034] IN-SOFTIRQ-W at:
[ 12.510305] _raw_spin_lock+0x2f/0x40
[ 12.510772] try_to_wake_up+0x1c7/0x4e0
[ 12.511220] swake_up_locked+0x20/0x40
[ 12.511657] swake_up_one+0x1a/0x30
[ 12.512070] rcu_process_callbacks+0xc5/0x650
[ 12.512553] __do_softirq+0xe6/0x47b
[ 12.512978] irq_exit+0xc3/0xd0
[ 12.513372] smp_apic_timer_interrupt+0xa9/0x250
[ 12.513876] apic_timer_interrupt+0xf/0x20
[ 12.514343] default_idle+0x1c/0x170
[ 12.514765] do_idle+0x199/0x240
[ 12.515159] cpu_startup_entry+0x19/0x20
[ 12.515614] start_kernel+0x422/0x42a
[ 12.516045] secondary_startup_64+0xa4/0xb0
[ 12.516521] INITIAL USE at:
[ 12.516774] _raw_spin_lock_irqsave+0x38/0x50
[ 12.517258] rq_attach_root+0x16/0xd0
[ 12.517685] sched_init+0x2f2/0x3eb
[ 12.518096] start_kernel+0x1fb/0x42a
[ 12.518525] secondary_startup_64+0xa4/0xb0
[ 12.518986] }
[ 12.519132] ... key at: [<ffffffff82b7bc28>] __key.71384+0x0/0x8
[ 12.519649] ... acquired at:
[ 12.519892] pcpu_freelist_pop+0x7b/0xd0
[ 12.520221] bpf_get_stackid+0x1d2/0x4d0
[ 12.520563] ___bpf_prog_run+0x8b4/0x11a0
[ 12.520887]
[ 12.521008] -> (&head->lock){+...} {
[ 12.521292] HARDIRQ-ON-W at:
[ 12.521539] _raw_spin_lock+0x2f/0x40
[ 12.521950] pcpu_freelist_push+0x2a/0x40
[ 12.522396] bpf_get_stackid+0x494/0x4d0
[ 12.522828] ___bpf_prog_run+0x8b4/0x11a0
[ 12.523296] INITIAL USE at:
[ 12.523537] _raw_spin_lock+0x2f/0x40
[ 12.523944] pcpu_freelist_populate+0xc0/0x120
[ 12.524417] htab_map_alloc+0x405/0x500
[ 12.524835] __do_sys_bpf+0x1a3/0x1a90
[ 12.525253] do_syscall_64+0x4a/0x180
[ 12.525659] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 12.526167] }
[ 12.526311] ... key at: [<ffffffff838f7668>] __key.13130+0x0/0x8
[ 12.526812] ... acquired at:
[ 12.527047] __lock_acquire+0x521/0x1350
[ 12.527371] lock_acquire+0x98/0x190
[ 12.527680] _raw_spin_lock+0x2f/0x40
[ 12.527994] pcpu_freelist_push+0x2a/0x40
[ 12.528325] bpf_get_stackid+0x494/0x4d0
[ 12.528645] ___bpf_prog_run+0x8b4/0x11a0
[ 12.528970]
[ 12.529092]
[ 12.529092] stack backtrace:
[ 12.529444] CPU: 0 PID: 276 Comm: dd Not tainted 5.0.0-rc3-00018-g2fa53f892422 #475
[ 12.530043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
[ 12.530750] Call Trace:
[ 12.530948] dump_stack+0x5f/0x8b
[ 12.531248] check_usage_backwards+0x10c/0x120
[ 12.531598] ? ___bpf_prog_run+0x8b4/0x11a0
[ 12.531935] ? mark_lock+0x382/0x560
[ 12.532229] mark_lock+0x382/0x560
[ 12.532496] ? print_shortest_lock_dependencies+0x180/0x180
[ 12.532928] __lock_acquire+0x521/0x1350
[ 12.533271] ? find_get_entry+0x17f/0x2e0
[ 12.533586] ? find_get_entry+0x19c/0x2e0
[ 12.533902] ? lock_acquire+0x98/0x190
[ 12.534196] lock_acquire+0x98/0x190
[ 12.534482] ? pcpu_freelist_push+0x2a/0x40
[ 12.534810] _raw_spin_lock+0x2f/0x40
[ 12.535099] ? pcpu_freelist_push+0x2a/0x40
[ 12.535432] pcpu_freelist_push+0x2a/0x40
[ 12.535750] bpf_get_stackid+0x494/0x4d0
[ 12.536062] ___bpf_prog_run+0x8b4/0x11a0
It has been explained that is a false positive here:
https://lkml.org/lkml/2018/7/25/756
Recap:
- stackmap uses pcpu_freelist
- The lock in pcpu_freelist is a percpu lock
- stackmap is only used by tracing bpf_prog
- A tracing bpf_prog cannot be run if another bpf_prog
has already been running (ensured by the percpu bpf_prog_active counter).
Eric pointed out that this lockdep splats stops other
legit lockdep splats in selftests/bpf/test_progs.c.
Fix this by calling local_irq_save/restore for stackmap.
Another false positive had also been worked around by calling
local_irq_save in commit
|
||
|
|
6cab5e90ab |
bpf: run bpf programs with preemption disabled
Disabled preemption is necessary for proper access to per-cpu maps from BPF programs. But the sender side of socket filters didn't have preemption disabled: unix_dgram_sendmsg->sk_filter->sk_filter_trim_cap->bpf_prog_run_save_cb->BPF_PROG_RUN and a combination of af_packet with tun device didn't disable either: tpacket_snd->packet_direct_xmit->packet_pick_tx_queue->ndo_select_queue-> tun_select_queue->tun_ebpf_select_queue->bpf_prog_run_clear_cb->BPF_PROG_RUN Disable preemption before executing BPF programs (both classic and extended). Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> |
||
|
|
2c1cf00eea |
relay: check return of create_buf_file() properly
If create_buf_file() returns an error, don't try to reference it later
as a valid dentry pointer.
This problem was exposed when debugfs started to return errors instead
of just NULL for some calls when they do not succeed properly.
Also, the check for WARN_ON(dentry) was just wrong :)
Reported-by: Kees Cook <keescook@chromium.org>
Reported-and-tested-by: syzbot+16c3a70e1e9b29346c43@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Fixes:
|
||
|
|
b284909aba |
cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM
With the following commit: |
||
|
|
81f5c6f5db |
bpf: btf: allow typedef func_proto
Current implementation does not allow typedef func_proto.
But it is actually allowed.
-bash-4.4$ cat t.c
typedef int (f) (int);
f *g;
-bash-4.4$ clang -O2 -g -c -target bpf t.c -Xclang -target-feature -Xclang +dwarfris
-bash-4.4$ pahole -JV t.o
File t.o:
[1] PTR (anon) type_id=2
[2] TYPEDEF f type_id=3
[3] FUNC_PROTO (anon) return=4 args=(4 (anon))
[4] INT int size=4 bit_offset=0 nr_bits=32 encoding=SIGNED
-bash-4.4$
This patch related btf verifier to allow such (typedef func_proto)
patterns.
Fixes:
|
||
|
|
34d66caf25 |
x86/speculation: Remove redundant arch_smt_update() invocation
With commit |