Pull timer fixes from Thomas Gleixner:
"A small set of fixes for time(keeping):
- Add a missing include to prevent compiler warnings.
- Make the VDSO implementation of clock_getres() POSIX compliant
again. A recent change dropped the NULL pointer guard which is
required as NULL is a valid pointer value for this function.
- Fix two function documentation typos"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Fix two trivial comments
timers/sched_clock: Include local timekeeping.h for missing declarations
lib/vdso: Make clock_getres() POSIX compliant again
Pull perf fixes from Thomas Gleixner:
"A set of perf fixes:
kernel:
- Unbreak the tracking of auxiliary buffer allocations which got
imbalanced causing recource limit failures.
- Fix the fallout of splitting of ToPA entries which missed to shift
the base entry PA correctly.
- Use the correct context to lookup the AUX event when unmapping the
associated AUX buffer so the event can be stopped and the buffer
reference dropped.
tools:
- Fix buildiid-cache mode setting in copyfile_mode_ns() when copying
/proc/kcore
- Fix freeing id arrays in the event list so the correct event is
closed.
- Sync sched.h anc kvm.h headers with the kernel sources.
- Link jvmti against tools/lib/ctype.o to have weak strlcpy().
- Fix multiple memory and file descriptor leaks, found by coverity in
perf annotate.
- Fix leaks in error handling paths in 'perf c2c', 'perf kmem', found
by a static analysis tool"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/aux: Fix AUX output stopping
perf/aux: Fix tracking of auxiliary trace buffer allocation
perf/x86/intel/pt: Fix base for single entry topa
perf kmem: Fix memory leak in compact_gfp_flags()
tools headers UAPI: Sync sched.h with the kernel
tools headers kvm: Sync kvm.h headers with the kernel sources
tools headers kvm: Sync kvm headers with the kernel sources
tools headers kvm: Sync kvm headers with the kernel sources
perf c2c: Fix memory leak in build_cl_output()
perf tools: Fix mode setting in copyfile_mode_ns()
perf annotate: Fix multiple memory and file descriptor leaks
perf tools: Fix resource leak of closedir() on the error paths
perf evlist: Fix fix for freed id arrays
perf jvmti: Link against tools/lib/ctype.h to have weak strlcpy()
Pull power management fixes from Rafael Wysocki:
"These fix problems related to frequency limits management in cpufreq
that were introduced during the 5.3 cycle (when PM QoS had started to
be used for that), fix a few issues in the OPP (operating performance
points) library code and fix up the recently added haltpoll cpuidle
driver.
The cpufreq changes are somewhat bigger that I would like them to be
at this stage of the cycle, but the problems fixed by them include
crashes on boot and shutdown in some cases (among other things) and in
my view it is better to address the root of the issue right away.
Specifics:
- Using device PM QoS of CPU devices for managing frequency limits in
cpufreq does not work, so introduce frequency QoS (based on the
original low-level PM QoS) for this purpose, switch cpufreq and
related code over to using it and fix a race involving deferred
updates of frequency limits on top of that (Rafael Wysocki, Sudeep
Holla).
- Avoid calling regulator_enable()/disable() from the OPP framework
to avoid side-effects on boot-enabled regulators that may change
their initial voltage due to performing initial voltage balancing
without all restrictions from the consumers (Marek Szyprowski).
- Avoid a kref management issue in the OPP library code and drop an
incorrectly added lockdep_assert_held() from it (Viresh Kumar).
- Make the recently added haltpoll cpuidle driver take the 'idle='
override into account as appropriate (Zhenzhong Duan)"
* tag 'pm-5.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
opp: Reinitialize the list_kref before adding the static OPPs again
cpufreq: Cancel policy update work scheduled before freeing
cpuidle: haltpoll: Take 'idle=' override into account
opp: core: Revert "add regulators enable and disable"
PM: QoS: Drop frequency QoS types from device PM QoS
cpufreq: Use per-policy frequency QoS
PM: QoS: Introduce frequency QoS
opp: of: drop incorrect lockdep_assert_held()
Pull tracing fixes from Steven Rostedt:
"Two minor fixes:
- A race in perf trace initialization (missing mutexes)
- Minor fix to represent gfp_t in synthetic events as properly
signed"
* tag 'trace-v5.4-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix race in perf_trace_buf initialization
tracing: Fix "gfp_t" format for synthetic events
Recent changes modified the function arguments of
thread_group_sample_cputime() and task_cputimers_expired(), but forgot to
update the comments. Fix it up.
[ tglx: Changed the argument name of task_cputimers_expired() as the pointer
points to an array of samples. ]
Fixes: b7be4ef136 ("posix-cpu-timers: Switch thread group sampling to array")
Fixes: 001f797143 ("posix-cpu-timers: Make expiry checks array based")
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1571643852-21848-1-git-send-email-wang.yi59@zte.com.cn
Include the timekeeping.h header to get the declaration of the
sched_clock_{suspend,resume} functions. Fixes the following sparse
warnings:
kernel/time/sched_clock.c:275:5: warning: symbol 'sched_clock_suspend' was not declared. Should it be static?
kernel/time/sched_clock.c:286:6: warning: symbol 'sched_clock_resume' was not declared. Should it be static?
Signed-off-by: Ben Dooks (Codethink) <ben.dooks@codethink.co.uk>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191022131226.11465-1-ben.dooks@codethink.co.uk
Commit:
8a58ddae23 ("perf/core: Fix exclusive events' grouping")
allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings:
Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following commit from the v5.4 merge window:
d44248a413 ("perf/core: Rework memory accounting in perf_mmap()")
... breaks auxiliary trace buffer tracking.
If I run command 'perf record -e rbd000' to record samples and saving
them in the **auxiliary** trace buffer then the value of 'locked_vm' becomes
negative after all trace buffers have been allocated and released:
During allocation the values increase:
[52.250027] perf_mmap user->locked_vm:0x87 pinned_vm:0x0 ret:0
[52.250115] perf_mmap user->locked_vm:0x107 pinned_vm:0x0 ret:0
[52.250251] perf_mmap user->locked_vm:0x188 pinned_vm:0x0 ret:0
[52.250326] perf_mmap user->locked_vm:0x208 pinned_vm:0x0 ret:0
[52.250441] perf_mmap user->locked_vm:0x289 pinned_vm:0x0 ret:0
[52.250498] perf_mmap user->locked_vm:0x309 pinned_vm:0x0 ret:0
[52.250613] perf_mmap user->locked_vm:0x38a pinned_vm:0x0 ret:0
[52.250715] perf_mmap user->locked_vm:0x408 pinned_vm:0x2 ret:0
[52.250834] perf_mmap user->locked_vm:0x408 pinned_vm:0x83 ret:0
[52.250915] perf_mmap user->locked_vm:0x408 pinned_vm:0x103 ret:0
[52.251061] perf_mmap user->locked_vm:0x408 pinned_vm:0x184 ret:0
[52.251146] perf_mmap user->locked_vm:0x408 pinned_vm:0x204 ret:0
[52.251299] perf_mmap user->locked_vm:0x408 pinned_vm:0x285 ret:0
[52.251383] perf_mmap user->locked_vm:0x408 pinned_vm:0x305 ret:0
[52.251544] perf_mmap user->locked_vm:0x408 pinned_vm:0x386 ret:0
[52.251634] perf_mmap user->locked_vm:0x408 pinned_vm:0x406 ret:0
[52.253018] perf_mmap user->locked_vm:0x408 pinned_vm:0x487 ret:0
[52.253197] perf_mmap user->locked_vm:0x408 pinned_vm:0x508 ret:0
[52.253374] perf_mmap user->locked_vm:0x408 pinned_vm:0x589 ret:0
[52.253550] perf_mmap user->locked_vm:0x408 pinned_vm:0x60a ret:0
[52.253726] perf_mmap user->locked_vm:0x408 pinned_vm:0x68b ret:0
[52.253903] perf_mmap user->locked_vm:0x408 pinned_vm:0x70c ret:0
[52.254084] perf_mmap user->locked_vm:0x408 pinned_vm:0x78d ret:0
[52.254263] perf_mmap user->locked_vm:0x408 pinned_vm:0x80e ret:0
The value of user->locked_vm increases to a limit then the memory
is tracked by pinned_vm.
During deallocation the size is subtracted from pinned_vm until
it hits a limit. Then a larger value is subtracted from locked_vm
leading to a large number (because of type unsigned):
[64.267797] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x78d
[64.267826] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x70c
[64.267848] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x68b
[64.267869] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x60a
[64.267891] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x589
[64.267911] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x508
[64.267933] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x487
[64.267952] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x406
[64.268883] perf_mmap_close mmap_user->locked_vm:0x307 pinned_vm:0x406
[64.269117] perf_mmap_close mmap_user->locked_vm:0x206 pinned_vm:0x406
[64.269433] perf_mmap_close mmap_user->locked_vm:0x105 pinned_vm:0x406
[64.269536] perf_mmap_close mmap_user->locked_vm:0x4 pinned_vm:0x404
[64.269797] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff84 pinned_vm:0x303
[64.270105] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff04 pinned_vm:0x202
[64.270374] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe84 pinned_vm:0x101
[64.270628] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe04 pinned_vm:0x0
This value sticks for the user until system is rebooted, causing
follow-on system calls using locked_vm resource limit to fail.
Note: There is no issue using the normal trace buffer.
In fact the issue is in perf_mmap_close(). During allocation auxiliary
trace buffer memory is either traced as 'extra' and added to 'pinned_vm'
or trace as 'user_extra' and added to 'locked_vm'. This applies for
normal trace buffers and auxiliary trace buffer.
However in function perf_mmap_close() all auxiliary trace buffer is
subtraced from 'locked_vm' and never from 'pinned_vm'. This breaks the
ballance.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: gor@linux.ibm.com
Cc: hechaol@fb.com
Cc: heiko.carstens@de.ibm.com
Cc: linux-perf-users@vger.kernel.org
Cc: songliubraving@fb.com
Fixes: d44248a413 ("perf/core: Rework memory accounting in perf_mmap()")
Link: https://lkml.kernel.org/r/20191021083354.67868-1-tmricht@linux.ibm.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce frequency QoS, based on the "raw" low-level PM QoS, to
represent min and max frequency requests and aggregate constraints.
The min and max frequency requests are to be represented by
struct freq_qos_request objects and the aggregate constraints are to
be represented by struct freq_constraints objects. The latter are
expected to be initialized with the help of freq_constraints_init().
The freq_qos_read_value() helper is defined to retrieve the aggregate
constraints values from a given struct freq_constraints object and
there are the freq_qos_add_request(), freq_qos_update_request() and
freq_qos_remove_request() helpers to manipulate the min and max
frequency requests. It is assumed that the the helpers will not
run concurrently with each other for the same struct freq_qos_request
object, so if that may be the case, their uses must ensure proper
synchronization between them (e.g. through locking).
In addition, freq_qos_add_notifier() and freq_qos_remove_notifier()
are provided to add and remove notifiers that will trigger on aggregate
constraint changes to and from a given struct freq_constraints object,
respectively.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull more Kbuild fixes from Masahiro Yamada:
- fix a bashism of setlocalversion
- do not use the too new --sort option of tar
* tag 'kbuild-fixes-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kheaders: substituting --sort in archive creation
scripts: setlocalversion: fix a bashism
kbuild: update comment about KBUILD_ALLDIRS
Pull hrtimer fixlet from Thomas Gleixner:
"A single commit annotating the lockcless access to timer->base with
READ_ONCE() and adding the WRITE_ONCE() counterparts for completeness"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Annotate lockless access to timer->base
Pull stop-machine fix from Thomas Gleixner:
"A single fix, amending stop machine with WRITE/READ_ONCE() to address
the fallout of KCSAN"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stop_machine: Avoid potential race behaviour
Attaching uprobe to text section in THP splits the PMD mapped page table
into PTE mapped entries. On uprobe detach, we would like to regroup PMD
mapped page table entry to regain performance benefit of THP.
However, the regroup is broken For perf_event based trace_uprobe. This
is because perf_event based trace_uprobe calls uprobe_unregister twice
on close: first in TRACE_REG_PERF_CLOSE, then in
TRACE_REG_PERF_UNREGISTER. The second call will split the PMD mapped
page table entry, which is not the desired behavior.
Fix this by only use FOLL_SPLIT_PMD for uprobe register case.
Add a WARN() to confirm uprobe unregister never work on huge pages, and
abort the operation when this WARN() triggers.
Link: http://lkml.kernel.org/r/20191017164223.2762148-6-songliubraving@fb.com
Fixes: 5a52c9df62 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In the format of synthetic events, the "gfp_t" is shown as "signed:1",
but in fact the "gfp_t" is "unsigned", should be shown as "signed:0".
The issue can be reproduced by the following commands:
echo 'memlatency u64 lat; unsigned int order; gfp_t gfp_flags; int migratetype' > /sys/kernel/debug/tracing/synthetic_events
cat /sys/kernel/debug/tracing/events/synthetic/memlatency/format
name: memlatency
ID: 2233
format:
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:u64 lat; offset:8; size:8; signed:0;
field:unsigned int order; offset:16; size:4; signed:0;
field:gfp_t gfp_flags; offset:24; size:4; signed:1;
field:int migratetype; offset:32; size:4; signed:1;
print fmt: "lat=%llu, order=%u, gfp_flags=%x, migratetype=%d", REC->lat, REC->order, REC->gfp_flags, REC->migratetype
Link: http://lkml.kernel.org/r/20191018012034.6404-1-zhengjun.xing@linux.intel.com
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull power management fixes from Rafael Wysocki:
"These include a fix for a recent regression in the ACPI CPU
performance scaling code, a PCI device power management fix,
a system shutdown fix related to cpufreq, a removal of an ACPI
suspend-to-idle blacklist entry and a build warning fix.
Specifics:
- Fix possible NULL pointer dereference in the ACPI processor scaling
initialization code introduced by a recent cpufreq update (Rafael
Wysocki).
- Fix possible deadlock due to suspending cpufreq too late during
system shutdown (Rafael Wysocki).
- Make the PCI device system resume code path be more consistent with
its PM-runtime counterpart to fix an issue with missing delay on
transitions from D3cold to D0 during system resume from
suspend-to-idle on some systems (Rafael Wysocki).
- Drop Dell XPS13 9360 from the LPS0 Idle _DSM blacklist to make it
use suspend-to-idle by default (Mario Limonciello).
- Fix build warning in the core system suspend support code (Ben
Dooks)"
* tag 'pm-5.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Avoid NULL pointer dereferences at init time
PCI: PM: Fix pci_power_up()
PM: sleep: include <linux/pm_runtime.h> for pm_wq
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
* pm-cpufreq:
ACPI: processor: Avoid NULL pointer dereferences at init time
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
* pm-sleep:
PM: sleep: include <linux/pm_runtime.h> for pm_wq
ACPI: PM: Drop Dell XPS13 9360 from LPS0 Idle _DSM blacklist
Both multi_cpu_stop() and set_state() access multi_stop_data::state
racily using plain accesses. These are subject to compiler
transformations which could break the intended behaviour of the code,
and this situation is detected by KCSAN on both arm64 and x86 (splats
below).
Improve matters by using READ_ONCE() and WRITE_ONCE() to ensure that the
compiler cannot elide, replay, or tear loads and stores.
In multi_cpu_stop() the two loads of multi_stop_data::state are expected to
be a consistent value, so snapshot the value into a temporary variable to
ensure this.
The state transitions are serialized by atomic manipulation of
multi_stop_data::num_threads, and other fields in multi_stop_data are not
modified while subject to concurrent reads.
KCSAN splat on arm64:
| BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
|
| write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
| set_state+0x80/0xb0
| multi_cpu_stop+0x16c/0x198
| cpu_stopper_thread+0x170/0x298
| smpboot_thread_fn+0x40c/0x560
| kthread+0x1a8/0x1b0
| ret_from_fork+0x10/0x18
|
| read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
| multi_cpu_stop+0xa8/0x198
| cpu_stopper_thread+0x170/0x298
| smpboot_thread_fn+0x40c/0x560
| kthread+0x1a8/0x1b0
| ret_from_fork+0x10/0x18
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
| Hardware name: linux,dummy-virt (DT)
KCSAN splat on x86:
| write to 0xffffb0bac0013e18 of 4 bytes by task 19 on cpu 2:
| set_state kernel/stop_machine.c:170 [inline]
| ack_state kernel/stop_machine.c:177 [inline]
| multi_cpu_stop+0x1a4/0x220 kernel/stop_machine.c:227
| cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
| smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
| kthread+0x1b5/0x200 kernel/kthread.c:255
| ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| read to 0xffffb0bac0013e18 of 4 bytes by task 44 on cpu 7:
| multi_cpu_stop+0xb4/0x220 kernel/stop_machine.c:213
| cpu_stopper_thread+0x19e/0x280 kernel/stop_machine.c:516
| smpboot_thread_fn+0x1a8/0x300 kernel/smpboot.c:165
| kthread+0x1b5/0x200 kernel/kthread.c:255
| ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:352
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 7 PID: 44 Comm: migration/7 Not tainted 5.3.0+ #1
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20191007104536.27276-1-mark.rutland@arm.com
The option --sort=ORDER was only introduced in tar 1.28 (2014), which
is rather new and might not be available in some setups.
This patch tries to replicate the previous behaviour as closely as
possible to fix the kheaders build for older environments. It does
not produce identical archives compared to the previous version due
to minor sorting differences but produces reproducible results itself
in my tests.
Reported-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: Dmitry Goldin <dgoldin+lkml@protonmail.ch>
Tested-by: Andreas Schwab <schwab@suse.de>
Tested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
The __kthread_queue_delayed_work is not exported so
make it static, to avoid the following sparse warning:
kernel/kthread.c:869:6: warning: symbol '__kthread_queue_delayed_work' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull parisc fixes from Helge Deller:
- Fix a parisc-specific fallout of Christoph's
dma_set_mask_and_coherent() patches (Sven)
- Fix a vmap memory leak in ioremap()/ioremap() (Helge)
- Some minor cleanups and documentation updates (Nick, Helge)
* 'parisc-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Remove 32-bit DMA enforcement from sba_iommu
parisc: Fix vmap memory leak in ioremap()/iounmap()
parisc: prefer __section from compiler_attributes.h
parisc: sysctl.c: Use CONFIG_PARISC instead of __hppa_ define
MAINTAINERS: Add hp_sdc drivers to parisc arch
Followup to commit dd2261ed45 ("hrtimer: Protect lockless access
to timer->base")
lock_hrtimer_base() fetches timer->base without lock exclusion.
Compiler is allowed to read timer->base twice (even if considered dumb)
which could end up trying to lock migration_base and return
&migration_base.
base = timer->base;
if (likely(base != &migration_base)) {
/* compiler reads timer->base again, and now (base == &migration_base)
raw_spin_lock_irqsave(&base->cpu_base->lock, *flags);
if (likely(base == timer->base))
return base; /* == &migration_base ! */
Similarly the write sides must use WRITE_ONCE() to avoid store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191008173204.180879-1-edumazet@google.com
Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
- Remove lockdown from tracefs itself and moved it to the trace
directory. Have the open functions there do the lockdown checks.
- Fix a few races with opening an instance file and the instance
being deleted (Discovered during the lockdown updates). Kept
separate from the clean up code such that they can be backported to
stable easier.
- Clean up and consolidated the checks done when opening a trace
file, as there were multiple checks that need to be done, and it
did not make sense having them done in each open instance.
- Fix a regression in the record mcount code.
- Small hw_lat detector tracer fixes.
- A trace_pipe read fix due to not initializing trace_seq"
* tag 'trace-v5.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Initialize iter->seq after zeroing in tracing_read_pipe()
tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
tracing/hwlat: Report total time spent in all NMIs during the sample
recordmcount: Fix nop_mcount() function
tracing: Do not create tracefs files if tracefs lockdown is in effect
tracing: Add locked_down checks to the open calls of files created for tracefs
tracing: Add tracing_check_open_get_tr()
tracing: Have trace events system open call tracing_open_generic_tr()
tracing: Get trace_array reference for available_tracers files
ftrace: Get a reference counter for the trace_array on filter files
tracefs: Revert ccbd54ff54 ("tracefs: Restrict tracefs when the kernel is locked down")
A customer reported the following softlockup:
[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002] tracing_read_pipe+0x336/0x3c0
[899688.160002] __vfs_read+0x26/0x140
[899688.160002] vfs_read+0x87/0x130
[899688.160002] SyS_read+0x42/0x90
[899688.160002] do_syscall_64+0x74/0x160
It caught the process in the middle of trace_access_unlock(). There is
no loop. So, it must be looping in the caller tracing_read_pipe()
via the "waitagain" label.
Crashdump analyze uncovered that iter->seq was completely zeroed
at this point, including iter->seq.seq.size. It means that
print_trace_line() was never able to print anything and
there was no forward progress.
The culprit seems to be in the code:
/* reset all but tr, trace, and overruns */
memset(&iter->seq, 0,
sizeof(struct trace_iterator) -
offsetof(struct trace_iterator, seq));
It was added by the commit 53d0aa7730 ("ftrace:
add logic to record overruns"). It was v2.6.27-rc1.
It was the time when iter->seq looked like:
struct trace_seq {
unsigned char buffer[PAGE_SIZE];
unsigned int len;
};
There was no "size" variable and zeroing was perfectly fine.
The solution is to reinitialize the structure after or without
zeroing.
Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>