When passing "test_suspend=mem" to the kernel:
PM: can't test 'mem' suspend state
and the suspend test is not run.
Commit 406e79385f ("PM / sleep: System sleep state selection
interface rework") changed pm_labels[] from a contiguous NULL-terminated
array to a sparse array (with the first element unpopulated), breaking
the assumptions of the iterator in setup_test_suspend().
Iterate from PM_SUSPEND_MIN to PM_SUSPEND_MAX - 1 to fix this.
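A minimal sketch of the bounded iteration described above, assuming a
label table whose first slot is unpopulated. The SKETCH_* constants and
sketch_* names are hypothetical stand-ins, not the kernel's definitions:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's suspend state constants. */
enum {
	SKETCH_PM_SUSPEND_ON = 0,	/* slot 0 carries no label */
	SKETCH_PM_SUSPEND_FREEZE,
	SKETCH_PM_SUSPEND_STANDBY,
	SKETCH_PM_SUSPEND_MEM,
	SKETCH_PM_SUSPEND_MAX,
};
#define SKETCH_PM_SUSPEND_MIN SKETCH_PM_SUSPEND_FREEZE

/* Sparse table: unset slots are NULL, so NULL cannot act as a terminator. */
static const char *const sketch_labels[SKETCH_PM_SUSPEND_MAX] = {
	[SKETCH_PM_SUSPEND_FREEZE] = "freeze",
	[SKETCH_PM_SUSPEND_MEM] = "mem",
	/* STANDBY is left NULL to model a state that is not available. */
};

/* Return the state index whose label matches name, or -1 if none does. */
static int sketch_find_state(const char *name)
{
	int i;

	/* Walk the index range instead of stopping at the first NULL. */
	for (i = SKETCH_PM_SUSPEND_MIN; i < SKETCH_PM_SUSPEND_MAX; i++) {
		if (sketch_labels[i] && !strcmp(sketch_labels[i], name))
			return i;
	}
	return -1;
}
```

A loop that stops at the first NULL entry would exit immediately on
slot 0 and never reach "mem", which matches the failure reported above.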
Fixes: 406e79385f ("PM / sleep: System sleep state selection interface rework")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A long-standing issue with JITed programs is that stack traces from
function tracing check whether a given address is kernel code
through {__,}kernel_text_address(), which checks for code in core
kernel, modules and dynamically allocated ftrace trampolines. But
what is still missing is BPF JITed programs (interpreted programs
are not an issue as __bpf_prog_run() will be attributed to them),
thus when a stack trace is triggered, the code walking the stack
won't see any of the JITed ones. The same holds for address correlation
done from user space via reading /proc/kallsyms. This is read by
tools like perf, but the latter is also useful for permanent live
tracing with eBPF itself in combination with stack maps when other
eBPF types are part of the callchain. See offwaketime example on
dumping stack from a map.
This work tries to tackle that issue by making the addresses and
symbols known to the kernel. The lookup from *kernel_text_address()
is implemented through a latched RB tree that can be read under
RCU in fast-path that is also shared for symbol/size/offset lookup
for a specific given address in kallsyms. The slow-path iteration
through all symbols in the seq file is done via an RCU list, which holds
a tiny fraction of all exported ksyms, usually below 0.1 percent.
Function symbols are exported as bpf_prog_<tag>, in order to aid
debugging and attribution. This facility is currently enabled for
root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
is active in any mode. The rationale behind this is that a lot of
systems still ship with world-readable kallsyms, so addresses
should not suddenly get exposed on them. If that situation gets
much better in future, we always have the option to change the
default on this. Likewise, unprivileged programs are not allowed
to add entries there either, but that is less of a concern as most
such program types relevant in this context are for root-only anyway.
If enabled, call graphs and stack traces will then show a correct
attribution; one example is illustrated below, where the trace is
now visible in tooling such as perf script --kallsyms=/proc/kallsyms
and friends.
Before:
7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
[...]
7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
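For illustration, the bpf_prog_<tag> naming seen in the trace above can
be sketched as follows. The 8-byte tag length is an assumption inferred
from the 16 hex digits in the example, and sketch_prog_sym_name() is a
hypothetical helper, not the kernel's implementation:

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_TAG_SIZE 8	/* assumed tag length in bytes */

/*
 * Write "bpf_prog_" followed by the tag in lowercase hex into buf.
 * Returns 0 on success, -1 if buf is too small for the full name.
 */
static int sketch_prog_sym_name(const unsigned char *tag, char *buf,
				size_t len)
{
	static const char prefix[] = "bpf_prog_";
	static const char hex[] = "0123456789abcdef";
	size_t off = sizeof(prefix) - 1;
	size_t i;

	if (len < off + 2 * SKETCH_TAG_SIZE + 1)
		return -1;

	memcpy(buf, prefix, off);
	for (i = 0; i < SKETCH_TAG_SIZE; i++) {
		buf[off++] = hex[tag[i] >> 4];
		buf[off++] = hex[tag[i] & 0xf];
	}
	buf[off] = '\0';
	return 0;
}
```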
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make
that a single __weak function in the core that can be overridden
similarly to the eBPF one. Also remove stale pr_err() mentions
of bpf_jit_compile.
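A rough sketch of the weak-default pattern, assuming a GCC or Clang
toolchain where __weak corresponds to __attribute__((weak)). The
sketch_* names are hypothetical and only model the idea:

```c
#include <stddef.h>

struct sketch_prog {
	int jited;	/* set by a JIT once it has produced native code */
};

/*
 * Default used when no architecture supplies its own definition.
 * A non-weak definition in another translation unit would replace
 * this one at link time, so per-arch dummy stubs are not needed.
 */
__attribute__((weak)) void sketch_jit_compile(struct sketch_prog *prog)
{
	(void)prog;	/* no JIT available: leave prog->jited untouched */
}

/* Returns nonzero if a JIT took over, zero if the default ran. */
static int sketch_select_runtime(struct sketch_prog *prog)
{
	prog->jited = 0;
	sketch_jit_compile(prog);
	return prog->jited;
}
```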
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All map types and prog types are registered to the BPF core through
bpf_register_map_type() and bpf_register_prog_type() during init and
remain unchanged thereafter. As by design we don't (and never will)
have any pluggable code that can register to that at any later point
in time, let's mark all the existing bpf_{map,prog}_type_list objects
in the tree as __ro_after_init, so they can be moved to read-only
section from then onwards.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 24b91e360e and commit
7bdb59f1ad ("tick/nohz: Fix possible missing clock reprog after tick
soft restart") that depends on it.
Pavel reports that it causes occasional boot hangs for him that seem to
depend on just how the machine was booted. In particular, his machine
hangs at around the PCI fixups of the EHCI USB host controller, but only
hangs from cold boot, not from a warm boot.
Thomas Gleixner suspects it's a CPU hotplug interaction, particularly
since Pavel also saw suspend/resume issues that seem to be related.
We're reverting for now while trying to figure out the root cause.
Reported-bisected-and-tested-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org # reverted commits were marked for stable
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull networking fixes from David Miller:
1) In order to avoid problems in the future, make cgroup bpf overriding
explicit using BPF_F_ALLOW_OVERRIDE. From Alexei Starovoitov.
2) LLC sets skb->sk without proper skb->destructor and this explodes,
fix from Eric Dumazet.
3) Make sure when we have an ipv4 mapped source address, the
destination is either also an ipv4 mapped address or
ipv6_addr_any(). Fix from Jonathan T. Leighton.
4) Avoid packet loss in fec driver by programming the multicast filter
more intelligently. From Rui Sousa.
5) Handle multiple threads invoking fanout_add(), fix from Eric
Dumazet.
6) Since we can invoke the TCP input path in process context, without
BH being disabled, we have to accommodate that in the locking of the
TCP probe. Also from Eric Dumazet.
7) Fix erroneous emission of NETEVENT_DELAY_PROBE_TIME_UPDATE when we
aren't even updating that sysctl value. From Marcus Huewe.
8) Fix endian bugs in ibmvnic driver, from Thomas Falcon.
[ This is the second version of the pull that reverts the nested
rhashtable changes that looked a bit too scary for this late in the
release - Linus ]
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
rhashtable: Revert nested table changes.
ibmvnic: Fix endian errors in error reporting output
ibmvnic: Fix endian error when requesting device capabilities
net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification
net: xilinx_emaclite: fix freezes due to unordered I/O
net: xilinx_emaclite: fix receive buffer overflow
bpf: kernel header files need to be copied into the tools directory
tcp: tcp_probe: use spin_lock_bh()
uapi: fix linux/if_pppol2tp.h userspace compilation errors
packet: fix races in fanout_add()
ibmvnic: Fix initial MTU settings
net: ethernet: ti: cpsw: fix cpsw assignment in resume
kcm: fix a null pointer dereference in kcm_sendmsg()
net: fec: fix multicast filtering hardware setup
ipv6: Handle IPv4-mapped src to in6addr_any dst.
ipv6: Inhibit IPv4-mapped src address on the wire.
net/mlx5e: Disable preemption when doing TC statistics upcall
rhashtable: Add nested tables
tipc: Fix tipc_sk_reinit race conditions
gfs2: Use rhashtable walk interface in glock_hash_walk
...
Although irqreturn_t is an enum, we treat it (and its enumeration
constants) as a bitmask.
However, bad_action_ret() uses a less-than operator to determine whether
an irqreturn_t falls within allowable bit values, which means we need to
know the signedness of an enum type to read the logic, which is
implementation-dependent.
This change explicitly uses an unsigned type for the comparison. We do
this instead of changing to a bitwise test, as the latter compiles to
more instructions in this hot path.
It looks like we get the correct behaviour currently (bad_action_ret(-1)
returns 1), so this is purely a readability fix.
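A minimal sketch of the unsigned comparison, assuming the two valid
bits are the lowest two. The sketch_* names mirror irqreturn_t for
illustration only and are not the kernel's definitions:

```c
#include <stdbool.h>

/* Hypothetical mirror of the irqreturn_t constants. */
enum sketch_irqreturn {
	SKETCH_IRQ_NONE = 0,
	SKETCH_IRQ_HANDLED = 1 << 0,
	SKETCH_IRQ_WAKE_THREAD = 1 << 1,
};

/* True if action_ret has any bit set outside the allowed mask. */
static bool sketch_bad_action_ret(enum sketch_irqreturn action_ret)
{
	/*
	 * Copy into an explicitly unsigned variable so the range check
	 * does not depend on the implementation-defined signedness of
	 * the enum type: a value of -1 becomes a large positive number.
	 */
	unsigned int r = action_ret;

	return r > (SKETCH_IRQ_HANDLED | SKETCH_IRQ_WAKE_THREAD);
}
```

The single range comparison works here only because the allowed values
form a contiguous range starting at zero.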
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Link: http://lkml.kernel.org/r/1487219049-4061-1-git-send-email-jk@ozlabs.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Fixes the following warnings:
kernel/bpf/verifier.c: In function ‘may_access_direct_pkt_data’:
kernel/bpf/verifier.c:702:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (t == BPF_WRITE)
^
kernel/bpf/verifier.c:704:2: note: here
case BPF_PROG_TYPE_SCHED_CLS:
^~~~
kernel/bpf/verifier.c: In function ‘reg_set_min_max_inv’:
kernel/bpf/verifier.c:2057:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
true_reg->min_value = 0;
~~~~~~~~~~~~~~~~~~~~^~~
kernel/bpf/verifier.c:2058:2: note: here
case BPF_JSGT:
^~~~
kernel/bpf/verifier.c:2068:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
true_reg->min_value = 0;
~~~~~~~~~~~~~~~~~~~~^~~
kernel/bpf/verifier.c:2069:2: note: here
case BPF_JSGE:
^~~~
kernel/bpf/verifier.c: In function ‘reg_set_min_max’:
kernel/bpf/verifier.c:2009:24: warning: this statement may fall through [-Wimplicit-fallthrough=]
false_reg->min_value = 0;
~~~~~~~~~~~~~~~~~~~~~^~~
kernel/bpf/verifier.c:2010:2: note: here
case BPF_JSGT:
^~~~
kernel/bpf/verifier.c:2019:24: warning: this statement may fall through [-Wimplicit-fallthrough=]
false_reg->min_value = 0;
~~~~~~~~~~~~~~~~~~~~~^~~
kernel/bpf/verifier.c:2020:2: note: here
case BPF_JSGE:
^~~~
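For illustration, an intentional fall-through of the kind flagged above
can be marked with a comment that GCC recognizes at its default
-Wimplicit-fallthrough level. This is a sketch with hypothetical names;
it assumes the fall-through is intended and does not claim to be the
form this change uses:

```c
/* Hypothetical opcodes standing in for an unsigned and a signed jump. */
enum sketch_op { SKETCH_JGT, SKETCH_JSGT, SKETCH_OTHER };

struct sketch_reg {
	long min_value;
	long max_value;
};

static void sketch_set_min_max(struct sketch_reg *reg, enum sketch_op op,
			       long val)
{
	switch (op) {
	case SKETCH_JGT:
		/* Unsigned comparison: the value cannot be negative. */
		reg->min_value = 0;
		/* fall through */
	case SKETCH_JSGT:
		reg->max_value = val;
		break;
	default:
		break;
	}
}
```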
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.
The user mode helper is triggered by device init calls and the executable
might use the futex syscall.
futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.
If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.
Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.
[ tglx: Rewrote changelog ]
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
tick_broadcast_lock is taken from interrupt context, but the following call
chain takes the lock without disabling interrupts:
[ 12.703736] _raw_spin_lock+0x3b/0x50
[ 12.703738] tick_broadcast_control+0x5a/0x1a0
[ 12.703742] intel_idle_cpu_online+0x22/0x100
[ 12.703744] cpuhp_invoke_callback+0x245/0x9d0
[ 12.703752] cpuhp_thread_fun+0x52/0x110
[ 12.703754] smpboot_thread_fn+0x276/0x320
So the following deadlock can happen:
lock(tick_broadcast_lock);
<Interrupt>
lock(tick_broadcast_lock);
intel_idle_cpu_online() is the only place which violates the calling
convention of tick_broadcast_control(). This was caused by the removal of
the smp function call in the course of the cpu hotplug rework.
Instead of slapping local_irq_disable/enable() at the call site, we can
relax the calling convention and handle it in the core code, which makes
the whole machinery more robust.
Fixes: 29d7bbada9 ("intel_idle: Remove superfluous SMP fuction call")
Reported-by: Gabriel C <nix.or.die@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ruslan Ruslichenko <rruslich@cisco.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: lwn@lwn.net
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: stable <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1486953115.5912.4.camel@gmx.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If the BPF_F_ALLOW_OVERRIDE flag is used in a BPF_PROG_ATTACH command
to the given cgroup, a descendant cgroup will be able to override the
effective bpf program that was inherited from this cgroup.
By default the flag is not passed, therefore override is disallowed.
Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X
2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.
3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.
In the future this behavior may be extended with a chain of
non-overridable programs.
Also fix the bug where detaching from a cgroup with nothing attached
did not return an error. Return ENOENT in that case.
Add several testcases and adjust libbpf.
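The three examples above can be modelled with a small user-space toy.
This is a sketch under stated assumptions, not the kernel code: the
sketch_* names are hypothetical, a single program slot per cgroup is
assumed, and -EPERM for refused attaches is an assumption (only ENOENT
for detach is stated above):

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_F_ALLOW_OVERRIDE 1u	/* stand-in for BPF_F_ALLOW_OVERRIDE */

struct sketch_cgroup {
	struct sketch_cgroup *parent;
	const char *prog;	/* program attached directly here, or NULL */
	int allow_override;	/* mode that program was attached with */
};

/* Nearest ancestor (excluding cg itself) with a program attached. */
static struct sketch_cgroup *sketch_nearest_attached(struct sketch_cgroup *cg)
{
	struct sketch_cgroup *p;

	for (p = cg->parent; p; p = p->parent)
		if (p->prog)
			return p;
	return NULL;
}

static int sketch_attach(struct sketch_cgroup *cg, const char *prog,
			 unsigned int flags)
{
	int allow = !!(flags & SKETCH_F_ALLOW_OVERRIDE);
	struct sketch_cgroup *anc = sketch_nearest_attached(cg);

	/*
	 * An inherited program can be overridden only if it allows it,
	 * and only by a program that is overridable itself (examples
	 * 1 and 2).
	 */
	if (anc && (!anc->allow_override || !allow))
		return -EPERM;
	/* Switching the mode in place needs a detach first (example 3). */
	if (cg->prog && cg->allow_override != allow)
		return -EPERM;

	cg->prog = prog;
	cg->allow_override = allow;
	return 0;
}

static int sketch_detach(struct sketch_cgroup *cg)
{
	if (!cg->prog)
		return -ENOENT;	/* nothing attached */
	cg->prog = NULL;
	return 0;
}

/* Program that would run for cg: its own, else the nearest inherited. */
static const char *sketch_effective(struct sketch_cgroup *cg)
{
	for (; cg; cg = cg->parent)
		if (cg->prog)
			return cg->prog;
	return NULL;
}
```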
Fixes: 3007098494 ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull timer fix from Ingo Molnar:
"Fix a sporadic missed timer hw reprogramming bug that can result in
random delays"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz: Fix possible missing clock reprog after tick soft restart