mirror of
https://github.com/linux-apfs/linux-apfs.git
synced 2026-05-01 15:00:59 -07:00
906535b04671c84ec8731030d4968570d8d2a16c
24268 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
6f14f443d3 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Mostly simple cases of overlapping changes (adding code nearby, a function whose name changes, for example). Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
ea6b1720ce |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Reject invalid updates to netfilter expectation policies, from Pablo
Neira Ayuso.
2) Fix memory leak in nfnl_cthelper, from Jeffy Chen.
3) Don't do stupid things if we get a neigh_probe() on a neigh entry
whose ops lack a solicit method. From Eric Dumazet.
4) Don't transmit packets in r8152 driver when the carrier is off, from
Hayes Wang.
5) Fix ipv6 packet type detection in aquantia driver, from Pavel
Belous.
6) Don't write uninitialized data into hw registers in bna driver, from
Arnd Bergmann.
7) Fix locking in ping_unhash(), from Eric Dumazet.
8) Make BPF verifier range checks able to understand certain sequences
emitted by LLVM, from Alexei Starovoitov.
9) Fix use after free in ipconfig, from Mark Rutland.
10) Fix refcount leak on force commit in openvswitch, from Jarno
Rajahalme.
11) Fix various overflow checks in AF_PACKET, from Andrey Konovalov.
12) Fix endianness bug in be2net driver, from Suresh Reddy.
13) Don't forget to wake TX queues when processing a timeout, from
Grygorii Strashko.
14) ARP header on-stack storage is wrong in flow dissector, from Simon
Horman.
15) Lost retransmit and reordering SNMP stats in TCP can be
underreported. From Yuchung Cheng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits)
nfp: fix potential use after free on xdp prog
tcp: fix reordering SNMP under-counting
tcp: fix lost retransmit SNMP under-counting
sctp: get sock from transport in sctp_transport_update_pmtu
net: ethernet: ti: cpsw: fix race condition during open()
l2tp: fix PPP pseudo-wire auto-loading
bnx2x: fix spelling mistake in macros HW_INTERRUT_ASSERT_SET_*
l2tp: take reference on sessions being dumped
tcp: minimize false-positives on TCP/GRO check
sctp: check for dst and pathmtu update in sctp_packet_config
flow dissector: correct size of storage for ARP
net: ethernet: ti: cpsw: wake tx queues on ndo_tx_timeout
l2tp: take a reference on sessions used in genetlink handlers
l2tp: hold session while sending creation notifications
l2tp: fix duplicate session creation
l2tp: ensure session can't get removed during pppol2tp_session_ioctl()
l2tp: fix race in l2tp_recv_common()
sctp: use right in and out stream cnt
bpf: add various verifier test cases for self-tests
bpf, verifier: fix rejection of unaligned access checks for map_value_adj
...
|
||
|
|
128c434a70 |
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"This update provides:
- make the scheduler clock switch to unstable mode smooth so the
timestamps stay at microseconds granularity instead of switching to
tick granularity.
- unbreak perf test tsc by taking the new offset into account which
was added in order to proveide better sched clock continuity
- switching sched clock to unstable mode runs all clock related
computations which affect the sched clock output itself from a work
queue. In case of preemption sched clock uses half updated data and
provides wrong timestamps. Keep the math in the protected context
and delegate only the static key switch to workqueue context.
- remove a duplicate header include"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/headers: Remove duplicate #include <linux/sched/debug.h> line
sched/clock: Fix broken stable to unstable transfer
sched/clock, x86/perf: Fix "perf test tsc"
sched/clock: Fix clear_sched_clock_stable() preempt wobbly
|
||
|
|
1cf1cae963 |
bpf: introduce BPF_PROG_TEST_RUN command
development and testing of networking bpf programs is quite cumbersome. Despite availability of user space bpf interpreters the kernel is the ultimate authority and execution environment. Current test frameworks for TC include creation of netns, veth, qdiscs and use of various packet generators just to test functionality of a bpf program. XDP testing is even more complicated, since qemu needs to be started with gro/gso disabled and precise queue configuration, transferring of xdp program from host into guest, attaching to virtio/eth0 and generating traffic from the host while capturing the results from the guest. Moreover analyzing performance bottlenecks in XDP program is impossible in virtio environment, since cost of running the program is tiny comparing to the overhead of virtio packet processing, so performance testing can only be done on physical nic with another server generating traffic. Furthermore ongoing changes to user space control plane of production applications cannot be run on the test servers leaving bpf programs stubbed out for testing. Last but not least, the upstream llvm changes are validated by the bpf backend testsuite which has no ability to test the code generated. To improve this situation introduce BPF_PROG_TEST_RUN command to test and performance benchmark bpf programs. Joint work with Daniel Borkmann. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
79adffcd64 |
bpf, verifier: fix rejection of unaligned access checks for map_value_adj
Currently, the verifier doesn't reject unaligned access for map_value_adj register types. Commit |
||
|
|
fce366a9dd |
bpf, verifier: fix alu ops against map_value{, _adj} register types
While looking into map_value_adj, I noticed that alu operations
directly on the map_value() resp. map_value_adj() register (any
alu operation on a map_value() register will turn it into a
map_value_adj() typed register) are not sufficiently protected
against some of the operations. Two non-exhaustive examples are
provided that the verifier needs to reject:
i) BPF_AND on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xbf842a00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (57) r0 &= 8
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=8 R10=fp
9: (95) exit
from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns
ii) BPF_ADD in 32 bit mode on r0 (map_value_adj):
0: (bf) r2 = r10
1: (07) r2 += -8
2: (7a) *(u64 *)(r2 +0) = 0
3: (18) r1 = 0xc24eee00
5: (85) call bpf_map_lookup_elem#1
6: (15) if r0 == 0x0 goto pc+2
R0=map_value(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
7: (04) (u32) r0 += (u32) 0
8: (7a) *(u64 *)(r0 +0) = 22
R0=map_value_adj(ks=8,vs=48,id=0),min_value=0,max_value=0 R10=fp
9: (95) exit
from 6 to 9: R0=inv,min_value=0,max_value=0 R10=fp
9: (95) exit
processed 10 insns
Issue is, while min_value / max_value boundaries for the access
are adjusted appropriately, we change the pointer value in a way
that cannot be sufficiently tracked anymore from its origin.
Operations like BPF_{AND,OR,DIV,MUL,etc} on a destination register
that is PTR_TO_MAP_VALUE{,_ADJ} was probably unintended, in fact,
all the test cases coming with
|
||
|
|
035f0cd3f8 |
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu: "This fixes the following issues: - memory corruption when kmalloc fails in xts/lrw - mark some CCP DMA channels as private - fix reordering race in padata - regression in omap-rng DT description" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: xts,lrw - fix out-of-bounds write after kmalloc failure crypto: ccp - Make some CCP DMA channels private padata: avoid race in reordering dt-bindings: rng: clocks property on omap_rng not always mandatory |
||
|
|
7b09cc5a9d |
sched/clock: Fix broken stable to unstable transfer
When it is determined that the clock is actually unstable, and
we switch from stable to unstable, the __clear_sched_clock_stable()
function is eventually called.
In this function we set gtod_offset so the following holds true:
sched_clock() + raw_offset == ktime_get_ns() + gtod_offset
But instead of getting the latest timestamps, we use the last values
from scd, so instead of sched_clock() we use scd->tick_raw, and
instead of ktime_get_ns() we use scd->tick_gtod.
However, later, when we use gtod_offset sched_clock_local() we do not
add it to scd->tick_gtod to calculate the correct clock value when we
determine the boundaries for min/max clocks.
This can result in tick granularity sched_clock() values, so fix it.
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Fixes:
|
||
|
|
4c3de7e5bf |
Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit
Pull audit fix from Paul Moore: "We've got an audit fix, and unfortunately it is big. While I'm not excited that we need to be sending you something this large during the -rcX phase, it does fix some very real, and very tangled, problems relating to locking, backlog queues, and the audit daemon connection. This code has passed our testsuite without problem and it has held up to my ad-hoc stress tests (arguably better than the existing code), please consider pulling this as fix for the next v4.11-rcX tag" * 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit: audit: fix auditd/kernel connection state tracking |
||
|
|
b1977682a3 |
bpf: improve verifier packet range checks
llvm can optimize the 'if (ptr > data_end)' checks to be in the order
slightly different than the original C code which will confuse verifier.
Like:
if (ptr + 16 > data_end)
return TC_ACT_SHOT;
// may be followed by
if (ptr + 14 > data_end)
return TC_ACT_SHOT;
while llvm can see that 'ptr' is valid for all 16 bytes,
the verifier could not.
Fix verifier logic to account for such case and add a test.
Reported-by: Huapeng Zhou <hzhou@fb.com>
Fixes:
|
||
|
|
de5540d088 |
padata: avoid race in reordering
Under extremely heavy uses of padata, crashes occur, and with list
debugging turned on, this happens instead:
[87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33
__list_add+0xae/0x130
[87487.301868] list_add corruption. prev->next should be next
(ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00).
[87487.339011] [<ffffffff9a53d075>] dump_stack+0x68/0xa3
[87487.342198] [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0
[87487.345364] [<ffffffff99d6b91f>] __warn+0xff/0x140
[87487.348513] [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50
[87487.351659] [<ffffffff9a58b5de>] __list_add+0xae/0x130
[87487.354772] [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70
[87487.357915] [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420
[87487.361084] [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120
padata_reorder calls list_add_tail with the list to which its adding
locked, which seems correct:
spin_lock(&squeue->serial.lock);
list_add_tail(&padata->list, &squeue->serial.list);
spin_unlock(&squeue->serial.lock);
This therefore leaves only place where such inconsistency could occur:
if padata->list is added at the same time on two different threads.
This pdata pointer comes from the function call to
padata_get_next(pd), which has in it the following block:
next_queue = per_cpu_ptr(pd->pqueue, cpu);
padata = NULL;
reorder = &next_queue->reorder;
if (!list_empty(&reorder->list)) {
padata = list_entry(reorder->list.next,
struct padata_priv, list);
spin_lock(&reorder->lock);
list_del_init(&padata->list);
atomic_dec(&pd->reorder_objects);
spin_unlock(&reorder->lock);
pd->processed++;
goto out;
}
out:
return padata;
I strongly suspect that the problem here is that two threads can race
on reorder list. Even though the deletion is locked, call to
list_entry is not locked, which means it's feasible that two threads
pick up the same padata object and subsequently call list_add_tail on
them at the same time. The fix is thus be hoist that lock outside of
that block.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
||
|
|
ebe64824e9 |
Merge tag 'pm-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"One of these is an intel_pstate regression fix and it is not a small
change, but it mostly removes code that shouldn't be there. That code
was acquired by mistake and has been a source of constant pain since
then, so the time has come to get rid of it finally. We have not seen
problems with this change in the lab, so fingers crossed.
The rest is more usual: one more intel_pstate commit removing useless
code, a cpufreq core fix to make it restore policy limits on CPU
online (which prevents the limits from being reset over system
suspend/resume), a schedutil cpufreq governor initialization fix to
make it actually work as advertised on all systems and an extra sanity
check in the cpuidle core to prevent crashes from happening if the
arch code messes things up.
Specifics:
- Make intel_pstate use one set of global P-state limits in the
active mode regardless of the scaling_governor settings for
individual CPUs instead of switching back and forth between two of
them in a way that is hard to control (Rafael Wysocki).
- Drop a useless function from intel_pstate to prevent it from
modifying the maximum supported frequency value unexpectedly which
may confuse the cpufreq core (Rafael Wysocki).
- Fix the cpufreq core to restore policy limits on CPU online so that
the limits are not reset over system suspend/resume, among other
things (Viresh Kumar).
- Fix the initialization of the schedutil cpufreq governor to make
the IO-wait boosting mechanism in it actually work on systems with
one CPU per cpufreq policy (Rafael Wysocki).
- Add a sanity check to the cpuidle core to prevent crashes from
happening if the architecture code initialization fails to set up
things as expected (Vaidyanathan Srinivasan)"
* tag 'pm-4.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Restore policy min/max limits on CPU online
cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
cpufreq: intel_pstate: Fix policy data management in passive mode
cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
cpufreq: intel_pstate: One set of global limits in active mode
|
||
|
|
16ae1f2236 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts: drivers/net/ethernet/broadcom/genet/bcmmii.c drivers/net/hyperv/netvsc.c kernel/bpf/hashtab.c Almost entirely overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
f341d9f08a |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Several netfilter fixes from Pablo and the crew:
- Handle fragmented packets properly in netfilter conntrack, from
Florian Westphal.
- Fix SCTP ICMP packet handling, from Ying Xue.
- Fix big-endian bug in nftables, from Liping Zhang.
- Fix alignment of fake conntrack entry, from Steven Rostedt.
2) Fix feature flags setting in fjes driver, from Taku Izumi.
3) Openvswitch ipv6 tunnel source address not set properly, from Or
Gerlitz.
4) Fix jumbo MTU handling in amd-xgbe driver, from Thomas Lendacky.
5) sk->sk_frag.page not released properly in some cases, from Eric
Dumazet.
6) Fix RTNL deadlocks in nl80211, from Johannes Berg.
7) Fix erroneous RTNL lockdep splat in crypto, from Herbert Xu.
8) Cure improper inflight handling during AF_UNIX GC, from Andrey
Ulanov.
9) sch_dsmark doesn't write to packet headers properly, from Eric
Dumazet.
10) Fix SCM_TIMESTAMPING_OPT_STATS handling in TCP, from Soheil Hassas
Yeganeh.
11) Add some IDs for Motorola qmi_wwan chips, from Tony Lindgren.
12) Fix nametbl deadlock in tipc, from Ying Xue.
13) GRO and LRO packets not counted correctly in mlx5 driver, from Gal
Pressman.
14) Fix reset of internal PHYs in bcmgenet, from Doug Berger.
15) Fix hashmap allocation handling, from Alexei Starovoitov.
16) nl_fib_input() needs stronger netlink message length checking, from
Eric Dumazet.
17) Fix double-free of sk->sk_filter during sock clone, from Daniel
Borkmann.
18) Fix RX checksum offloading in aquantia driver, from Pavel Belous.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
net:ethernet:aquantia: Fix for RX checksum offload.
amd-xgbe: Fix the ECC-related bit position definitions
sfc: cleanup a condition in efx_udp_tunnel_del()
Bluetooth: btqcomsmd: fix compile-test dependency
inet: frag: release spinlock before calling icmp_send()
tcp: initialize icsk_ack.lrcvtime at session start time
genetlink: fix counting regression on ctrl_dumpfamily()
socket, bpf: fix sk_filter use after free in sk_clone_lock
ipv4: provide stronger user input validation in nl_fib_input()
bpf: fix hashmap extra_elems logic
enic: update enic maintainers
net: bcmgenet: remove bcmgenet_internal_phy_setup()
ipv6: make sure to initialize sockc.tsflags before first use
fjes: Do not load fjes driver if extended socket device is not power on.
fjes: Do not load fjes driver if system does not have extended socket device.
net/mlx5e: Count LRO packets correctly
net/mlx5e: Count GSO packets correctly
net/mlx5: Increase number of max QPs in default profile
net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps
net/mlx5e: Use the proper UAPI values when offloading TC vlan actions
...
|
||
|
|
698eff6355 |
sched/clock, x86/perf: Fix "perf test tsc"
People reported that commit: |
||
|
|
71fdb70eb4 |
sched/clock: Fix clear_sched_clock_stable() preempt wobbly
Paul reported a problems with clear_sched_clock_stable(). Since we run all of __clear_sched_clock_stable() from workqueue context, there's a preempt problem. Solve it by only running the static_key_disable() from workqueue. Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/r/20170313124621.GA3328@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
|
|
bcc6b1b7eb |
bpf: Add hash of maps support
This patch adds hash of maps support (hashmap->bpf_map). BPF_MAP_TYPE_HASH_OF_MAPS is added. A map-in-map contains a pointer to another map and lets call this pointer 'inner_map_ptr'. Notes on deleting inner_map_ptr from a hash map: 1. For BPF_F_NO_PREALLOC map-in-map, when deleting an inner_map_ptr, the htab_elem itself will go through a rcu grace period and the inner_map_ptr resides in the htab_elem. 2. For pre-allocated htab_elem (!BPF_F_NO_PREALLOC), when deleting an inner_map_ptr, the htab_elem may get reused immediately. This situation is similar to the existing prealloc-ated use cases. However, the bpf_map_fd_put_ptr() calls bpf_map_put() which calls inner_map->ops->map_free(inner_map) which will go through a rcu grace period (i.e. all bpf_map's map_free currently goes through a rcu grace period). Hence, the inner_map_ptr is still safe for the rcu reader side. This patch also includes BPF_MAP_TYPE_HASH_OF_MAPS to the check_map_prealloc() in the verifier. preallocation is a must for BPF_PROG_TYPE_PERF_EVENT. Hence, even we don't expect heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map is impossible without disallowing BPF_PROG_TYPE_PERF_EVENT from using map-in-map first. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
56f668dfe0 |
bpf: Add array of maps support
This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type BPF_MAP_TYPE_ARRAY_OF_MAPS is also added in this patch. The next patch will introduce a hash of maps type. Any bpf map type can be acted as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map's properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner_hash also. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity reason, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. During outer_map creation time, an inner_map_fd is needed to create an outer_map. However, the inner_map_fd's life time does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
|
|
fad73a1a35 |
bpf: Fix and simplifications on inline map lookup
Fix in verifier:
For the same bpf_map_lookup_elem() instruction (i.e. "call 1"),
a broken case is "a different type of map could be used for the
same lookup instruction". For example, an array in one case and a
hashmap in another. We have to resort to the old dynamic call behavior
in this case. The fix is to check for collision on insn_aux->map_ptr.
If there is collision, don't inline the map lookup.
Please see the "do_reg_lookup()" in test_map_in_map_kern.c in the later
patch for how-to trigger the above case.
Simplifications on array_map_gen_lookup():
1. Calculate elem_size from map->value_size. It removes the
need for 'struct bpf_array' which makes the later map-in-map
implementation easier.
2. Remove the 'elem_size == 1' test
Fixes:
|
||
|
|
8c290e60fa |
bpf: fix hashmap extra_elems logic
In both kmalloc and prealloc mode the bpf_map_update_elem() is using per-cpu extra_elems to do atomic update when the map is full. There are two issues with it. The logic can be misused, since it allows max_entries+num_cpus elements to be present in the map. And alloc_extra_elems() at map creation time can fail percpu alloc for large map values with a warn: WARNING: CPU: 3 PID: 2752 at ../mm/percpu.c:892 pcpu_alloc+0x119/0xa60 illegal size (32824) or align (8) for percpu allocation The fixes for both of these issues are different for kmalloc and prealloc modes. For prealloc mode allocate extra num_possible_cpus elements and store their pointers into extra_elems array instead of actual elements. Hence we can use these hidden(spare) elements not only when the map is full but during bpf_map_update_elem() that replaces existing element too. That also improves performance, since pcpu_freelist_pop/push is avoided. Unfortunately this approach cannot be used for kmalloc mode which needs to kfree elements after rcu grace period. Therefore switch it back to normal kmalloc even when full and old element exists like it was prior to commit |
||
|
|
5b52330bbf |
audit: fix auditd/kernel connection state tracking
What started as a rather straightforward race condition reported by Dmitry using the syzkaller fuzzer ended up revealing some major problems with how the audit subsystem managed its netlink sockets and its connection with the userspace audit daemon. Fixing this properly had quite the cascading effect and what we are left with is this rather large and complicated patch. My initial goal was to try and decompose this patch into multiple smaller patches, but the way these changes are intertwined makes it difficult to split these changes into meaningful pieces that don't break or somehow make things worse for the intermediate states. The patch makes a number of changes, but the most significant are highlighted below: * The auditd tracking variables, e.g. audit_sock, are now gone and replaced by a RCU/spin_lock protected variable auditd_conn which is a structure containing all of the auditd tracking information. * We no longer track the auditd sock directly, instead we track it via the network namespace in which it resides and we use the audit socket associated with that namespace. In spirit, this is what the code was trying to do prior to this patch (at least I think that is what the original authors intended), but it was done rather poorly and added a layer of obfuscation that only masked the underlying problems. * Big backlog queue cleanup, again. In v4.10 we made some pretty big changes to how the audit backlog queues work, here we haven't changed the queue design so much as cleaned up the implementation. Brought about by the locking changes, we've simplified kauditd_thread() quite a bit by consolidating the queue handling into a new helper function, kauditd_send_queue(), which allows us to eliminate a lot of very similar code and makes the looping logic in kauditd_thread() clearer. * All netlink messages sent to auditd are now sent via auditd_send_unicast_skb(). Other than just making sense, this makes the lock handling easier. * Change the audit_log_start() sleep behavior so that we never sleep on auditd events (unchanged) or if the caller is holding the audit_cmd_mutex (changed). Previously we didn't sleep if the caller was auditd or if the message type fell between a certain range; the type check was a poor effort of doing what the cmd_mutex check now does. Richard Guy Briggs originally proposed not sleeping the cmd_mutex owner several years ago but his patch wasn't acceptable at the time. At least the idea lives on here. * A problem with the lost record counter has been resolved. Steve Grubb and I both happened to notice this problem and according to some quick testing by Steve, this problem goes back quite some time. It's largely a harmless problem, although it may have left some careful sysadmins quite puzzled. Cc: <stable@vger.kernel.org> # 4.10.x- Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com> |
||
|
|
4296f23ed4 |
cpufreq: schedutil: Fix per-CPU structure initialization in sugov_start()
sugov_start() only initializes struct sugov_cpu per-CPU structures
for shared policies, but it should do that for single-CPU policies too.
That in particular makes the IO-wait boost mechanism work in the
cases when cpufreq policies correspond to individual CPUs.
Fixes:
|
||
|
|
3e51f893de |
Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Thomas Gleixner: "A single fix preventing the concurrent execution of the CPU hotplug callback install/invocation machinery. Long standing bug caused by a massive brain slip of that Gleixner dude, which went unnoticed for almost a year" * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Serialize callback invocations proper |
||
|
|
a7fc726bb2 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A set of perf related fixes:
- fix a CR4.PCE propagation issue caused by usage of mm instead of
active_mm and therefore propagated the wrong value.
- perf core fixes, which plug a use-after-free issue and make the
event inheritance on fork more robust.
- a tooling fix for symbol handling"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf symbols: Fix symbols__fixup_end heuristic for corner cases
x86/perf: Clarify why x86_pmu_event_mapped() isn't racy
x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
perf/core: Better explain the inherit magic
perf/core: Simplify perf_event_free_task()
perf/core: Fix event inheritance on fork()
perf/core: Fix use-after-free in perf_release()
|
||
|
|
cd21debe53 |
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"From the scheduler departement:
- a bunch of sched deadline related fixes which deal with various
buglets and corner cases.
- two fixes for the loadavg spikes which are caused by the delayed
NOHZ accounting"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/deadline: Use deadline instead of period when calculating overflow
sched/deadline: Throttle a constrained deadline task activated after the deadline
sched/deadline: Make sure the replenishment timer fires in the next period
sched/loadavg: Use {READ,WRITE}_ONCE() for sample window
sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
sched/deadline: Add missing update_rq_clock() in dl_task_timer()
|