syzbot reported a nasty crash [1] in net_tx_action() which
made little sense until we got a repro.
This repro installs a taprio qdisc, but providing an
invalid TCA_RATE attribute.
qdisc_create() has to destroy the just initialized
taprio qdisc, and taprio_destroy() is called.
However, the hrtimer used by taprio had already fired,
therefore advance_sched() called __netif_schedule().
Then net_tx_action was trying to use a destroyed qdisc.
We can not undo the __netif_schedule(), so we must wait
until one cpu serviced the qdisc before we can proceed.
Many thanks to Alexander Potapenko for his help.
[1]
BUG: KMSAN: uninit-value in queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
BUG: KMSAN: uninit-value in do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
BUG: KMSAN: uninit-value in __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
BUG: KMSAN: uninit-value in _raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
queued_spin_trylock include/asm-generic/qspinlock.h:94 [inline]
do_raw_spin_trylock include/linux/spinlock.h:191 [inline]
__raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
_raw_spin_trylock+0x92/0xa0 kernel/locking/spinlock.c:138
spin_trylock include/linux/spinlock.h:359 [inline]
qdisc_run_begin include/net/sch_generic.h:187 [inline]
qdisc_run+0xee/0x540 include/net/pkt_sched.h:125
net_tx_action+0x77c/0x9a0 net/core/dev.c:5086
__do_softirq+0x1cc/0x7fb kernel/softirq.c:571
run_ksoftirqd+0x2c/0x50 kernel/softirq.c:934
smpboot_thread_fn+0x554/0x9f0 kernel/smpboot.c:164
kthread+0x31b/0x430 kernel/kthread.c:376
ret_from_fork+0x1f/0x30
Uninit was created at:
slab_post_alloc_hook mm/slab.h:732 [inline]
slab_alloc_node mm/slub.c:3258 [inline]
__kmalloc_node_track_caller+0x814/0x1250 mm/slub.c:4970
kmalloc_reserve net/core/skbuff.c:358 [inline]
__alloc_skb+0x346/0xcf0 net/core/skbuff.c:430
alloc_skb include/linux/skbuff.h:1257 [inline]
nlmsg_new include/net/netlink.h:953 [inline]
netlink_ack+0x5f3/0x12b0 net/netlink/af_netlink.c:2436
netlink_rcv_skb+0x55d/0x6c0 net/netlink/af_netlink.c:2507
rtnetlink_rcv+0x30/0x40 net/core/rtnetlink.c:6108
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0xf3b/0x1270 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x1288/0x1440 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0xabc/0xe90 net/socket.c:2482
___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2536
__sys_sendmsg net/socket.c:2565 [inline]
__do_sys_sendmsg net/socket.c:2574 [inline]
__se_sys_sendmsg net/socket.c:2572 [inline]
__x64_sys_sendmsg+0x367/0x540 net/socket.c:2572
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 6.0.0-rc2-syzkaller-47461-gac3859c02d7f #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a running wake_tx_queue() call is aborted due to a hw queue stop
the corresponding iTXQ is not always correctly marked for resumption:
wake_tx_push_queue() can stops the queue run without setting
@IEEE80211_TXQ_STOP_NETIF_TX.
Without the @IEEE80211_TXQ_STOP_NETIF_TX flag __ieee80211_wake_txqs()
will not schedule a new queue run and remaining frames in the queue get
stuck till another frame is queued to it.
Fix the issue for all drivers - also the ones with custom wake_tx_queue
callbacks - by moving the logic into ieee80211_tx_dequeue() and drop the
redundant @txqs_stopped.
@IEEE80211_TXQ_STOP_NETIF_TX is also renamed to @IEEE80211_TXQ_DIRTY to
better describe the flag.
Fixes: c850e31f79 ("wifi: mac80211: add internal handler for wake_tx_queue")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20221230121850.218810-1-alexander@wetzel-home.de
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Tidy up the abort generation infrastructure in the following ways:
(1) Create an enum and string mapping table to list the reasons an abort
might be generated in tracing.
(2) Replace the 3-char string with the values from (1) in the places that
use that to log the abort source. This gets rid of a memcpy() in the
tracepoint.
(3) Subsume the rxrpc_rx_eproto tracepoint with the rxrpc_abort tracepoint
and use values from (1) to indicate the trace reason.
(4) Always make a call to an abort function at the point of the abort
rather than stashing the values into variables and using goto to get
to a place where it reported. The C optimiser will collapse the calls
together as appropriate. The abort functions return a value that can
be returned directly if appropriate.
Note that this extends into afs also at the points where that generates an
abort. To aid with this, the afs sources need to #define
RXRPC_TRACE_ONLY_DEFINE_ENUMS before including the rxrpc tracing header
because they don't have access to the rxrpc internal structures that some
of the tracepoints make use of.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi, and netfilter.
Current release - regressions:
- bpf: fix nullness propagation for reg to reg comparisons, avoid
null-deref
- inet: control sockets should not use current thread task_frag
- bpf: always use maximal size for copy_array()
- eth: bnxt_en: don't link netdev to a devlink port for VFs
Current release - new code bugs:
- rxrpc: fix a couple of potential use-after-frees
- netfilter: conntrack: fix IPv6 exthdr error check
- wifi: iwlwifi: fw: skip PPAG for JF, avoid FW crashes
- eth: dsa: qca8k: various fixes for the in-band register access
- eth: nfp: fix schedule in atomic context when sync mc address
- eth: renesas: rswitch: fix getting mac address from device tree
- mobile: ipa: use proper endpoint mask for suspend
Previous releases - regressions:
- tcp: add TIME_WAIT sockets in bhash2, fix regression caught by
Jiri / python tests
- net: tc: don't intepret cls results when asked to drop, fix
oob-access
- vrf: determine the dst using the original ifindex for multicast
- eth: bnxt_en:
- fix XDP RX path if BPF adjusted packet length
- fix HDS (header placement) and jumbo thresholds for RX packets
- eth: ice: xsk: do not use xdp_return_frame() on tx_buf->raw_buf,
avoid memory corruptions
Previous releases - always broken:
- ulp: prevent ULP without clone op from entering the LISTEN status
- veth: fix race with AF_XDP exposing old or uninitialized
descriptors
- bpf:
- pull before calling skb_postpull_rcsum() (fix checksum support
and avoid a WARN())
- fix panic due to wrong pageattr of im->image (when livepatch and
kretfunc coexist)
- keep a reference to the mm, in case the task is dead
- mptcp: fix deadlock in fastopen error path
- netfilter:
- nf_tables: perform type checking for existing sets
- nf_tables: honor set timeout and garbage collection updates
- ipset: fix hash:net,port,net hang with /0 subnet
- ipset: avoid hung task warning when adding/deleting entries
- selftests: net:
- fix cmsg_so_mark.sh test hang on non-x86 systems
- fix the arp_ndisc_evict_nocarrier test for IPv6
- usb: rndis_host: secure rndis_query check against int overflow
- eth: r8169: fix dmar pte write access during suspend/resume with
WOL
- eth: lan966x: fix configuration of the PCS
- eth: sparx5: fix reading of the MAC address
- eth: qed: allow sleep in qed_mcp_trace_dump()
- eth: hns3:
- fix interrupts re-initialization after VF FLR
- fix handling of promisc when MAC addr table gets full
- refine the handling for VF heartbeat
- eth: mlx5:
- properly handle ingress QinQ-tagged packets on VST
- fix io_eq_size and event_eq_size params validation on big endian
- fix RoCE setting at HCA level if not supported at all
- don't turn CQE compression on by default for IPoIB
- eth: ena:
- fix toeplitz initial hash key value
- account for the number of XDP-processed bytes in interface stats
- fix rx_copybreak value update
Misc:
- ethtool: harden phy stat handling against buggy drivers
- docs: netdev: convert maintainer's doc from FAQ to a normal
document"
* tag 'net-6.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (112 commits)
caif: fix memory leak in cfctrl_linkup_request()
inet: control sockets should not use current thread task_frag
net/ulp: prevent ULP without clone op from entering the LISTEN status
qed: allow sleep in qed_mcp_trace_dump()
MAINTAINERS: Update maintainers for ptp_vmw driver
usb: rndis_host: Secure rndis_query check against int overflow
net: dpaa: Fix dtsec check for PCS availability
octeontx2-pf: Fix lmtst ID used in aura free
drivers/net/bonding/bond_3ad: return when there's no aggregator
netfilter: ipset: Rework long task execution when adding/deleting entries
netfilter: ipset: fix hash:net,port,net hang with /0 subnet
net: sparx5: Fix reading of the MAC address
vxlan: Fix memory leaks in error path
net: sched: htb: fix htb_classify() kernel-doc
net: sched: cbq: dont intepret cls results when asked to drop
net: sched: atm: dont intepret cls results when asked to drop
dt-bindings: net: marvell,orion-mdio: Fix examples
dt-bindings: net: sun8i-emac: Add phy-supply property
net: ipa: use proper endpoint mask for suspend
selftests: net: return non-zero for failures reported in arp_ndisc_evict_nocarrier
...
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Use signed integer in ipv6_skip_exthdr() called from nf_confirm().
Reported by static analysis tooling, patch from Florian Westphal.
2) Missing set type checks in nf_tables: Validate that set declaration
matches the an existing set type, otherwise bail out with EEXIST.
Currently, nf_tables silently accepts the re-declaration with a
different type but it bails out later with EINVAL when the user adds
entries to the set. This fix is relatively large because it requires
two preparation patches that are included in this batch.
3) Do not ignore updates of timeout and gc_interval parameters in
existing sets.
4) Fix a hang when 0/0 subnets is added to a hash:net,port,net type of
ipset. Except hash:net,port,net and hash:net,iface, the set types don't
support 0/0 and the auxiliary functions rely on this fact. So 0/0 needs
a special handling in hash:net,port,net which was missing (hash:net,iface
was not affected by this bug), from Jozsef Kadlecsik.
5) When adding/deleting large number of elements in one step in ipset,
it can take a reasonable amount of time and can result in soft lockup
errors. This patch is a complete rework of the previous version in order
to use a smaller internal batch limit and at the same time removing
the external hard limit to add arbitrary number of elements in one step.
Also from Jozsef Kadlecsik.
Except for patch #1, which fixes a bug introduced in the previous net-next
development cycle, anything else has been broken for several releases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby reported regression of bind() with a simple repro. [0]
The repro creates a TIME_WAIT socket and tries to bind() a new socket
with the same local address and port. Before commit 28044fc1d4 ("net:
Add a bhash2 table hashed by port and address"), the bind() failed with
-EADDRINUSE, but now it succeeds.
The cited commit should have put TIME_WAIT sockets into bhash2; otherwise,
inet_bhash2_conflict() misses TIME_WAIT sockets when validating bind()
requests if the address is not a wildcard one.
The straight option is to move sk_bind2_node from struct sock to struct
sock_common to add twsk to bhash2 as implemented as RFC. [1] However, the
binary layout change in the struct sock could affect performances moving
hot fields on different cachelines.
To avoid that, we add another TIME_WAIT list in inet_bind2_bucket and check
it while validating bind().
[0]: https://lore.kernel.org/netdev/6b971a4e-c7d8-411e-1f92-fda29b5b2fb9@kernel.org/
[1]: https://lore.kernel.org/netdev/20221221151258.25748-2-kuniyu@amazon.com/
Fixes: 28044fc1d4 ("net: Add a bhash2 table hashed by port and address")
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rudi reports a compilation failure on x86_64 when CONFIG_NET_CLS or
CONFIG_NET_CLS_ACT is not set but CONFIG_RETPOLINE is set.
A misplaced '#endif' was causing the issue.
Fixes: 7f0e810220 ("net/sched: add retpoline wrapper for tc")
Tested-by: Rudi Heitbaum <rudi@heitbaum.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull 9p updates from Dominique Martinet:
- improve p9_check_errors to check buffer size instead of msize when
possible (e.g. not zero-copy)
- some more syzbot and KCSAN fixes
- minor headers include cleanup
* tag '9p-for-6.2-rc1' of https://github.com/martinetd/linux:
9p/client: fix data race on req->status
net/9p: fix response size check in p9_check_errors()
net/9p: distinguish zero-copy requests
9p/xen: do not memcpy header into req->rc
9p: set req refcount to zero to avoid uninitialized usage
9p/net: Remove unneeded idr.h #include
9p/fs: Remove unneeded idr.h #include
Set timeout and garbage collection interval updates are ignored on
updates. Add transaction to update global set element timeout and
garbage collection interval.
Fixes: 96518518cc ("netfilter: add nftables")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, netfilter and can.
Current release - regressions:
- bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func
- rxrpc:
- fix security setting propagation
- fix null-deref in rxrpc_unuse_local()
- fix switched parameters in peer tracing
Current release - new code bugs:
- rxrpc:
- fix I/O thread startup getting skipped
- fix locking issues in rxrpc_put_peer_locked()
- fix I/O thread stop
- fix uninitialised variable in rxperf server
- fix the return value of rxrpc_new_incoming_call()
- microchip: vcap: fix initialization of value and mask
- nfp: fix unaligned io read of capabilities word
Previous releases - regressions:
- stop in-kernel socket users from corrupting socket's task_frag
- stream: purge sk_error_queue in sk_stream_kill_queues()
- openvswitch: fix flow lookup to use unmasked key
- dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port()
- devlink:
- hold region lock when flushing snapshots
- protect devlink dump by the instance lock
Previous releases - always broken:
- bpf:
- prevent leak of lsm program after failed attach
- resolve fext program type when checking map compatibility
- skbuff: account for tail adjustment during pull operations
- macsec: fix net device access prior to holding a lock
- bonding: switch back when high prio link up
- netfilter: flowtable: really fix NAT IPv6 offload
- enetc: avoid buffer leaks on xdp_do_redirect() failure
- unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg()
- dsa: microchip: remove IRQF_TRIGGER_FALLING in
request_threaded_irq"
* tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
net: fec: check the return value of build_skb()
net: simplify sk_page_frag
Treewide: Stop corrupting socket's task_frag
net: Introduce sk_use_task_frag in struct sock.
mctp: Remove device type check at unregister
net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq
can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len
can: flexcan: avoid unbalanced pm_runtime_enable warning
Documentation: devlink: add missing toc entry for etas_es58x devlink doc
mctp: serial: Fix starting value for frame check sequence
nfp: fix unaligned io read of capabilities word
net: stream: purge sk_error_queue in sk_stream_kill_queues()
myri10ge: Fix an error handling path in myri10ge_probe()
net: microchip: vcap: Fix initialization of value and mask
rxrpc: Fix the return value of rxrpc_new_incoming_call()
rxrpc: rxperf: Fix uninitialised variable
rxrpc: Fix I/O thread stop
rxrpc: Fix switched parameters in peer tracing
rxrpc: Fix locking issues in rxrpc_put_peer_locked()
rxrpc: Fix I/O thread startup getting skipped
...
Add the following fields to the set description:
- key type
- data type
- object type
- policy
- gc_int: garbage collection interval)
- timeout: element timeout
This prepares for stricter set type checks on updates in a follow up
patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that in-kernel socket users that may recurse during reclaim have benn
converted to sk_use_task_frag = false, we can have sk_page_frag() simply
check that value.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Sockets that can be used while recursing into memory reclaim, like
those used by network block devices and file systems, mustn't use
current->task_frag: if the current process is already using it, then
the inner memory reclaim call would corrupt the task_frag structure.
To avoid this, sk_page_frag() uses ->sk_allocation to detect sockets
that mustn't use current->task_frag, assuming that those used during
memory reclaim had their allocation constraints reflected in
->sk_allocation.
This unfortunately doesn't cover all cases: in an attempt to remove all
usage of GFP_NOFS and GFP_NOIO, sunrpc stopped setting these flags in
->sk_allocation, and used memalloc_nofs critical sections instead.
This breaks the sk_page_frag() heuristic since the allocation
constraints are now stored in current->flags, which sk_page_frag()
can't read without risking triggering a cache miss and slowing down
TCP's fast path.
This patch creates a new field in struct sock, named sk_use_task_frag,
which sockets with memory reclaim constraints can set to false if they
can't safely use current->task_frag. In such cases, sk_page_frag() now
always returns the socket's page_frag (->sk_frag). The first user is
sunrpc, which needs to avoid using current->task_frag but can keep
->sk_allocation set to GFP_KERNEL otherwise.
Eventually, it might be possible to simplify sk_page_frag() by only
testing ->sk_use_task_frag and avoid relying on the ->sk_allocation
heuristic entirely (assuming other sockets will set ->sk_use_task_frag
according to their constraints in the future).
The new ->sk_use_task_frag field is placed in a hole in struct sock and
belongs to a cache line shared with ->sk_shutdown. Therefore it should
be hot and shouldn't have negative performance impacts on TCP's fast
path (sk_shutdown is tested just before the while() loop in
tcp_sendmsg_locked()).
Link: https://lore.kernel.org/netdev/b4d8cb09c913d3e34f853736f3f5628abfd7f4b6.1656699567.git.gnault@redhat.com/
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pull rdma updates from Jason Gunthorpe:
"Usual size of updates, a new driver, and most of the bulk focusing on
rxe:
- Usual typos, style, and language updates
- Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma,
mlx4, srp
- Lots of RXE updates:
* Improve reply error handling for bad MR operations
* Code tidying
* Debug printing uses common loggers
* Remove half implemented RD related stuff
* Support IBA's recently defined Atomic Write and Flush operations
- erdma support for atomic operations
- New driver 'mana' for Ethernet HW available in Azure VMs. This
driver only supports DPDK"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits)
IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces
RDMA: Add missed netdev_put() for the netdevice_tracker
RDMA/rxe: Enable RDMA FLUSH capability for rxe device
RDMA/cm: Make QP FLUSHABLE for supported device
RDMA/rxe: Implement flush completion
RDMA/rxe: Implement flush execution in responder side
RDMA/rxe: Implement RC RDMA FLUSH service in requester side
RDMA/rxe: Extend rxe packet format to support flush
RDMA/rxe: Allow registering persistent flag for pmem MR only
RDMA/rxe: Extend rxe user ABI to support flush
RDMA: Extend RDMA kernel verbs ABI to support flush
RDMA: Extend RDMA user ABI to support flush
RDMA/rxe: Fix incorrect responder length checking
RDMA/rxe: Fix oops with zero length reads
RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define
RDMA/hns: Fix XRC caps on HIP08
RDMA/hns: Fix error code of CMD
RDMA/hns: Fix page size cap from firmware
RDMA/hns: Fix PBL page MTR find
RDMA/hns: Fix AH attr queried by query_qp
...
Merge in the left-over fixes before the net-next pull-request.
net/mptcp/subflow.c
d3295fee3c ("mptcp: use proper req destructor for IPv6")
36b122baf6 ("mptcp: add subflow_v(4,6)_send_synack()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
IPv6/TCP and GRO stacks can build big TCP packets with an added
temporary Hop By Hop header.
Is GSO is not involved, then the temporary header needs to be removed in
the driver. This patch provides a generic helper for drivers that need
to modify their headers in place.
Tested:
Compiled and ran with ethtool -K eth1 tso off
Could send Big TCP packets
Signed-off-by: Coco Li <lixiaoyan@google.com>
Link: https://lore.kernel.org/r/20221210041646.3587757-1-lixiaoyan@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Add a new VID/PID 0489/e0f2 for MT7922
- Add Realtek RTL8852BE support ID 0x0cb8:0xc559
- Add a new PID/VID 13d3/3549 for RTL8822CU
- Add support for broadcom BCM43430A0 & BCM43430A1
- Add CONFIG_BT_HCIBTUSB_POLL_SYNC
- Add CONFIG_BT_LE_L2CAP_ECRED
- Add support for CYW4373A0
- Add support for RTL8723DS
- Add more device IDs for WCN6855
- Add Broadcom BCM4377 family PCIe Bluetooth
* tag 'for-net-next-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (51 commits)
Bluetooth: Wait for HCI_OP_WRITE_AUTH_PAYLOAD_TO to complete
Bluetooth: ISO: Avoid circular locking dependency
Bluetooth: RFCOMM: don't call kfree_skb() under spin_lock_irqsave()
Bluetooth: hci_core: don't call kfree_skb() under spin_lock_irqsave()
Bluetooth: hci_bcsp: don't call kfree_skb() under spin_lock_irqsave()
Bluetooth: hci_h5: don't call kfree_skb() under spin_lock_irqsave()
Bluetooth: hci_ll: don't call kfree_skb() under spin_lock_irqsave()
Bluetooth: hci_qca: don't call kfree_skb() under spin_lock_irqsave()
Bluetooth: btusb: don't call kfree_skb() under spin_lock_irqsave()
Bluetooth: btintel: Fix missing free skb in btintel_setup_combined()
Bluetooth: hci_conn: Fix crash on hci_create_cis_sync
Bluetooth: btintel: Fix existing sparce warnings
Bluetooth: btusb: Fix existing sparce warning
Bluetooth: btusb: Fix new sparce warnings
Bluetooth: btusb: Add a new PID/VID 13d3/3549 for RTL8822CU
Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0cb8:0xc559
dt-bindings: net: realtek-bluetooth: Add RTL8723DS
Bluetooth: btusb: Add a new VID/PID 0489/e0f2 for MT7922
dt-bindings: bluetooth: broadcom: add BCM43430A0 & BCM43430A1
Bluetooth: hci_bcm4377: Fix missing pci_disable_device() on error in bcm4377_probe()
...
====================
Link: https://lore.kernel.org/r/20221212222322.1690780-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Incorrect error check in nft_expr_inner_parse(), from Dan Carpenter.
2) Add DATA_SENT state to SCTP connection tracking helper, from
Sriram Yagnaraman.
3) Consolidate nf_confirm for ipv4 and ipv6, from Florian Westphal.
4) Add bitmask support for ipset, from Vishwanath Pai.
5) Handle icmpv6 redirects as RELATED, from Florian Westphal.
6) Add WARN_ON_ONCE() to impossible case in flowtable datapath,
from Li Qiong.
7) A large batch of IPVS updates to replace timer-based estimators by
kthreads to scale up wrt. CPUs and workload (millions of estimators).
Julian Anastasov says:
This patchset implements stats estimation in kthread context.
It replaces the code that runs on single CPU in timer context every 2
seconds and causing latency splats as shown in reports [1], [2], [3].
The solution targets setups with thousands of IPVS services,
destinations and multi-CPU boxes.
Spread the estimation on multiple (configured) CPUs and multiple
time slots (timer ticks) by using multiple chains organized under RCU
rules. When stats are not needed, it is recommended to use
run_estimation=0 as already implemented before this change.
RCU Locking:
- As stats are now RCU-locked, tot_stats, svc and dest which
hold estimator structures are now always freed from RCU
callback. This ensures RCU grace period after the
ip_vs_stop_estimator() call.
Kthread data:
- every kthread works over its own data structure and all
such structures are attached to array. For now we limit
kthreads depending on the number of CPUs.
- even while there can be a kthread structure, its task
may not be running, eg. before first service is added or
while the sysctl var is set to an empty cpulist or
when run_estimation is set to 0 to disable the estimation.
- the allocated kthread context may grow from 1 to 50
allocated structures for timer ticks which saves memory for
setups with small number of estimators
- a task and its structure may be released if all
estimators are unlinked from its chains, leaving the
slot in the array empty
- every kthread data structure allows limited number
of estimators. Kthread 0 is also used to initially
calculate the max number of estimators to allow in every
chain considering a sub-100 microsecond cond_resched
rate. This number can be from 1 to hundreds.
- kthread 0 has an additional job of optimizing the
adding of estimators: they are first added in
temp list (est_temp_list) and later kthread 0
distributes them to other kthreads. The optimization
is based on the fact that newly added estimator
should be estimated after 2 seconds, so we have the
time to offload the adding to chain from controlling
process to kthread 0.
- to add new estimators we use the last added kthread
context (est_add_ktid). The new estimators are linked to
the chains just before the estimated one, based on add_row.
This ensures their estimation will start after 2 seconds.
If estimators are added in bursts, common case if all
services and dests are initially configured, we may
spread the estimators to more chains and as result,
reducing the initial delay below 2 seconds.
Many thanks to Jiri Wiesner for his valuable comments
and for spending a lot of time reviewing and testing
the changes on different platforms with 48-256 CPUs and
1-8 NUMA nodes under different cpufreq governors.
The new IPVS estimators do not use workqueue infrastructure
because:
- The estimation can take long time when using multiple IPVS rules (eg.
millions estimator structures) and especially when box has multiple
CPUs due to the for_each_possible_cpu usage that expects packets from
any CPU. With est_nice sysctl we have more control how to prioritize the
estimation kthreads compared to other processes/kthreads that have
latency requirements (such as servers). As a benefit, we can see these
kthreads in top and decide if we will need some further control to limit
their CPU usage (max number of structure to estimate per kthread).
- with kthreads we run code that is read-mostly, no write/lock
operations to process the estimators in 2-second intervals.
- work items are one-shot: as estimators are processed every
2 seconds, they need to be re-added every time. This again
loads the timers (add_timer) if we use delayed works, as there are
no kthreads to do the timings.
[1] Report from Yunhong Jiang:
https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2
[3] Report from Dust:
https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
ipvs: run_estimation should control the kthread tasks
ipvs: add est_cpulist and est_nice sysctl vars
ipvs: use kthreads for stats estimation
ipvs: use u64_stats_t for the per-cpu counters
ipvs: use common functions for stats allocation
ipvs: add rcu protection to stats
netfilter: flowtable: add a 'default' case to flowtable datapath
netfilter: conntrack: set icmpv6 redirects as RELATED
netfilter: ipset: Add support for new bitmask parameter
netfilter: conntrack: merge ipv4+ipv6 confirm functions
netfilter: conntrack: add sctp DATA_SENT state
netfilter: nft_inner: fix IS_ERR() vs NULL check
====================
Link: https://lore.kernel.org/r/20221211101204.1751-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Broadcom 4377 controllers found in Apple x86 Macs with the T2 chip
claim to support extended scanning when querying supported states,
< HCI Command: LE Read Supported St.. (0x08|0x001c) plen 0
> HCI Event: Command Complete (0x0e) plen 12
LE Read Supported States (0x08|0x001c) ncmd 1
Status: Success (0x00)
States: 0x000003ffffffffff
[...]
LE Set Extended Scan Parameters (Octet 37 - Bit 5)
LE Set Extended Scan Enable (Octet 37 - Bit 6)
[...]
, but then fail to actually implement the extended scanning:
< HCI Command: LE Set Extended Sca.. (0x08|0x0041) plen 8
Own address type: Random (0x01)
Filter policy: Accept all advertisement (0x00)
PHYs: 0x01
Entry 0: LE 1M
Type: Active (0x01)
Interval: 11.250 msec (0x0012)
Window: 11.250 msec (0x0012)
> HCI Event: Command Complete (0x0e) plen 4
LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
Status: Unknown HCI Command (0x01)
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Broadcom controllers present on Apple Silicon devices use the upper
8 bits of the event type in the LE Extended Advertising Report for
the channel on which the frame has been received.
These bits are reserved according to the Bluetooth spec anyway such that
we can just drop them to ensure that the advertising results are parsed
correctly.
The following excerpt from a btmon trace shows a report received on
channel 37 by these controllers:
> HCI Event: LE Meta Event (0x3e) plen 55
LE Extended Advertising Report (0x0d)
Num reports: 1
Entry 0
Event type: 0x2513
Props: 0x0013
Connectable
Scannable
Use legacy advertising PDUs
Data status: Complete
Reserved (0x2500)
Legacy PDU Type: Reserved (0x2513)
Address type: Public (0x00)
Address: XX:XX:XX:XX:XX:XX (Shenzhen Jingxun Software [...])
Primary PHY: LE 1M
Secondary PHY: No packets
SID: no ADI field (0xff)
TX power: 127 dBm
RSSI: -76 dBm (0xb4)
Periodic advertising interval: 0.00 msec (0x0000)
Direct address type: Public (0x00)
Direct address: 00:00:00:00:00:00 (OUI 00-00-00)
Data length: 0x1d
[...]
Flags: 0x18
Simultaneous LE and BR/EDR (Controller)
Simultaneous LE and BR/EDR (Host)
Company: Harman International Industries, Inc. (87)
Data: [...]
Service Data (UUID 0xfddf):
Name (complete): JBL Flip 5
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
On cmd_timeout with no reset_gpio, reset the USB port as a last
resort.
This patch changes the behavior of btusb_intel_cmd_timeout and
btusb_rtl_cmd_timeout.
Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@google.com>
Reviewed-by: Ying Hsu <yinghsu@chromium.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
To ease the maintenance, it is often recommended to avoid having #ifdef
preprocessor conditions.
Here the section related to CONFIG_MPTCP was quite short but the next
commit needs to add more code around. It is then cleaner to move
specific MPTCP code to functions located in net/mptcp directory.
Now that mptcp_subflow_request_sock_ops structure can be static, it can
also be marked as "read only after init".
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>