Commit Graph

14407 Commits

Author SHA1 Message Date
Linus Torvalds
aa35e45cd4 Merge tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Networking fixes, including fixes from netfilter, wireless and bpf
  trees.

  Current release - regressions:

   - mt76: fix NULL pointer dereference in mt76u_status_worker and
     mt76s_process_tx_queue

   - net: ipa: fix interconnect enable bug

  Current release - always broken:

   - netfilter: fixes possible oops in mtype_resize in ipset

   - ath11k: fix number of coding issues found by static analysis tools
     and spurious error messages

  Previous releases - regressions:

   - e1000e: re-enable s0ix power saving flows for systems with the
     Intel i219-LM Ethernet controllers to fix power use regression

   - virtio_net: fix recursive call to cpus_read_lock() to avoid a
     deadlock

   - ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst()

   - sysfs: take the rtnl lock around XPS configuration

   - xsk: fix memory leak for failed bind and rollback reservation at
     NETDEV_TX_BUSY

   - r8169: work around power-saving bug on some chip versions

  Previous releases - always broken:

   - dcb: validate netlink message in DCB handler

   - tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS
     to prevent unnecessary retries

   - vhost_net: fix ubuf refcount when sendmsg fails

   - bpf: save correct stopping point in file seq iteration

   - ncsi: use real net-device for response handler

   - neighbor: fix div by zero caused by a data race (TOCTOU)

   - bareudp: fix use of incorrect min_headroom size and a false
     positive lockdep splat from the TX lock

   - mvpp2:
      - clear force link UP during port init procedure in case
        bootloader had set it
      - add TCAM entry to drop flow control pause frames
      - fix PPPoE with ipv6 packet parsing
      - fix GoP Networking Complex Control config of port 3
      - fix pkt coalescing IRQ-threshold configuration

   - xsk: fix race in SKB mode transmit with shared cq

   - ionic: account for vlan tag len in rx buffer len

   - stmmac: ignore the second clock input, current clock framework does
     not handle exclusive clock use well, other drivers may reconfigure
     the second clock

  Misc:

   - ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow
     existing scheme"

* tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits)
  net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access
  net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs
  net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event
  r8169: work around power-saving bug on some chip versions
  net: usb: qmi_wwan: add Quectel EM160R-GL
  selftests: mlxsw: Set headroom size of correct port
  net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag
  ibmvnic: fix: NULL pointer dereference.
  docs: networking: packet_mmap: fix old config reference
  docs: networking: packet_mmap: fix formatting for C macros
  vhost_net: fix ubuf refcount incorrectly when sendmsg fails
  bareudp: Fix use of incorrect min_headroom size
  bareudp: set NETIF_F_LLTX flag
  net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
  atlantic: remove architecture depends
  erspan: fix version 1 check in gre_parse_header()
  net: hns: fix return value check in __lb_other_process()
  net: sched: prevent invalid Scell_log shift count
  net: neighbor: fix a crash caused by mod zero
  ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
  ...
2021-01-05 12:38:56 -08:00
David S. Miller
4bfc471484 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-12-28

The following pull-request contains BPF updates for your *net* tree.

There is a small merge conflict between bpf tree commit 69ca310f34
("bpf: Save correct stopping point in file seq iteration") and net tree
commit 66ed594409 ("bpf/task_iter: In task_file_seq_get_next use
task_lookup_next_fd_rcu"). The get_files_struct() does not exist anymore
in net, so take the hunk in HEAD and add the `info->tid = curr_tid` to
the error path:

  [...]
                curr_task = task_seq_get_next(ns, &curr_tid, true);
                if (!curr_task) {
                        info->task = NULL;
                        info->tid = curr_tid;
                        return NULL;
                }

                /* set info->task and info->tid */
  [...]

We've added 10 non-merge commits during the last 9 day(s) which contain
a total of 11 files changed, 75 insertions(+), 20 deletions(-).

The main changes are:

1) Various AF_XDP fixes such as fill/completion ring leak on failed bind and
   fixing a race in skb mode's backpressure mechanism, from Magnus Karlsson.

2) Fix latency spikes on lockdep enabled kernels by adding a rescheduling
   point to BPF hashtab initialization, from Eric Dumazet.

3) Fix a splat in task iterator by saving the correct stopping point in the
   seq file iteration, from Jonathan Lemon.

4) Fix BPF maps selftest by adding retries in case hashtab returns EBUSY
   errors on update/deletes, from Andrii Nakryiko.

5) Fix BPF selftest error reporting to something more user friendly if the
   vmlinux BTF cannot be found, from Kamal Mostafa.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-28 15:26:11 -08:00
Randy Dunlap
bd1248f1dd net: sched: prevent invalid Scell_log shift count
Check Scell_log shift size in red_check_params() and modify all callers
of red_check_params() to pass Scell_log.

This prevents a shift out-of-bounds as detected by UBSAN:
  UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22
  shift exponent 72 is too large for 32-bit type 'int'

Fixes: 8afa10cbe2 ("net_sched: red: Avoid illegal values")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com
Cc: Nogah Frankel <nogahf@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-12-28 14:52:54 -08:00
Linus Torvalds
70990afa34 Merge tag '9p-for-5.11-rc1' of git://github.com/martinetd/linux
Pull 9p update from Dominique Martinet:

 - fix long-standing limitation on open-unlink-fop pattern

 - add refcount to p9_fid (fixes the above and will allow for more
   cleanups and simplifications in the future)

* tag '9p-for-5.11-rc1' of git://github.com/martinetd/linux:
  9p: Remove unnecessary IS_ERR() check
  9p: Uninitialized variable in v9fs_writeback_fid()
  9p: Fix writeback fid incorrectly being attached to dentry
  9p: apply review requests for fid refcounting
  9p: add refcount to p9_fid struct
  fs/9p: search open fids first
  fs/9p: track open fids
  fs/9p: fix create-unlink-getattr idiom
2020-12-21 10:28:02 -08:00
Magnus Karlsson
f09ced4053 xsk: Fix race in SKB mode transmit with shared cq
Fix a race when multiple sockets are simultaneously calling sendto()
when the completion ring is shared in the SKB case. This is the case
when you share the same netdev and queue id through the
XDP_SHARED_UMEM bind flag. The problem is that multiple processes can
be in xsk_generic_xmit() and call the backpressure mechanism in
xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this
specific scenario, a race might occur since the rings are
single-producer single-consumer.

Fix this by moving the tx_completion_lock from the socket to the pool
as the pool is shared between the sockets that share the completion
ring. (The pool is not shared when this is not the case.) And then
protect the accesses to xskq_prod_reserve() with this lock. The
tx_completion_lock is renamed cq_lock to better reflect that it
protects accesses to the potentially shared completion ring.

Fixes: 35fcde7f8d ("xsk: support for Tx")
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/20201218134525.13119-2-magnus.karlsson@gmail.com
2020-12-18 16:10:21 +01:00
Linus Torvalds
ca5b877b6c Merge tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
 "While we have a small number of SELinux patches for v5.11, there are a
  few changes worth highlighting:

   - Change the LSM network hooks to pass flowi_common structs instead
     of the parent flowi struct as the LSMs do not currently need the
     full flowi struct and they do not have enough information to use it
     safely (missing information on the address family).

     This patch was discussed both with Herbert Xu (representing team
     netdev) and James Morris (representing team
     LSMs-other-than-SELinux).

   - Fix how we handle errors in inode_doinit_with_dentry() so that we
     attempt to properly label the inode on following lookups instead of
     continuing to treat it as unlabeled.

   - Tweak the kernel logic around allowx, auditallowx, and dontauditx
     SELinux policy statements such that the auditx/dontauditx are
     effective even without the allowx statement.

  Everything passes our test suite"

* tag 'selinux-pr-20201214' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  lsm,selinux: pass flowi_common instead of flowi to the LSM hooks
  selinux: Fix fall-through warnings for Clang
  selinux: drop super_block backpointer from superblock_security_struct
  selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
  selinux: allow dontauditx and auditallowx rules to take effect without allowx
  selinux: fix error initialization in inode_doinit_with_dentry()
2020-12-16 11:01:04 -08:00
Linus Torvalds
d635a69dd4 Merge tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
 "Core:

   - support "prefer busy polling" NAPI operation mode, where we defer
     softirq for some time expecting applications to periodically busy
     poll

   - AF_XDP: improve efficiency by more batching and hindering the
     adjacency cache prefetcher

   - af_packet: make packet_fanout.arr size configurable up to 64K

   - tcp: optimize TCP zero copy receive in presence of partial or
     unaligned reads making zero copy a performance win for much smaller
     messages

   - XDP: add bulk APIs for returning / freeing frames

   - sched: support fragmenting IP packets as they come out of conntrack

   - net: allow virtual netdevs to forward UDP L4 and fraglist GSO skbs

  BPF:

   - BPF switch from crude rlimit-based to memcg-based memory accounting

   - BPF type format information for kernel modules and related tracing
     enhancements

   - BPF implement task local storage for BPF LSM

   - allow the FENTRY/FEXIT/RAW_TP tracing programs to use
     bpf_sk_storage

  Protocols:

   - mptcp: improve multiple xmit streams support, memory accounting and
     many smaller improvements

   - TLS: support CHACHA20-POLY1305 cipher

   - seg6: add support for SRv6 End.DT4/DT6 behavior

   - sctp: Implement RFC 6951: UDP Encapsulation of SCTP

   - ppp_generic: add ability to bridge channels directly

   - bridge: Connectivity Fault Management (CFM) support as is defined
     in IEEE 802.1Q section 12.14.

  Drivers:

   - mlx5: make use of the new auxiliary bus to organize the driver
     internals

   - mlx5: more accurate port TX timestamping support

   - mlxsw:
      - improve the efficiency of offloaded next hop updates by using
        the new nexthop object API
      - support blackhole nexthops
      - support IEEE 802.1ad (Q-in-Q) bridging

   - rtw88: major bluetooth co-existance improvements

   - iwlwifi: support new 6 GHz frequency band

   - ath11k: Fast Initial Link Setup (FILS)

   - mt7915: dual band concurrent (DBDC) support

   - net: ipa: add basic support for IPA v4.5

  Refactor:

   - a few pieces of in_interrupt() cleanup work from Sebastian Andrzej
     Siewior

   - phy: add support for shared interrupts; get rid of multiple driver
     APIs and have the drivers write a full IRQ handler, slight growth
     of driver code should be compensated by the simpler API which also
     allows shared IRQs

   - add common code for handling netdev per-cpu counters

   - move TX packet re-allocation from Ethernet switch tag drivers to a
     central place

   - improve efficiency and rename nla_strlcpy

   - number of W=1 warning cleanups as we now catch those in a patchwork
     build bot

  Old code removal:

   - wan: delete the DLCI / SDLA drivers

   - wimax: move to staging

   - wifi: remove old WDS wifi bridging support"

* tag 'net-next-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1922 commits)
  net: hns3: fix expression that is currently always true
  net: fix proc_fs init handling in af_packet and tls
  nfc: pn533: convert comma to semicolon
  af_vsock: Assign the vsock transport considering the vsock address flags
  af_vsock: Set VMADDR_FLAG_TO_HOST flag on the receive path
  vsock_addr: Check for supported flag values
  vm_sockets: Add VMADDR_FLAG_TO_HOST vsock flag
  vm_sockets: Add flags field in the vsock address data structure
  net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled
  tcp: Add logic to check for SYN w/ data in tcp_simple_retransmit
  net: mscc: ocelot: install MAC addresses in .ndo_set_rx_mode from process context
  nfc: s3fwrn5: Release the nfc firmware
  net: vxget: clean up sparse warnings
  mlxsw: spectrum_router: Use eXtended mezzanine to offload IPv4 router
  mlxsw: spectrum: Set KVH XLT cache mode for Spectrum2/3
  mlxsw: spectrum_router_xm: Introduce basic XM cache flushing
  mlxsw: reg: Add Router LPM Cache Enable Register
  mlxsw: reg: Add Router LPM Cache ML Delete Register
  mlxsw: spectrum_router_xm: Implement L-value tracking for M-index
  mlxsw: reg: Add XM Router M Table Register
  ...
2020-12-15 13:22:29 -08:00
Toke Høiland-Jørgensen
0780b41456 inet_ecn: Use csum16_add() helper for IP_ECN_set_* helpers
Jakub pointed out that the IP_ECN_set* helpers basically open-code
csum16_add(), so let's switch them over to using the helper instead.

v2:
- Use __be16 for check_add stack variable in IP_ECN_set_ce() (kbot)
v3:
- Turns out we need __force casts to do arithmetic on __be16 types

Reported-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Jonathan Morton <chromatix99@gmail.com>
Tested-by: Pete Heist <pete@heistp.net>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20201211142638.154780-1-toke@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 18:38:58 -08:00
Cambda Zhu
6d4634d1b0 net: Limit logical shift left of TCP probe0 timeout
For each TCP zero window probe, the icsk_backoff is increased by one and
its max value is tcp_retries2. If tcp_retries2 is greater than 63, the
probe0 timeout shift may exceed its max bits. On x86_64/ARMv8/MIPS, the
shift count would be masked to range 0 to 63. And on ARMv7 the result is
zero. If the shift count is masked, only several probes will be sent
with timeout shorter than TCP_RTO_MAX. But if the timeout is zero, it
needs tcp_retries2 times probes to end this false timeout. Besides,
bitwise shift greater than or equal to the width is an undefined
behavior.

This patch adds a limit to the backoff. The max value of max_when is
TCP_RTO_MAX and the min value of timeout base is TCP_RTO_MIN. The limit
is the backoff from TCP_RTO_MIN to TCP_RTO_MAX.

Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com>
Link: https://lore.kernel.org/r/20201208091910.37618-1-cambda@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:34:54 -08:00
Florian Westphal
049fe386d3 tcp: parse mptcp options contained in reset packets
Because TCP-level resets only affect the subflow, there is a MPTCP
option to indicate that the MPTCP-level connection should be closed
immediately without a mptcp-level fin exchange.

This is the 'MPTCP fast close option'.  It can be carried on ack
segments or TCP resets.  In the latter case, its needed to parse mptcp
options also for reset packets so that MPTCP can act accordingly.

Next patch will add receive side fastclose support in MPTCP.

Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 17:30:06 -08:00
Linus Torvalds
8c1dccc803 Merge tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Thomas Gleixner:
 "RCU, LKMM and KCSAN updates collected by Paul McKenney.

  RCU:
   - Avoid cpuinfo-induced IPI pileups and idle-CPU IPIs

   - Lockdep-RCU updates reducing the need for __maybe_unused

   - Tasks-RCU updates

   - Miscellaneous fixes

   - Documentation updates

   - Torture-test updates

  KCSAN:
   - updates for selftests, avoiding setting watchpoints on NULL pointers

   - fix to watchpoint encoding

  LKMM:
   - updates for documentation along with some updates to example-code
     litmus tests"

* tag 'core-rcu-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  srcu: Take early exit on memory-allocation failure
  rcu/tree: Defer kvfree_rcu() allocation to a clean context
  rcu: Do not report strict GPs for outgoing CPUs
  rcu: Fix a typo in rcu_blocking_is_gp() header comment
  rcu: Prevent lockdep-RCU splats on lock acquisition/release
  rcu/tree: nocb: Avoid raising softirq for offloaded ready-to-execute CBs
  rcu,ftrace: Fix ftrace recursion
  rcu/tree: Make struct kernel_param_ops definitions const
  rcu/tree: Add a warning if CPU being onlined did not report QS already
  rcu: Clarify nocb kthreads naming in RCU_NOCB_CPU config
  rcu: Fix single-CPU check in rcu_blocking_is_gp()
  rcu: Implement rcu_segcblist_is_offloaded() config dependent
  list.h: Update comment to explicitly note circular lists
  rcu: Panic after fixed number of stalls
  x86/smpboot:  Move rcu_cpu_starting() earlier
  rcu: Allow rcu_irq_enter_check_tick() from NMI
  tools/memory-model: Label MP tests' producers and consumers
  tools/memory-model: Use "buf" and "flag" for message-passing tests
  tools/memory-model: Add types to litmus tests
  tools/memory-model: Add a glossary of LKMM terms
  ...
2020-12-14 17:21:16 -08:00
Linus Torvalds
f9b4240b07 Merge tag 'fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull misc fixes from Christian Brauner:
 "This contains several fixes which felt worth being combined into a
  single branch:

   - Use put_nsproxy() instead of open-coding it switch_task_namespaces()

   - Kirill's work to unify lifecycle management for all namespaces. The
     lifetime counters are used identically for all namespaces types.
     Namespaces may of course have additional unrelated counters and
     these are not altered. This work allows us to unify the type of the
     counters and reduces maintenance cost by moving the counter in one
     place and indicating that basic lifetime management is identical
     for all namespaces.

   - Peilin's fix adding three byte padding to Dmitry's
     PTRACE_GET_SYSCALL_INFO uapi struct to prevent an info leak.

   - Two smal patches to convert from the /* fall through */ comment
     annotation to the fallthrough keyword annotation which I had taken
     into my branch and into -next before df561f6688 ("treewide: Use
     fallthrough pseudo-keyword") made it upstream which fixed this
     tree-wide.

     Since I didn't want to invalidate all testing for other commits I
     didn't rebase and kept them"

* tag 'fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  nsproxy: use put_nsproxy() in switch_task_namespaces()
  sys: Convert to the new fallthrough notation
  signal: Convert to the new fallthrough notation
  time: Use generic ns_common::count
  cgroup: Use generic ns_common::count
  mnt: Use generic ns_common::count
  user: Use generic ns_common::count
  pid: Use generic ns_common::count
  ipc: Use generic ns_common::count
  uts: Use generic ns_common::count
  net: Use generic ns_common::count
  ns: Add a common refcount into ns_common
  ptrace: Prevent kernel-infoleak in ptrace_get_syscall_info()
2020-12-14 16:40:27 -08:00
Jakub Kicinski
7bca5021a4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Missing dependencies in NFT_BRIDGE_REJECT, from Randy Dunlap.

2) Use atomic_inc_return() instead of atomic_add_return() in IPVS,
   from Yejune Deng.

3) Simplify check for overquota in xt_nfacct, from Kaixu Xia.

4) Move nfnl_acct_list away from struct net, from Miao Wang.

5) Pass actual sk in reject actions, from Jan Engelhardt.

6) Add timeout and protoinfo to ctnetlink destroy events,
   from Florian Westphal.

7) Four patches to generalize set infrastructure to support
   for multiple expressions per set element.

* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next:
  netfilter: nftables: netlink support for several set element expressions
  netfilter: nftables: generalize set extension to support for several expressions
  netfilter: nftables: move nft_expr before nft_set
  netfilter: nftables: generalize set expressions support
  netfilter: ctnetlink: add timeout and protoinfo to destroy events
  netfilter: use actual socket sk for REJECT action
  netfilter: nfnl_acct: remove data from struct net
  netfilter: Remove unnecessary conversion to bool
  ipvs: replace atomic_add_return()
  netfilter: nft_reject_bridge: fix build errors due to code movement
====================

Link: https://lore.kernel.org/r/20201212230513.3465-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-14 15:43:21 -08:00
SeongJae Park
0b9b241406 inet: frags: batch fqdir destroy works
On a few of our systems, I found frequent 'unshare(CLONE_NEWNET)' calls
make the number of active slab objects including 'sock_inode_cache' type
rapidly and continuously increase.  As a result, memory pressure occurs.

In more detail, I made an artificial reproducer that resembles the
workload that we found the problem and reproduce the problem faster.  It
merely repeats 'unshare(CLONE_NEWNET)' 50,000 times in a loop.  It takes
about 2 minutes.  On 40 CPU cores / 70GB DRAM machine, the available
memory continuously reduced in a fast speed (about 120MB per second,
15GB in total within the 2 minutes).  Note that the issue don't
reproduce on every machine.  On my 6 CPU cores machine, the problem
didn't reproduce.

'cleanup_net()' and 'fqdir_work_fn()' are functions that deallocate the
relevant memory objects.  They are asynchronously invoked by the work
queues and internally use 'rcu_barrier()' to ensure safe destructions.
'cleanup_net()' works in a batched maneer in a single thread worker,
while 'fqdir_work_fn()' works for each 'fqdir_exit()' call in the
'system_wq'.  Therefore, 'fqdir_work_fn()' called frequently under the
workload and made the contention for 'rcu_barrier()' high.  In more
detail, the global mutex, 'rcu_state.barrier_mutex' became the
bottleneck.

This commit avoids such contention by doing the 'rcu_barrier()' and
subsequent lightweight works in a batched manner, as similar to that of
'cleanup_net()'.  The fqdir hashtable destruction, which is done before
the 'rcu_barrier()', is still allowed to run in parallel for fast
processing, but this commit makes it to use a dedicated work queue
instead of the 'system_wq', to make sure that the number of threads is
bounded.

Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20201211112405.31158-1-sjpark@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-12 15:08:54 -08:00
Pablo Neira Ayuso
563125a73a netfilter: nftables: generalize set extension to support for several expressions
This patch replaces NFT_SET_EXPR by NFT_SET_EXT_EXPRESSIONS. This new
extension allows to attach several expressions to one set element (not
only one single expression as NFT_SET_EXPR provides). This patch
prepares for support for several expressions per set element in the
netlink userspace API.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 19:20:24 +01:00
Jakub Kicinski
00f7763a26 Merge tag 'mac80211-next-for-net-next-2020-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:

====================
A new set of wireless changes:
 * validate key indices for key deletion
 * more preamble support in mac80211
 * various 6 GHz scan fixes/improvements
 * a common SAR power limitations API
 * various small fixes & code improvements

* tag 'mac80211-next-for-net-next-2020-12-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (35 commits)
  mac80211: add ieee80211_set_sar_specs
  nl80211: add common API to configure SAR power limitations
  mac80211: fix a mistake check for rx_stats update
  mac80211: mlme: save ssid info to ieee80211_bss_conf while assoc
  mac80211: Update rate control on channel change
  mac80211: don't filter out beacons once we start CSA
  mac80211: Fix calculation of minimal channel width
  mac80211: ignore country element TX power on 6 GHz
  mac80211: use bitfield helpers for BA session action frames
  mac80211: support Rx timestamp calculation for all preamble types
  mac80211: don't set set TDLS STA bandwidth wider than possible
  mac80211: support driver-based disconnect with reconnect hint
  cfg80211: support immediate reconnect request hint
  mac80211: use struct assignment for he_obss_pd
  cfg80211: remove struct ieee80211_he_bss_color
  nl80211: validate key indexes for cfg80211_registered_device
  cfg80211: include block-tx flag in channel switch started event
  mac80211: disallow band-switch during CSA
  ieee80211: update reduced neighbor report TBTT info length
  cfg80211: Save the regulatory domain when setting custom regulatory
  ...
====================

Link: https://lore.kernel.org/r/20201211142552.209018-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-12 10:07:56 -08:00
Pablo Neira Ayuso
92b211a289 netfilter: nftables: move nft_expr before nft_set
Move the nft_expr structure definition before nft_set. Expressions are
used by rules and sets, remove unnecessary forward declarations. This
comes as preparation to support for multiple expressions per set element.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 11:44:42 +01:00
Pablo Neira Ayuso
8cfd9b0f85 netfilter: nftables: generalize set expressions support
Currently, the set infrastucture allows for one single expressions per
element. This patch extends the existing infrastructure to allow for up
to two expressions. This is not updating the netlink API yet, this is
coming as an initial preparation patch.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 11:44:42 +01:00
Florian Westphal
86d21fc747 netfilter: ctnetlink: add timeout and protoinfo to destroy events
DESTROY events do not include the remaining timeout.

Add the timeout if the entry was removed explicitly. This can happen
when a conntrack gets deleted prematurely, e.g. due to a tcp reset,
module removal, netdev notifier (nat/masquerade device went down),
ctnetlink and so on.

Add the protocol state too for the destroy message to check for abnormal
state on connection termination.

Joint work with Pablo.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-12-12 11:44:42 +01:00
Jakub Kicinski
46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Carl Huang
c534e093d8 mac80211: add ieee80211_set_sar_specs
This change registers ieee80211_set_sar_specs to
mac80211_config_ops, so cfg80211 can call it.

Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Abhishek Kumar <kuabhs@chromium.org>
Link: https://lore.kernel.org/r/20201203103728.3034-3-cjhuang@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-12-11 13:39:59 +01:00
Carl Huang
6bdb68cef7 nl80211: add common API to configure SAR power limitations
NL80211_CMD_SET_SAR_SPECS is added to configure SAR from
user space. NL80211_ATTR_SAR_SPEC is used to pass the SAR
power specification when used with NL80211_CMD_SET_SAR_SPECS.

Wireless driver needs to register SAR type, supported frequency
ranges to wiphy, so user space can query it. The index in
frequency range is used to specify which sub band the power
limitation applies to. The SAR type is for compatibility, so later
other SAR mechanism can be implemented without breaking the user
space SAR applications.

Normal process is user space queries the SAR capability, and
gets the index of supported frequency ranges and associates the
power limitation with this index and sends to kernel.

Here is an example of message send to kernel:
8c 00 00 00 08 00 01 00 00 00 00 00 38 00 2b 81
08 00 01 00 00 00 00 00 2c 00 02 80 14 00 00 80
08 00 02 00 00 00 00 00 08 00 01 00 38 00 00 00
14 00 01 80 08 00 02 00 01 00 00 00 08 00 01 00
48 00 00 00

NL80211_CMD_SET_SAR_SPECS:  0x8c
NL80211_ATTR_WIPHY:     0x01(phy idx is 0)
NL80211_ATTR_SAR_SPEC:  0x812b (NLA_NESTED)
NL80211_SAR_ATTR_TYPE:  0x00 (NL80211_SAR_TYPE_POWER)
NL80211_SAR_ATTR_SPECS: 0x8002 (NLA_NESTED)
freq range 0 power: 0x38 in 0.25dbm unit (14dbm)
freq range 1 power: 0x48 in 0.25dbm unit (18dbm)

Signed-off-by: Carl Huang <cjhuang@codeaurora.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Abhishek Kumar <kuabhs@chromium.org>
Link: https://lore.kernel.org/r/20201203103728.3034-2-cjhuang@codeaurora.org
[minor edits, NLA parse cleanups]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-12-11 13:38:54 +01:00
Johannes Berg
3f8a39ff28 mac80211: support driver-based disconnect with reconnect hint
Support the driver indicating that a disconnection needs
to be performed, and pass through the reconnect hint in
this case.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20201206145305.5c8dab7a22a0.I58459fdf6968b16c90cab9c574f0f04ca22b0c79@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-12-11 13:20:05 +01:00
Johannes Berg
3bb02143ff cfg80211: support immediate reconnect request hint
There are cases where it's necessary to disconnect, but an
immediate reconnection is desired. Support a hint to userspace
that this is the case, by including a new attribute in the
deauth or disassoc event.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20201206145305.58d33941fb9d.I0e7168c205c7949529c8e3b86f3c9b12c01a7017@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-12-11 13:20:05 +01:00
Johannes Berg
539a36ba2f cfg80211: remove struct ieee80211_he_bss_color
We don't really use this struct, we're now using
struct cfg80211_he_bss_color instead.

Change the one place in mac80211 that's using the old
name to use struct assignment instead of memcpy() and
thus remove the wrong sizeof while at it.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://lore.kernel.org/r/iwlwifi.20201206145305.f6698d97ae4e.Iba2dffcb79c4ab80bde7407609806010b55edfdf@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-12-11 13:20:04 +01:00