Commit Graph

70429 Commits

Author SHA1 Message Date
Linus Torvalds
511cce163b Merge tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from wifi and can.

  Current release - regressions:

   - phy: don't WARN for PHY_UP state in mdio_bus_phy_resume()

   - wifi: fix locking in mac80211 mlme

   - eth:
      - revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
      - mlxbf_gige: fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe

  Previous releases - regressions:

   - wifi: fix regression with non-QoS drivers

  Previous releases - always broken:

   - mptcp: fix unreleased socket in accept queue

   - wifi:
      - don't start TX with fq->lock to fix deadlock
      - fix memory corruption in minstrel_ht_update_rates()

   - eth:
      - macb: fix ZynqMP SGMII non-wakeup source resume failure
      - mt7531: only do PLL once after the reset
      - usbnet: fix memory leak in usbnet_disconnect()

  Misc:

   - usb: qmi_wwan: add new usb-id for Dell branded EM7455"

* tag 'net-6.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (30 commits)
  mptcp: fix unreleased socket in accept queue
  mptcp: factor out __mptcp_close() without socket lock
  net: ethernet: mtk_eth_soc: fix mask of RX_DMA_GET_SPORT{,_V2}
  net: mscc: ocelot: fix tagged VLAN refusal while under a VLAN-unaware bridge
  can: c_can: don't cache TX messages for C_CAN cores
  ice: xsk: drop power of 2 ring size restriction for AF_XDP
  ice: xsk: change batched Tx descriptor cleaning
  net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
  selftests: Fix the if conditions of in test_extra_filter()
  net: phy: Don't WARN for PHY_UP state in mdio_bus_phy_resume()
  net: stmmac: power up/down serdes in stmmac_open/release
  wifi: mac80211: mlme: Fix double unlock on assoc success handling
  wifi: mac80211: mlme: Fix missing unlock on beacon RX
  wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
  wifi: mac80211: fix regression with non-QoS drivers
  wifi: mac80211: ensure vif queues are operational after start
  wifi: mac80211: don't start TX with fq->lock to fix deadlock
  wifi: cfg80211: fix MCS divisor value
  net: hippi: Add missing pci_disable_device() in rr_init_one()
  net/mlxbf_gige: Fix an IS_ERR() vs NULL bug in mlxbf_gige_mdio_probe
  ...
2022-09-29 08:32:53 -07:00
Menglong Dong
30e51b923e mptcp: fix unreleased socket in accept queue
The mptcp socket and its subflow sockets in accept queue can't be
released after the process exit.

While the release of a mptcp socket in listening state, the
corresponding tcp socket will be released too. Meanwhile, the tcp
socket in the unaccept queue will be released too. However, only init
subflow is in the unaccept queue, and the joined subflow is not in the
unaccept queue, which makes the joined subflow won't be released, and
therefore the corresponding unaccepted mptcp socket will not be released
to.

This can be reproduced easily with following steps:

1. create 2 namespace and veth:
   $ ip netns add mptcp-client
   $ ip netns add mptcp-server
   $ sysctl -w net.ipv4.conf.all.rp_filter=0
   $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
   $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
   $ ip link add red-client netns mptcp-client type veth peer red-server \
     netns mptcp-server
   $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
   $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
   $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
   $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
   $ ip -n mptcp-server link set red-server up
   $ ip -n mptcp-client link set red-client up

2. configure the endpoint and limit for client and server:
   $ ip -n mptcp-server mptcp endpoint flush
   $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
   $ ip -n mptcp-client mptcp endpoint flush
   $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
   $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
     1 subflow

3. listen and accept on a port, such as 9999. The nc command we used
   here is modified, which makes it use mptcp protocol by default.
   $ ip netns exec mptcp-server nc -l -k -p 9999

4. open another *two* terminal and use each of them to connect to the
   server with the following command:
   $ ip netns exec mptcp-client nc 10.0.0.1 9999
   Input something after connect to trigger the connection of the second
   subflow. So that there are two established mptcp connections, with the
   second one still unaccepted.

5. exit all the nc command, and check the tcp socket in server namespace.
   And you will find that there is one tcp socket in CLOSE_WAIT state
   and can't release forever.

Fix this by closing all of the unaccepted mptcp socket in
mptcp_subflow_queue_clean() with __mptcp_close().

Now, we can ensure that all unaccepted mptcp sockets will be cleaned by
__mptcp_close() before they are released, so mptcp_sock_destruct(), which
is used to clean the unaccepted mptcp socket, is not needed anymore.

The selftests for mptcp is ran for this commit, and no new failures.

Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Fixes: 6aeed90450 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28 19:05:21 -07:00
Menglong Dong
26d3e21ce1 mptcp: factor out __mptcp_close() without socket lock
Factor out __mptcp_close() from mptcp_close(). The caller of
__mptcp_close() should hold the socket lock, and cancel mptcp work when
__mptcp_close() returns true.

This function will be used in the next commit.

Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Fixes: 6aeed90450 ("mptcp: fix race on unaccepted mptcp sockets")
Cc: stable@vger.kernel.org
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-28 19:05:21 -07:00
Shakeel Butt
b1cab78ba3 Revert "net: set proper memcg for net_init hooks allocations"
This reverts commit 1d0403d20f.

Anatoly Pugachev reported that the commit 1d0403d20f ("net: set proper
memcg for net_init hooks allocations") is somehow causing the sparc64
VMs failed to boot and the VMs boot fine with that patch reverted. So,
revert the patch for now and later we can debug the issue.

Link: https://lore.kernel.org/all/20220918092849.GA10314@u164.east.ru/
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Vasily Averin <vvs@openvz.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: cgroups@vger.kernel.org
Cc: sparclinux@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: 1d0403d20f ("net: set proper memcg for net_init hooks allocations")
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-09-28 09:20:37 -07:00
Jakub Kicinski
44d70bb561 Merge tag 'wireless-2022-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:

====================
A few late-comer fixes:
 * locking in mac80211 MLME
 * non-QoS driver crash/regression
 * minstrel memory corruption
 * TX deadlock
 * TX queues not always enabled
 * HE/EHT bitrate calculation

* tag 'wireless-2022-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: mlme: Fix double unlock on assoc success handling
  wifi: mac80211: mlme: Fix missing unlock on beacon RX
  wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
  wifi: mac80211: fix regression with non-QoS drivers
  wifi: mac80211: ensure vif queues are operational after start
  wifi: mac80211: don't start TX with fq->lock to fix deadlock
  wifi: cfg80211: fix MCS divisor value
====================

Link: https://lore.kernel.org/r/20220927135923.45312-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-27 16:52:45 -07:00
Rafael Mendonca
6546646a7f wifi: mac80211: mlme: Fix double unlock on assoc success handling
Commit 6911458dc4 ("wifi: mac80211: mlme: refactor assoc success
handling") moved the per-link setup out of ieee80211_assoc_success() into a
new function ieee80211_assoc_config_link() but missed to remove the unlock
of 'sta_mtx' in case of HE capability/operation missing on HE AP, which
leads to a double unlock:

ieee80211_assoc_success() {
    ...
    ieee80211_assoc_config_link() {
        ...
        if (!(link->u.mgd.conn_flags & IEEE80211_CONN_DISABLE_HE) &&
            (!elems->he_cap || !elems->he_operation)) {
            mutex_unlock(&sdata->local->sta_mtx);
            ...
        }
        ...
    }
    ...
    mutex_unlock(&sdata->local->sta_mtx);
    ...
}

Fixes: 6911458dc4 ("wifi: mac80211: mlme: refactor assoc success handling")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20220925143420.784975-1-rafaelmendsr@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27 10:34:45 +02:00
Rafael Mendonca
883b8dc1a8 wifi: mac80211: mlme: Fix missing unlock on beacon RX
Commit 98b0b46746 ("wifi: mac80211: mlme: use correct link_sta")
switched to link station instead of deflink and added some checks to do
that, which are done with the 'sta_mtx' mutex held. However, the error
path of these checks does not unlock 'sta_mtx' before returning.

Fixes: 98b0b46746 ("wifi: mac80211: mlme: use correct link_sta")
Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com>
Link: https://lore.kernel.org/r/20220924184042.778676-1-rafaelmendsr@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27 10:33:51 +02:00
Paweł Lenkow
be92292b90 wifi: mac80211: fix memory corruption in minstrel_ht_update_rates()
During our testing of WFM200 module over SDIO on i.MX6Q-based platform,
we discovered a memory corruption on the system, tracing back to the wfx
driver. Using kfence, it was possible to trace it back to the root
cause, which is hw->max_rates set to 8 in wfx_init_common,
while the maximum defined by IEEE80211_TX_TABLE_SIZE is 4.

This causes array out-of-bounds writes during updates of the rate table,
as seen below:

BUG: KFENCE: memory corruption in kfree_rcu_work+0x320/0x36c

Corrupted memory at 0xe0a4ffe0 [ 0x03 0x03 0x03 0x03 0x01 0x00 0x00
0x02 0x02 0x02 0x09 0x00 0x21 0xbb 0xbb 0xbb ] (in kfence-#81):
kfree_rcu_work+0x320/0x36c
process_one_work+0x3ec/0x920
worker_thread+0x60/0x7a4
kthread+0x174/0x1b4
ret_from_fork+0x14/0x2c
0x0

kfence-#81: 0xe0a4ffc0-0xe0a4ffdf, size=32, cache=kmalloc-64

allocated by task 297 on cpu 0 at 631.039555s:
minstrel_ht_update_rates+0x38/0x2b0 [mac80211]
rate_control_tx_status+0xb4/0x148 [mac80211]
ieee80211_tx_status_ext+0x364/0x1030 [mac80211]
ieee80211_tx_status+0xe0/0x118 [mac80211]
ieee80211_tasklet_handler+0xb0/0xe0 [mac80211]
tasklet_action_common.constprop.0+0x11c/0x148
__do_softirq+0x1a4/0x61c
irq_exit+0xcc/0x104
call_with_stack+0x18/0x20
__irq_svc+0x80/0xb0
wq_worker_sleeping+0x10/0x100
wq_worker_sleeping+0x10/0x100
schedule+0x50/0xe0
schedule_timeout+0x2e0/0x474
wait_for_completion+0xdc/0x1ec
mmc_wait_for_req_done+0xc4/0xf8
mmc_io_rw_extended+0x3b4/0x4ec
sdio_io_rw_ext_helper+0x290/0x384
sdio_memcpy_toio+0x30/0x38
wfx_sdio_copy_to_io+0x88/0x108 [wfx]
wfx_data_write+0x88/0x1f0 [wfx]
bh_work+0x1c8/0xcc0 [wfx]
process_one_work+0x3ec/0x920
worker_thread+0x60/0x7a4
kthread+0x174/0x1b4
ret_from_fork+0x14/0x2c 0x0

After discussion on the wireless mailing list it was clarified
that the issue has been introduced by:
commit ee0e16ab75 ("mac80211: minstrel_ht: fill all requested rates")
and fix shall be in minstrel_ht_update_rates in rc80211_minstrel_ht.c.

Fixes: ee0e16ab75 ("mac80211: minstrel_ht: fill all requested rates")
Link: https://lore.kernel.org/all/12e5adcd-8aed-f0f7-70cc-4fb7b656b829@camlingroup.com/
Link: https://lore.kernel.org/linux-wireless/20220915131445.30600-1-lech.perczak@camlingroup.com/
Cc: Jérôme Pouiller <jerome.pouiller@silabs.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Peter Seiderer <ps.report@gmx.net>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Krzysztof Drobiński <krzysztof.drobinski@camlingroup.com>,
Signed-off-by: Paweł Lenkow <pawel.lenkow@camlingroup.com>
Signed-off-by: Lech Perczak <lech.perczak@camlingroup.com>
Reviewed-by: Peter Seiderer <ps.report@gmx.net>
Reviewed-by: Jérôme Pouiller <jerome.pouiller@silabs.com>
Acked-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27 10:33:10 +02:00
Hans de Goede
d873697ef2 wifi: mac80211: fix regression with non-QoS drivers
Commit 10cb8e6175 ("mac80211: enable QoS support for nl80211 ctrl port")
changed ieee80211_tx_control_port() to aways call
__ieee80211_select_queue() without checking local->hw.queues.

__ieee80211_select_queue() returns a queue-id between 0 and 3, which means
that now ieee80211_tx_control_port() may end up setting the queue-mapping
for a skb to a value higher then local->hw.queues if local->hw.queues
is less then 4.

Specifically this is a problem for ralink rt2500-pci cards where
local->hw.queues is 2. There this causes rt2x00queue_get_tx_queue() to
return NULL and the following error to be logged: "ieee80211 phy0:
rt2x00mac_tx: Error - Attempt to send packet over invalid queue 2",
after which association with the AP fails.

Other callers of __ieee80211_select_queue() skip calling it when
local->hw.queues < IEEE80211_NUM_ACS, add the same check to
ieee80211_tx_control_port(). This fixes ralink rt2500-pci and
similar cards when less then 4 tx-queues no longer working.

Fixes: 10cb8e6175 ("mac80211: enable QoS support for nl80211 ctrl port")
Cc: Markus Theil <markus.theil@tu-ilmenau.de>
Suggested-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20220918192052.443529-1-hdegoede@redhat.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27 10:32:36 +02:00
Alexander Wetzel
527008e5e8 wifi: mac80211: ensure vif queues are operational after start
Make sure local->queue_stop_reasons and vif.txqs_stopped stay in sync.

When a new vif is created the queues may end up in an inconsistent state
and be inoperable:
Communication not using iTXQ will work, allowing to e.g. complete the
association. But the 4-way handshake will time out. The sta will not
send out any skbs queued in iTXQs.

All normal attempts to start the queues will fail when reaching this
state.
local->queue_stop_reasons will have marked all queues as operational but
vif.txqs_stopped will still be set, creating an inconsistent internal
state.

In reality this seems to be race between the mac80211 function
ieee80211_do_open() setting SDATA_STATE_RUNNING and the wake_txqs_tasklet:
Depending on the driver and the timing the queues may end up to be
operational or not.

Cc: stable@vger.kernel.org
Fixes: f856373e2f ("wifi: mac80211: do not wake queues on a vif that is being stopped")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Acked-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20220915130946.302803-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27 10:31:52 +02:00
Alexander Wetzel
b7ce33df1c wifi: mac80211: don't start TX with fq->lock to fix deadlock
ieee80211_txq_purge() calls fq_tin_reset() and
ieee80211_purge_tx_queue(); Both are then calling
ieee80211_free_txskb(). Which can decide to TX the skb again.

There are at least two ways to get a deadlock:

1) When we have a TDLS teardown packet queued in either tin or frags
   ieee80211_tdls_td_tx_handle() will call ieee80211_subif_start_xmit()
   while we still hold fq->lock. ieee80211_txq_enqueue() will thus
   deadlock.

2) A variant of the above happens if aggregation is up and running:
   In that case ieee80211_iface_work() will deadlock with the original
   task: The original tasks already holds fq->lock and tries to get
   sta->lock after kicking off ieee80211_iface_work(). But the worker
   can get sta->lock prior to the original task and will then spin for
   fq->lock.

Avoid these deadlocks by not sending out any skbs when called via
ieee80211_free_txskb().

Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20220915124120.301918-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27 10:29:04 +02:00
Tamizh Chelvam Raja
64e966d1e8 wifi: cfg80211: fix MCS divisor value
The Bitrate for HE/EHT MCS6 is calculated wrongly due to the
incorrect MCS divisor value for mcs6. Fix it with the proper
value.

previous mcs_divisor value = (11769/6144) = 1.915527

fixed mcs_divisor value = (11377/6144) = 1.851725

Fixes: 9c97c88d2f ("cfg80211: Add support to calculate and report 4096-QAM HE rates")
Signed-off-by: Tamizh Chelvam Raja <quic_tamizhr@quicinc.com>
Link: https://lore.kernel.org/r/20220908181034.9936-1-quic_tamizhr@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-09-27 10:26:55 +02:00
Hangyu Hua
6e23ec0ba9 net: sched: act_ct: fix possible refcount leak in tcf_ct_init()
nf_ct_put need to be called to put the refcount got by tcf_ct_fill_params
to avoid possible refcount leak when tcf_ct_flow_table_get fails.

Fixes: c34b961a24 ("net/sched: act_ct: Create nf flow table per zone")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Link: https://lore.kernel.org/r/20220923020046.8021-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-26 12:40:39 -07:00
Linus Torvalds
504c25cb76 Merge tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from wifi, netfilter and can.

  A handful of awaited fixes here - revert of the FEC changes, bluetooth
  fix, fixes for iwlwifi spew.

  We added a warning in PHY/MDIO code which is triggering on a couple of
  platforms in a false-positive-ish way. If we can't iron that out over
  the week we'll drop it and re-add for 6.1.

  I've added a new "follow up fixes" section for fixes to fixes in
  6.0-rcs but it may actually give the false impression that those are
  problematic or that more testing time would have caught them. So
  likely a one time thing.

  Follow up fixes:

   - nf_tables_addchain: fix nft_counters_enabled underflow

   - ebtables: fix memory leak when blob is malformed

   - nf_ct_ftp: fix deadlock when nat rewrite is needed

  Current release - regressions:

   - Revert "fec: Restart PPS after link state change" and the related
     "net: fec: Use a spinlock to guard `fep->ptp_clk_on`"

   - Bluetooth: fix HCIGETDEVINFO regression

   - wifi: mt76: fix 5 GHz connection regression on mt76x0/mt76x2

   - mptcp: fix fwd memory accounting on coalesce

   - rwlock removal fall out:
      - ipmr: always call ip{,6}_mr_forward() from RCU read-side
        critical section
      - ipv6: fix crash when IPv6 is administratively disabled

   - tcp: read multiple skbs in tcp_read_skb()

   - mdio_bus_phy_resume state warning fallout:
      - eth: ravb: fix PHY state warning splat during system resume
      - eth: sh_eth: fix PHY state warning splat during system resume

  Current release - new code bugs:

   - wifi: iwlwifi: don't spam logs with NSS>2 messages

   - eth: mtk_eth_soc: enable XDP support just for MT7986 SoC

  Previous releases - regressions:

   - bonding: fix NULL deref in bond_rr_gen_slave_id

   - wifi: iwlwifi: mark IWLMEI as broken

  Previous releases - always broken:

   - nf_conntrack helpers:
      - irc: tighten matching on DCC message
      - sip: fix ct_sip_walk_headers
      - osf: fix possible bogus match in nf_osf_find()

   - ipvlan: fix out-of-bound bugs caused by unset skb->mac_header

   - core: fix flow symmetric hash

   - bonding, team: unsync device addresses on ndo_stop

   - phy: micrel: fix shared interrupt on LAN8814"

* tag 'net-6.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  selftests: forwarding: add shebang for sch_red.sh
  bnxt: prevent skb UAF after handing over to PTP worker
  net: marvell: Fix refcounting bugs in prestera_port_sfp_bind()
  net: sched: fix possible refcount leak in tc_new_tfilter()
  net: sunhme: Fix packet reception for len < RX_COPY_THRESHOLD
  udp: Use WARN_ON_ONCE() in udp_read_skb()
  selftests: bonding: cause oops in bond_rr_gen_slave_id
  bonding: fix NULL deref in bond_rr_gen_slave_id
  net: phy: micrel: fix shared interrupt on LAN8814
  net/smc: Stop the CLC flow if no link to map buffers on
  ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
  net: atlantic: fix potential memory leak in aq_ndev_close()
  can: gs_usb: gs_usb_set_phys_id(): return with error if identify is not supported
  can: gs_usb: gs_can_open(): fix race dev->can.state condition
  can: flexcan: flexcan_mailbox_read() fix return value for drop = true
  net: sh_eth: Fix PHY state warning splat during system resume
  net: ravb: Fix PHY state warning splat during system resume
  netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
  netfilter: ebtables: fix memory leak when blob is malformed
  netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
  ...
2022-09-22 10:58:13 -07:00
Hangyu Hua
c2e1cfefca net: sched: fix possible refcount leak in tc_new_tfilter()
tfilter_put need to be called to put the refount got by tp->ops->get to
avoid possible refcount leak when chain->tmplt_ops != NULL and
chain->tmplt_ops != tp->ops.

Fixes: 7d5509fa0d ("net: sched: extend proto ops with 'put' callback")
Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Link: https://lore.kernel.org/r/20220921092734.31700-1-hbh25y@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 07:04:47 -07:00
Peilin Ye
db39dfdc1c udp: Use WARN_ON_ONCE() in udp_read_skb()
Prevent udp_read_skb() from flooding the syslog.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Link: https://lore.kernel.org/r/20220921005915.2697-1-yepeilin.cs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-22 06:42:57 -07:00
Wen Gu
e738455b2c net/smc: Stop the CLC flow if no link to map buffers on
There might be a potential race between SMC-R buffer map and
link group termination.

smc_smcr_terminate_all()     | smc_connect_rdma()
--------------------------------------------------------------
                             | smc_conn_create()
for links in smcibdev        |
        schedule links down  |
                             | smc_buf_create()
                             |  \- smcr_buf_map_usable_links()
                             |      \- no usable links found,
                             |         (rmb->mr = NULL)
                             |
                             | smc_clc_send_confirm()
                             |  \- access conn->rmb_desc->mr[]->rkey
                             |     (panic)

During reboot and IB device module remove, all links will be set
down and no usable links remain in link groups. In such situation
smcr_buf_map_usable_links() should return an error and stop the
CLC flow accessing to uninitialized mr.

Fixes: b9247544c1 ("net/smc: convert static link ID instances to support multiple links")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
Link: https://lore.kernel.org/r/1663656189-32090-1-git-send-email-guwen@linux.alibaba.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-22 12:53:53 +02:00
Florian Westphal
d250889322 netfilter: nf_ct_ftp: fix deadlock when nat rewrite is needed
We can't use ct->lock, this is already used by the seqadj internals.
When using ftp helper + nat, seqadj will attempt to acquire ct->lock
again.

Revert back to a global lock for now.

Fixes: c783a29c7e ("netfilter: nf_ct_ftp: prefer skb_linearize")
Reported-by: Bruno de Paula Larini <bruno.larini@riosoft.com.br>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
Florian Westphal
62ce44c4ff netfilter: ebtables: fix memory leak when blob is malformed
The bug fix was incomplete, it "replaced" crash with a memory leak.
The old code had an assignment to "ret" embedded into the conditional,
restore this.

Fixes: 7997eff828 ("netfilter: ebtables: reject blobs that don't provide all entry points")
Reported-and-tested-by: syzbot+a24c5252f3e3ab733464@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
Tetsuo Handa
9a4d6dd554 netfilter: nf_tables: fix percpu memory leak at nf_tables_addchain()
It seems to me that percpu memory for chain stats started leaking since
commit 3bc158f8d0 ("netfilter: nf_tables: map basechain priority to
hardware priority") when nft_chain_offload_priority() returned an error.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 3bc158f8d0 ("netfilter: nf_tables: map basechain priority to hardware priority")
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
Tetsuo Handa
921ebde3c0 netfilter: nf_tables: fix nft_counters_enabled underflow at nf_tables_addchain()
syzbot is reporting underflow of nft_counters_enabled counter at
nf_tables_addchain() [1], for commit 43eb8949cf ("netfilter:
nf_tables: do not leave chain stats enabled on error") missed that
nf_tables_chain_destroy() after nft_basechain_init() in the error path of
nf_tables_addchain() decrements the counter because nft_basechain_init()
makes nft_is_base_chain() return true by setting NFT_CHAIN_BASE flag.

Increment the counter immediately after returning from
nft_basechain_init().

Link:  https://syzkaller.appspot.com/bug?extid=b5d82a651b71cd8a75ab [1]
Reported-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+b5d82a651b71cd8a75ab@syzkaller.appspotmail.com>
Fixes: 43eb8949cf ("netfilter: nf_tables: do not leave chain stats enabled on error")
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-09-20 23:50:03 +02:00
Vladimir Oltean
1461d212ab net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs
taprio can only operate as root qdisc, and to that end, there exists the
following check in taprio_init(), just as in mqprio:

	if (sch->parent != TC_H_ROOT)
		return -EOPNOTSUPP;

And indeed, when we try to attach taprio to an mqprio child, it fails as
expected:

$ tc qdisc add dev swp0 root handle 1: mqprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
Error: sch_taprio: Can only be attached as root qdisc.

(extack message added by me)

But when we try to attach a taprio child to a taprio root qdisc,
surprisingly it doesn't fail:

$ tc qdisc replace dev swp0 root handle 1: taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
$ tc qdisc replace dev swp0 parent 1:2 taprio num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	base-time 0 sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI

This is because tc_modify_qdisc() behaves differently when mqprio is
root, vs when taprio is root.

In the mqprio case, it finds the parent qdisc through
p = qdisc_lookup(dev, TC_H_MAJ(clid)), and then the child qdisc through
q = qdisc_leaf(p, clid). This leaf qdisc q has handle 0, so it is
ignored according to the comment right below ("It may be default qdisc,
ignore it"). As a result, tc_modify_qdisc() goes through the
qdisc_create() code path, and this gives taprio_init() a chance to check
for sch_parent != TC_H_ROOT and error out.

Whereas in the taprio case, the returned q = qdisc_leaf(p, clid) is
different. It is not the default qdisc created for each netdev queue
(both taprio and mqprio call qdisc_create_dflt() and keep them in
a private q->qdiscs[], or priv->qdiscs[], respectively). Instead, taprio
makes qdisc_leaf() return the _root_ qdisc, aka itself.

When taprio does that, tc_modify_qdisc() goes through the qdisc_change()
code path, because the qdisc layer never finds out about the child qdisc
of the root. And through the ->change() ops, taprio has no reason to
check whether its parent is root or not, just through ->init(), which is
not called.

The problem is the taprio_leaf() implementation. Even though code wise,
it does the exact same thing as mqprio_leaf() which it is copied from,
it works with different input data. This is because mqprio does not
attach itself (the root) to each device TX queue, but one of the default
qdiscs from its private array.

In fact, since commit 13511704f8 ("net: taprio offload: enforce qdisc
to netdev queue mapping"), taprio does this too, but just for the full
offload case. So if we tried to attach a taprio child to a fully
offloaded taprio root qdisc, it would properly fail too; just not to a
software root taprio.

To fix the problem, stop looking at the Qdisc that's attached to the TX
queue, and instead, always return the default qdiscs that we've
allocated (and to which we privately enqueue and dequeue, in software
scheduling mode).

Since Qdisc_class_ops :: leaf  is only called from tc_modify_qdisc(),
the risk of unforeseen side effects introduced by this change is
minimal.

Fixes: 5a781ccbd1 ("tc: Add support for configuring the taprio scheduler")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 11:41:14 -07:00
Vladimir Oltean
db46e3a88a net/sched: taprio: avoid disabling offload when it was never enabled
In an incredibly strange API design decision, qdisc->destroy() gets
called even if qdisc->init() never succeeded, not exclusively since
commit 87b60cfacf ("net_sched: fix error recovery at qdisc creation"),
but apparently also earlier (in the case of qdisc_create_dflt()).

The taprio qdisc does not fully acknowledge this when it attempts full
offload, because it starts off with q->flags = TAPRIO_FLAGS_INVALID in
taprio_init(), then it replaces q->flags with TCA_TAPRIO_ATTR_FLAGS
parsed from netlink (in taprio_change(), tail called from taprio_init()).

But in taprio_destroy(), we call taprio_disable_offload(), and this
determines what to do based on FULL_OFFLOAD_IS_ENABLED(q->flags).

But looking at the implementation of FULL_OFFLOAD_IS_ENABLED()
(a bitwise check of bit 1 in q->flags), it is invalid to call this macro
on q->flags when it contains TAPRIO_FLAGS_INVALID, because that is set
to U32_MAX, and therefore FULL_OFFLOAD_IS_ENABLED() will return true on
an invalid set of flags.

As a result, it is possible to crash the kernel if user space forces an
error between setting q->flags = TAPRIO_FLAGS_INVALID, and the calling
of taprio_enable_offload(). This is because drivers do not expect the
offload to be disabled when it was never enabled.

The error that we force here is to attach taprio as a non-root qdisc,
but instead as child of an mqprio root qdisc:

$ tc qdisc add dev swp0 root handle 1: \
	mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 hw 0
$ tc qdisc replace dev swp0 parent 1:1 \
	taprio num_tc 8 map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 base-time 0 \
	sched-entry S 0x7f 990000 sched-entry S 0x80 100000 \
	flags 0x0 clockid CLOCK_TAI
Unable to handle kernel paging request at virtual address fffffffffffffff8
[fffffffffffffff8] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Call trace:
 taprio_dump+0x27c/0x310
 vsc9959_port_setup_tc+0x1f4/0x460
 felix_port_setup_tc+0x24/0x3c
 dsa_slave_setup_tc+0x54/0x27c
 taprio_disable_offload.isra.0+0x58/0xe0
 taprio_destroy+0x80/0x104
 qdisc_create+0x240/0x470
 tc_modify_qdisc+0x1fc/0x6b0
 rtnetlink_rcv_msg+0x12c/0x390
 netlink_rcv_skb+0x5c/0x130
 rtnetlink_rcv+0x1c/0x2c

Fix this by keeping track of the operations we made, and undo the
offload only if we actually did it.

I've added "bool offloaded" inside a 4 byte hole between "int clockid"
and "atomic64_t picos_per_byte". Now the first cache line looks like
below:

$ pahole -C taprio_sched net/sched/sch_taprio.o
struct taprio_sched {
        struct Qdisc * *           qdiscs;               /*     0     8 */
        struct Qdisc *             root;                 /*     8     8 */
        u32                        flags;                /*    16     4 */
        enum tk_offsets            tk_offset;            /*    20     4 */
        int                        clockid;              /*    24     4 */
        bool                       offloaded;            /*    28     1 */

        /* XXX 3 bytes hole, try to pack */

        atomic64_t                 picos_per_byte;       /*    32     0 */

        /* XXX 8 bytes hole, try to pack */

        spinlock_t                 current_entry_lock;   /*    40     0 */

        /* XXX 8 bytes hole, try to pack */

        struct sched_entry *       current_entry;        /*    48     8 */
        struct sched_gate_list *   oper_sched;           /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */

Fixes: 9c66d15646 ("taprio: Add support for hardware offloading")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 11:41:14 -07:00
Ido Schimmel
76dd072813 ipv6: Fix crash when IPv6 is administratively disabled
The global 'raw_v6_hashinfo' variable can be accessed even when IPv6 is
administratively disabled via the 'ipv6.disable=1' kernel command line
option, leading to a crash [1].

Fix by restoring the original behavior and always initializing the
variable, regardless of IPv6 support being administratively disabled or
not.

[1]
 BUG: unable to handle page fault for address: ffffffffffffffc8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 173e18067 P4D 173e18067 PUD 173e1a067 PMD 0
 Oops: 0000 [#1] PREEMPT SMP KASAN
 CPU: 3 PID: 271 Comm: ss Not tainted 6.0.0-rc4-custom-00136-g0727a9a5fbc1 #1396
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
 RIP: 0010:raw_diag_dump+0x310/0x7f0
 [...]
 Call Trace:
  <TASK>
  __inet_diag_dump+0x10f/0x2e0
  netlink_dump+0x575/0xfd0
  __netlink_dump_start+0x67b/0x940
  inet_diag_handler_cmd+0x273/0x2d0
  sock_diag_rcv_msg+0x317/0x440
  netlink_rcv_skb+0x15e/0x430
  sock_diag_rcv+0x2b/0x40
  netlink_unicast+0x53b/0x800
  netlink_sendmsg+0x945/0xe60
  ____sys_sendmsg+0x747/0x960
  ___sys_sendmsg+0x13a/0x1e0
  __sys_sendmsg+0x118/0x1e0
  do_syscall_64+0x34/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Fixes: 0daf07e527 ("raw: convert raw sockets to RCU")
Reported-by: Roberto Ricci <rroberto2r@gmail.com>
Tested-by: Roberto Ricci <rroberto2r@gmail.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220916084821.229287-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 11:27:32 -07:00
Tetsuo Handa
d547c1b717 net: clear msg_get_inq in __get_compat_msghdr()
syzbot is still complaining uninit-value in tcp_recvmsg(), for
commit 1228b34c8d ("net: clear msg_get_inq in __sys_recvfrom() and
__copy_msghdr_from_user()") missed that __get_compat_msghdr() is called
instead of copy_msghdr_from_user() when MSG_CMSG_COMPAT is specified.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: 1228b34c8d ("net: clear msg_get_inq in __sys_recvfrom() and __copy_msghdr_from_user()")
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/d06d0f7f-696c-83b4-b2d5-70b5f2730a37@I-love.SAKURA.ne.jp
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20 08:23:20 -07:00