Before adding yet another possibly contended atomic_long_t,
it is time to add per-cpu storage for existing ones:
dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler
Because many devices do not have to increment such counters,
allocate the per-cpu storage on demand, so that dev_get_stats()
does not have to spend considerable time folding zero counters.
Note that some drivers have abused these counters which
were supposed to be only used by core networking stack.
v4: should use per_cpu_ptr() in dev_get_stats() (Jakub)
v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo)
v2: add a missing include (reported by kernel test robot <lkp@intel.com>)
Change in netdev_core_stats_alloc() (Jakub)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: jeffreyji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Network drivers such as igb or igc call eth_get_headlen() to determine the
header length for their to be constructed skbs in receive path.
When running HSR on top of these drivers, it results in triggering BUG_ON() in
skb_pull(). The reason is the skb headlen is not sufficient for HSR to work
correctly. skb_pull() notices that.
For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which
is not correct. The problem is, the flow dissection code does not take HSR into
account. Therefore, add support for it.
Reported-by: Anthony Harivel <anthony.harivel@linutronix.de>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In hsr, lockdep_is_held() is needed for rcu_dereference_bh_check().
But if lockdep is not enabled, lockdep_is_held() causes a build error:
ERROR: modpost: "lockdep_is_held" [net/hsr/hsr.ko] undefined!
Thus, this patch solved by adding lockdep_hsr_is_held(). This helper
function calls the lockdep_is_held() when lockdep is enabled, and returns 1
if not defined.
Fixes: e7f2742068 ("net: hsr: fix suspicious RCU usage warning in hsr_node_get_first()")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Link: https://lore.kernel.org/r/20220220153250.5285-1-claudiajkang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Currently, HSR manages mac addresses of known HSR nodes by using list_head.
It takes a lot of time when there are a lot of registered nodes due to
finding specific mac address nodes by using linear search. We can be
reducing the time by using hlist. Thus, this patch moves list_head to
hlist_head for mac addresses and this allows for further improvement of
network performance.
Condition: registered 10,000 known HSR nodes
Before:
# iperf3 -c 192.168.10.1 -i 1 -t 10
Connecting to host 192.168.10.1, port 5201
[ 5] local 192.168.10.2 port 59442 connected to 192.168.10.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.49 sec 3.75 MBytes 21.1 Mbits/sec 0 158 KBytes
[ 5] 1.49-2.05 sec 1.25 MBytes 18.7 Mbits/sec 0 166 KBytes
[ 5] 2.05-3.06 sec 2.44 MBytes 20.3 Mbits/sec 56 16.9 KBytes
[ 5] 3.06-4.08 sec 1.43 MBytes 11.7 Mbits/sec 11 38.0 KBytes
[ 5] 4.08-5.00 sec 951 KBytes 8.49 Mbits/sec 0 56.3 KBytes
After:
# iperf3 -c 192.168.10.1 -i 1 -t 10
Connecting to host 192.168.10.1, port 5201
[ 5] local 192.168.10.2 port 36460 connected to 192.168.10.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 7.39 MBytes 62.0 Mbits/sec 3 130 KBytes
[ 5] 1.00-2.00 sec 5.06 MBytes 42.4 Mbits/sec 16 113 KBytes
[ 5] 2.00-3.00 sec 8.58 MBytes 72.0 Mbits/sec 42 94.3 KBytes
[ 5] 3.00-4.00 sec 7.44 MBytes 62.4 Mbits/sec 2 131 KBytes
[ 5] 4.00-5.07 sec 8.13 MBytes 63.5 Mbits/sec 38 92.9 KBytes
Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The writer acquires dev_base_lock with disabled bottom halves.
The reader can acquire dev_base_lock without disabling bottom halves
because there is no writer in softirq context.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves (as in write_lock_bh()) and somewhere else with
enabled bottom halves (as by read_lock() in netstat_show()) followed by
disabling bottom halves (cxgb_get_stats() -> t4_wr_mbox_meat_timeout()
-> spin_lock_bh()). This is the reverse locking order.
All read_lock() invocation are from sysfs callback which are not invoked
from softirq context. Therefore there is no need to disable bottom
halves while acquiring a write lock.
Acquire the write lock of dev_base_lock without disabling bottom halves.
Reported-by: Pei Zhang <pezhang@redhat.com>
Reported-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
hsr_create_self_node() may get netdev->dev_addr
passed as argument, netdev->dev_addr will be
const soon.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert from ether_addr_copy() to eth_hw_addr_set():
@@
expression dev, np;
@@
- ether_addr_copy(dev->dev_addr, np)
+ eth_hw_addr_set(dev, np)
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't check the sequence number when deciding when to update time_in in
the node table if tag removal is offloaded since the sequence number is
part of the tag. This fixes a problem where the times in the node table
wouldn't update when 0 appeared to be before or equal to seq_out when
tag removal was offloaded.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2e9f60932a ("net: hsr: check skb can contain struct hsr_ethhdr
in fill_frame_info") added the following which resulted in -EINVAL
always being returned:
if (skb->mac_len < sizeof(struct hsr_ethhdr))
return -EINVAL;
mac_len was not being set correctly so this check completely broke
HSR/PRP since it was always 14, not 20.
Set mac_len correctly and modify the mac_len checks to test in the
correct places since sometimes it is legitimately 14.
Fixes: 2e9f60932a ("net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info")
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
MAINTAINERS
- keep Chandrasekar
drivers/net/ethernet/mellanox/mlx5/core/en_main.c
- simple fix + trust the code re-added to param.c in -next is fine
include/linux/bpf.h
- trivial
include/linux/ethtool.h
- trivial, fix kdoc while at it
include/linux/skmsg.h
- move to relevant place in tcp.c, comment re-wrapped
net/core/skmsg.c
- add the sk = sk // sk = NULL around calls
net/tipc/crypto.c
- trivial
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reset MAC header in HSR Tx path. This is needed, because direct packet
transmission, e.g. by specifying PACKET_QDISC_BYPASS does not reset the MAC
header.
This has been observed using the following setup:
|$ ip link add name hsr0 type hsr slave1 lan0 slave2 lan1 supervision 45 version 1
|$ ifconfig hsr0 up
|$ ./test hsr0
The test binary is using mmap'ed sockets and is specifying the
PACKET_QDISC_BYPASS socket option.
This patch resolves the following warning on a non-patched kernel:
|[ 112.725394] ------------[ cut here ]------------
|[ 112.731418] WARNING: CPU: 1 PID: 257 at net/hsr/hsr_forward.c:560 hsr_forward_skb+0x484/0x568
|[ 112.739962] net/hsr/hsr_forward.c:560: Malformed frame (port_src hsr0)
The warning can be safely removed, because the other call sites of
hsr_forward_skb() make sure that the skb is prepared correctly.
Fixes: d346a3fae3 ("packet: introduce PACKET_QDISC_BYPASS socket option")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A typo is found out by codespell tool in 111th line of hsr_debugfs.c:
$ codespell ./net/hsr/
net/hsr/hsr_debugfs.c:111: Debufs ==> Debugfs
Fix typos found by codespell.
Signed-off-by: Xiong Zhenwu <xiong.zhenwu@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for offloading of HSR/PRP (IEC 62439-3) tag insertion
tag removal, duplicate generation and forwarding.
For HSR, insertion involves the switch adding a 6 byte HSR header after
the 14 byte Ethernet header. For PRP it adds a 6 byte trailer.
Tag removal involves automatically stripping the HSR/PRP header/trailer
in the switch. This is possible when the switch also performs auto
deduplication using the HSR/PRP header/trailer (making it no longer
required).
Forwarding involves automatically forwarding between redundant ports in
an HSR. This is crucial because delay is accumulated as a frame passes
through each node in the ring.
Duplication involves the switch automatically sending a single frame
from the CPU port to both redundant ports. This is required because the
inserted HSR/PRP header/trailer must contain the same sequence number
on the frames sent out both redundant ports.
Export is_hsr_master so DSA can tell them apart from other devices in
dsa_slave_changeupper.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a switch to offload insertion of HSR/PRP tags, frames must not be
sent to the CPU facing switch port with a tag. Generate supervision frames
(eth type ETH_P_PRP) without HSR v1 (ETH_P_HSR)/PRP tag and rely on
create_tagged_frame which inserts it later. This will allow skipping the
tag insertion for all outgoing frames in the future which is required for
HSR v1/PRP tag insertions to be offloaded.
HSR v0 supervision frames always contain tag information so insertion of
the tag can't be offloaded. IEC 62439-3 Ed.2.0 (HSR v1) specifically
notes that this was changed since v0 to allow offloading.
Signed-off-by: George McCollister <george.mccollister@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Two minor conflicts:
1) net/ipv4/route.c, adding a new local variable while
moving another local variable and removing it's
initial assignment.
2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes.
One pretty prints the port mode differently, whilst another
changes the driver to try and obtain the port mode from
the port node rather than the switch node.
Signed-off-by: David S. Miller <davem@davemloft.net>
clean follow coccicheck warning:
net//hsr/hsr_netlink.c:94:8-42: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:87:30-57: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
net//hsr/hsr_netlink.c:79:29-53: WARNING avoid newline at end of message
in NL_SET_ERR_MSG_MOD
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>