ipv6_cow_metrics() currently assumes only DST_HOST routes require
dynamic metrics allocation from inetpeer. The assumption breaks
when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric.
Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set()
is called after the route is created.
This patch creates the metrics array (by calling dst_cow_metrics_generic) in
ipv6_cow_metrics().
Test:
radvd.conf:
interface qemubr0
{
AdvLinkMTU 1300;
AdvCurHopLimit 30;
prefix fd00:face:face:face::/64
{
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr off;
};
};
Before:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec
fe80::/64 dev eth0 proto kernel metric 256
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec
After:
[root@qemu1 ~]# ip -6 r show | egrep -v unreachable
fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300
fe80::/64 dev eth0 proto kernel metric 256 mtu 1300
default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30
Fixes: 8e2ec63917 (ipv6: don't use inetpeer to store metrics for routes.)
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a capability which let the hw could change the settings
automatically when the power change to ON. However, the USB reset
would reset the settings to the hw default, so the driver has to
restore the relative settings. Otherwise, it would influence the
functions of the hw, and the compatibility for the USB hub and
USB host controller.
The relative settings are as following.
- set the power down scale to 96.
- enable the power saving function of USB 2.0.
- disable the ALDPS of ECM mode.
- set burst mode depending on the burst size.
- enable the flow control of endpoint full.
- set fifo empty boundary to 32448 bytes.
- enable the function of exiting LPM when Rx OK occurs.
- set the connect timer to 1.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If skb allocation fails once the IP header has been received, the rx state is
being set to WAIT_SYNC. The logic, though, shouldn't directly return, as the
buffer may contain a full packet, and therefore the WAIT_SYNC state needs to be
processed (resetting state to WAIT_IP, clearing rx_buf_size and re-initializing
rx_buf_missing).
So, just let the while loop continue so that in the next iteration the WAIT_SYNC
state cleanly stops the loop. The WAIT_SYNC processing will be done just after
that, only if the end of packet is flagged.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 can keep a copy of SYN message using skb_get() in
tcp_v6_conn_request() so that caller wont free the skb when calling
kfree_skb() later.
Therefore TCP fast open has to clone the skb it is queuing in
child->sk_receive_queue, as all skbs consumed from receive_queue are
freed using __kfree_skb() (ie assuming skb->users == 1)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Fixes: 5b7ed0892f ("tcp: move fastopen functions to tcp_fastopen.c")
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_SYSCTL=n:
net/bridge/br_netfilter.c: In function ‘br_netfilter_init’:
net/bridge/br_netfilter.c:996: warning: label ‘err1’ defined but not used
Move the label and the code after it inside the existing #ifdef to get
rid of the warning.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use spin_lock_bh in ip6_fl_purge() to prevent following potentially
deadlock scenario between ip6_fl_purge() and ip6_fl_gc() timer.
=================================
[ INFO: inconsistent lock state ]
3.19.0 #1 Not tainted
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
(ip6_fl_lock){+.?...}, at: [<ffffffff8171155d>] ip6_fl_gc+0x2d/0x180
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810ee9a0>] __lock_acquire+0x4a0/0x10b0
[<ffffffff810efd54>] lock_acquire+0xc4/0x2b0
[<ffffffff81751d2d>] _raw_spin_lock+0x3d/0x80
[<ffffffff81711798>] ip6_flowlabel_net_exit+0x28/0x110
[<ffffffff815f9759>] ops_exit_list.isra.1+0x39/0x60
[<ffffffff815fa320>] cleanup_net+0x100/0x1e0
[<ffffffff810ad80a>] process_one_work+0x20a/0x830
[<ffffffff810adf4b>] worker_thread+0x11b/0x460
[<ffffffff810b42f4>] kthread+0x104/0x120
[<ffffffff81752bfc>] ret_from_fork+0x7c/0xb0
irq event stamp: 84640
hardirqs last enabled at (84640): [<ffffffff81752080>] _raw_spin_unlock_irq+0x30/0x50
hardirqs last disabled at (84639): [<ffffffff81751eff>] _raw_spin_lock_irq+0x1f/0x80
softirqs last enabled at (84628): [<ffffffff81091ad1>] _local_bh_enable+0x21/0x50
softirqs last disabled at (84629): [<ffffffff81093b7d>] irq_exit+0x12d/0x150
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(ip6_fl_lock);
<Interrupt>
lock(ip6_fl_lock);
*** DEADLOCK ***
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cosmetic patch to add __percpu qualifier to pcpu_stats
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tg3_init_one() calls tg3_halt() without tp->lock despite its assumption
and causes deadlock.
If lockdep is enabled, a warning like this shows up before the stall:
[ BUG: bad unlock balance detected! ]
3.19.0test #3 Tainted: G E
-------------------------------------
insmod/369 is trying to release lock (&(&tp->lock)->rlock) at:
[<ffffffffa02d5a1d>] tg3_chip_reset+0x14d/0x780 [tg3]
but there are no more locks to release!
tg3_init_one() doesn't call tg3_halt() under normal situation but
during kexec kdump I hit this problem.
Fixes: 932f19de ("tg3: Release tp->lock before invoking synchronize_irq()")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Northstar (Broadcom's ARM architecture) we need to manually enable
all cores. Code for that is already in place, but the condition for it
was wrong.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver keeps adding multicast addresses without deleting removed MACs and
worrying about adapters filter limit. This results into actual count of programmed
multicast addresses get accumulated over the time and overruns the adapter's
filter limit without putting device in ACCEPT_ALL_MULTI mode. This causes
newly added multicast traffic to fail after the sequence of addition - deletion
in certain pattern.
This issue is seen only when netdev's mcast list count is less than adapters
mcast filter limit.
e.g. If adapters multicast filter limit is 38 per function
then following sequence would result in multicast traffic failure for
newly added MACs.
- add less than 38 multicast MACs
- remove previously added multicast MACs
- add new multicast MACs (less than 38)
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code failed to configure the page size for architectures with page
size different than 4K - PPC for example.
Signed-off-by: Carol L Soto <clsoto@us.ibm.com>
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch unclones an skb for the case where the sunvnet driver needs to
change the segmentation size so that it doesn't interfere with TCP SACK's
use of them.
Signed-off-by: David L Stevens <david.stevens@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the use of functions setup_timer
and mod_timer.
This is done using Coccinelle and semantic patch used
for this as follows:
// <smpl>
@@
expression x,y,z,a,b;
@@
-init_timer (&x);
+setup_timer (&x, y, z);
+mod_timer (&a, b);
-x.function = y;
-x.data = z;
-x.expires = b;
-add_timer(&a);
// </smpl>
Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_NET_XGENE=y but CONFIG_OF=n:
drivers/net/ethernet/apm/xgene/xgene_enet_main.c:1033: warning: ‘xgene_enet_of_match’ defined but not used
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is only an API consolidation and should make things more readable.
Converting milliseconds to jiffies by "val * HZ / 1000" is technically
OK but msecs_to_jiffies(val) is the cleaner solution and handles all
corner cases correctly. This is a minor API cleanup only.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert says:
====================
net: Fixes to remote checksum offload and CHECKSUM_PARTIAL
This patch set fixes a correctness problem with remote checksum
offload, clarifies the meaning of CHECKSUM_PARTIAL, and allows
remote checksum offload to set CHECKSUM_PARTIAL instead of
calling csum_partial and modifying the checksum.
Specifically:
- In the GRO remote checksum path, restore the checksum after
calling lower layer GRO functions. This is needed if the
packet is forwarded off host with the Remote Checksum Offload
option still present.
- Clarify meaning of CHECKSUM PARTIAL in the receive path. Only
the checksums referred to by checksum partial and any preceding
checksums can be considered verified.
- Fixes to UDP tunnel GRO complete. Need to set SKB_GSO_UDP_TUNNEL_*,
SKB_GSO_TUNNEL_REMCSUM, and skb->encapsulation for forwarding
case.
- Infrastructure to allow setting of CHECKSUM_PARTIAL in remote
checksum offload. This a potential performance benefit instead
of calling csum_partial (potentially twice, once in GRO path
and once in normal path). The downside of using CHECKSUM_PARTIAL
and not actually writing the checksum is that we aren't verifying
that the sender correctly wrote the pseudo checksum into the
checksum field, or that the start/offset values actually point
to a checksum. If the sender did not set up these fields correctly,
a packet might be accepted locally, but not accepted by a peer
when the packet is forwarded off host. Verifying these fields
seems non-trivial, and because the fields can only be incorrect
due to sender error and not corruption (outer checksum protects
against that) we'll make use of CHECKSUM_PARTIAL the default. This
behavior can be reverted as an netlink option on the encapsulation
socket.
- Change VXLAN and GUE to set CHECKSUM_PARTIAL in remote checksum
offload by default, configuration hooks can revert to using
csum_partial.
Testing:
I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200
streams for GRE/GUE and for VXLAN. This compares before the fixes,
the fixes with not setting checksum partial in remote checksum offload,
and with the fixes setting checksum partial. The overall effect seems
be that using checksum partial is a slight performance win, perf
definitely shows a significant reduction of time in csum_partial on
the receive CPUs.
GRE/GUE
TCP_STREAM
Before fixes
9.22% TX CPU utilization
13.57% RX CPU utilization
9133 Mbps
Not using checksum partial
9.59% TX CPU utilization
14.95% RX CPU utilization
9132 Mbps
Using checksum partial
9.37% TX CPU utilization
13.89% RX CPU utilization
9132 Mbps
TCP_RR
Before fixes
CPU utilization
159/251/447 90/95/99% latencies
1.1462e+06 tps
Not using checksum partial
92.94% CPU utilization
158/253/445 90/95/99% latencies
1.12988e+06 tps
Using checksum partial
92.78% CPU utilization
158/250/450 90/95/99% latencies
1.15343e+06 tps
VXLAN
TCP_STREAM
Before fixes
9.24% TX CPU utilization
13.74% RX CPU utilization
9093 Mbps
Not using checksum partial
9.95% TX CPU utilization
14.66% RX CPU utilization
9094 Mbps
Using checksum partial
10.24% TX CPU utilization
13.32% RX CPU utilization
9093 Mbps
TCP_RR
Before fixes
92.91% CPU utilization
151/241/437 90/95/99% latencies
1.15939e+06 tps
Not using checksum partial
93.07% CPU utilization
156/246/425 90/95/99% latencies
1.1451e+06 tps
Using checksum partial
95.51% CPU utilization
156/249/459 90/95/99% latencies
1.17004e+06 tps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change remote checksum handling to set checksum partial as default
behavior. Added an iflink parameter to configure not using
checksum partial (calling csum_partial to update checksum).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds infrastructure so that remote checksum offload can
set CHECKSUM_PARTIAL instead of calling csum_partial and writing
the modfied checksum field.
Add skb_remcsum_adjust_partial function to set an skb for using
CHECKSUM_PARTIAL with remote checksum offload. Changed
skb_remcsum_process and skb_gro_remcsum_process to take a boolean
argument to indicate if checksum partial can be set or the
checksum needs to be modified using the normal algorithm.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the free and same_flow fields to be bit fields
(2 and 1 bit sized respectively). This frees up some space for u16's.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Properly set GSO types and skb->encapsulation in the UDP tunnel GRO
complete so that packets are properly represented for GSO. This sets
SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM depending on whether
non-zero checksums were received, and sets SKB_GSO_TUNNEL_REMCSUM if
the remote checksum option was processed.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current meaning of CHECKSUM_PARTIAL for validating checksums
is that _all_ checksums in the packet are considered valid.
However, in the manner that CHECKSUM_PARTIAL is set only the checksum
at csum_start+csum_offset and any preceding checksums may
be considered valid. If there are checksums in the packet after
csum_offset it is possible they have not been verfied.
This patch changes CHECKSUM_PARTIAL logic in skb_csum_unnecessary and
__skb_gro_checksum_validate_needed to only considered checksums
referring to csum_start and any preceding checksums (with starting
offset before csum_start) to be verified.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remote checksum offload processing is currently the same for both
the GRO and non-GRO path. When the remote checksum offload option
is encountered, the checksum field referred to is modified in
the packet. So in the GRO case, the packet is modified in the
GRO path and then the operation is skipped when the packet goes
through the normal path based on skb->remcsum_offload. There is
a problem in that the packet may be modified in the GRO path, but
then forwarded off host still containing the remote checksum option.
A remote host will again perform RCO but now the checksum verification
will fail since GRO RCO already modified the checksum.
To fix this, we ensure that GRO restores a packet to it's original
state before returning. In this model, when GRO processes a remote
checksum option it still changes the checksum per the algorithm
but on return from lower layer processing the checksum is restored
to its original value.
In this patch we add define gro_remcsum structure which is passed
to skb_gro_remcsum_process to save offset and delta for the checksum
being changed. After lower layer processing, skb_gro_remcsum_cleanup
is called to restore the checksum before returning from GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>