Add support for remote checksum offload in VXLAN. This uses a
reserved bit to indicate that RCO is being done, and uses the low order
reserved eight bits of the VNI to hold the start and offset values in a
compressed manner.
Start is encoded in the low order seven bits of VNI. This is start >> 1
so that the checksum start offset is 0-254 using even values only.
Checksum offset (transport checksum field) is indicated in the high
order bit in the low order byte of the VNI. If the bit is set, the
checksum field is for UDP (so offset = start + 6), else checksum
field is for TCP (so offset = start + 16). Only TCP and UDP are
supported in this implementation.
Remote checksum offload for VXLAN is described in:
https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).
With UDP checksums and Remote Checksum Offload
IPv4
Client
11.84% CPU utilization
Server
12.96% CPU utilization
9197 Mbps
IPv6
Client
12.46% CPU utilization
Server
14.48% CPU utilization
8963 Mbps
With UDP checksums, no remote checksum offload
IPv4
Client
15.67% CPU utilization
Server
14.83% CPU utilization
9094 Mbps
IPv6
Client
16.21% CPU utilization
Server
14.32% CPU utilization
9058 Mbps
No UDP checksums
IPv4
Client
15.03% CPU utilization
Server
23.09% CPU utilization
9089 Mbps
IPv6
Client
16.18% CPU utilization
Server
26.57% CPU utilization
8954 Mbps
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces udp_offload_callbacks which has the same
GRO functions (but not a GSO function) as offload_callbacks,
except there is an argument to a udp_offload struct passed to
gro_receive and gro_complete functions. This additional argument
can be used to retrieve the per port structure of the encapsulation
for use in gro processing (mostly by doing container_of on the
structure).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace tasklet with NAPI.
Add rx_queue to queue the remaining rx packets if the number of the
rx packets is more than the request from poll().
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ding Tianhong says:
====================
add hisilicon hip04 ethernet driver
v13:
- Fix the problem of alignment parameters for function and checkpatch warming.
v12:
- According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
for hip04 ethernet.
v11:
- Add ethtool support for tx coalecse getting and setting, the xmit_more
is not supported for this patch, but I think it could work for hip04,
will support it later after some tests for performance better.
Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
- Before:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
- After:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
v10:
- According Arnd's suggestion, remove the skb_orphan and use the hrtimer
for the cleanup of the TX queue and add some modification for the hip04
drivers.
1) drop the broken skb_orphan call
2) drop the workqueue
3) batch cleanup based on tx_coalesce_frames/usecs for better throughput
4) use a reasonable default tx timeout (200us, could be shorted
based on measurements) with a range timer
5) fix napi poll function return value
6) use a lockless queue for cleanup
v9:
- There is no tx completion interrupts to free DMAd Tx packets, it means taht
we rely on new tx packets arriving to run the destructors of completed packets,
which open up space in their sockets's send queues. Sometimes we don't get such
new packets causing Tx to stall, a single UDP transmitter is a good example of
this situation, so we need a clean up workqueue to reclaims completed packets,
the workqueue will only free the last packets which is already stay for several jiffies.
Also fix some format cleanups.
v8:
- Use poll to reclaim xmitted buffer as workaround since no tx done interrupt
v7:
- Remove select NET_CORE in 0002
v6:
- Suggest by Russell: Use netdev_sent_queue & netdev_completed_queue to solve latency issue
Also shorten the period of timer, which is used to wakeup the queue since no
tx completed interrupt.
v5:
- no big change, fix typo
v4:
- Modify accoringly to the suggetion from Arnd, Florian, Eric, David
Use of_parse_phandle_with_fixed_args & syscon_node_to_regmap get ppe info
Add skb_orphan() and tx_timer for reclaim since no tx_finished interrupt
Update timeout, and move of_phy_connect to probe to reuse open/stop
v3:
- Suggest from Arnd, use syscon & regmap_write/read to replace static void __iomem *ppebase.
Modify hisilicon-hip04-net.txt accrordingly to suggestion from Florian and Sergei.
v2:
- Got many suggestions from Russell, Arnd, Florian, Mark and Sergei
Remove memcpy, use dma_map/unmap_single, use dma_alloc_coherent rather than dma_pool, etc.
Refer property in ethernet.txt, change ppe description, etc.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller.
The controller has no tx done interrupt, reclaim xmitted buffer in the poll.
v13: Fix the problem of alignment parameters for function and checkpatch warming.
v12: According Alex's suggestion, modify the changelog and add MODULE_DEVICE_TABLE
for hip04 ethernet.
v11: Add ethtool support for tx coalecse getting and setting, the xmit_more
is not supported for this patch, but I think it could work for hip04,
will support it later after some tests for performance better.
Here are some performance test results by ping and iperf(add tx_coalesce_frames/users),
it looks that the performance and latency is more better by tx_coalesce_frames/usecs.
- Before:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.180/0.202/0.403/0.043 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 945 Mbits/sec
- After:
$ ping 192.168.1.1 ...
=== 192.168.1.1 ping statistics ===
24 packets transmitted, 24 received, 0% packet loss, time 22999ms
rtt min/avg/max/mdev = 0.178/0.190/0.380/0.041 ms
$ iperf -c 192.168.1.1 ...
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 115 MBytes 965 Mbits/sec
v10: According David Miller and Arnd Bergmann's suggestion, add some modification
for v9 version
- drop the workqueue
- batch cleanup based on tx_coalesce_frames/usecs for better throughput
- use a reasonable default tx timeout (200us, could be shorted
based on measurements) with a range timer
- fix napi poll function return value
- use a lockless queue for cleanup
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the Device Tree bindings for the Hisilicon hip04
Ethernet controller, including 100M / 1000M controller.
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently `ethtool -S` simply returns "no stats available". It
would be more useful to see what the various ethtool statistics
registers' values are. This change implements get_ethtool_stats,
get_strings, and get_sset_count functions to accomplish this.
Read all GEM statistics registers and sum them into
macb.ethtool_stats. Add the necessary infrastructure to make this
accessible via `ethtool -S`.
Update gem_update_stats to utilize ethtool_stats.
Signed-off-by: Xander Huff <xander.huff@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change is to help improve at-a-glace knowledge of the purpose of the
various Cadence MACB/GEM registers. Comments are more helpful for human
readability than short acronyms.
Describe various #define varibles Cadence MACB/GEM registers as documented
in Xilinix's "Zynq-7000 All Programmable SoC TechnicalReference Manual, v1.9.1
(UG-585)"
Signed-off-by: Xander Huff <xander.huff@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Vrabel says:
====================
xen-netfront: refactor making Tx requests
As netfront as evolved to handle different sorts of skbs the code to
fill a Tx requests has been copy and pasted several times. The series
refactors this and a few other areas.
The first patch is to a Xen header but this can be merged via
net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate all the duplicate code for making Tx requests by
consolidating them into a single xennet_make_one_txreq() function.
xennet_make_one_txreq() and xennet_make_txreqs() work with pages and
offsets so it will be easier to make netfront handle highmem frags in
the future.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A function to count the number of slots an skb needs is more useful
than one that counts the slots needed for only the frags.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pfn_to_mfn(page_to_pfn(p)) is a common use case so add a generic
helper for it.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each per bucket lock covers a configurable number of buckets. While
shrinking, two buckets in the old table contain entries for a single
bucket in the new table. We need to lock down both while linking.
Check if they are protected by different locks to avoid a recursive
lock.
Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
tc code implicitly considers skb->protocol even in case of accelerated
vlan paths and expects vlan protocol type here. However, on rx path,
if the vlan header was already stripped, skb->protocol contains value
of next header. Similar situation is on tx path.
So for skbs that use skb->vlan_tci for tagging, use skb->vlan_proto instead.
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes some functions that are not used anywhere:
channel_to_vpivci() query_tx_channel_config() rx_disabled_handler()
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the function aal5_spacefor() that is not used anywhere.
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit f2f9800d49 "tipc: make tipc node table aware of net
namespace" has added a dereference of sock->sk before making sure it's
not NULL, which makes releasing a tipc socket NULL pointer dereference
for sockets that are not fully initialized.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the rx packet length check in the sunvnet driver to allow
for a TSO max packet length greater than the LDC channel negotiated MTU.
These are negotiated separately and there is no requirement that
port->tsolen be less than port->rmtu, but if it isn't, it'll drop packets
with rx length errors.
Signed-off-by: David L Stevens <david.stevens@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not all architectures are able to call __builtin_return_address().
On ARM, the mISDN code produces this warning:
hardware/mISDN/w6692.c: In function 'w6692_dctrl':
hardware/mISDN/w6692.c:1181:75: warning: unsupported argument to '__builtin_return_address'
pr_debug("%s: %s dev(%d) open from %p\n", card->name, __func__,
^
hardware/mISDN/mISDNipac.c: In function 'open_dchannel':
hardware/mISDN/mISDNipac.c:759:75: warning: unsupported argument to '__builtin_return_address'
pr_debug("%s: %s dev(%d) open from %p\n", isac->name, __func__,
^
In a lot of cases, this is relatively easy to work around by
passing the value of __builtin_return_address(0) from the
callers into the functions that want it. One exception is
the indirect 'open' function call in struct isac_hw. While it
would be possible to fix this as well, this patch only addresses
the other callers properly and lets this one return the direct
parent function, which should be good enough.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The return type of find_first_bit() is architecture specific,
on ARM it is 'unsigned int', while the asm-generic code used
on x86 and a lot of other architectures returns 'unsigned long'.
When building the mlx5 driver on ARM, we get a warning about
this:
infiniband/hw/mlx5/mem.c: In function 'mlx5_ib_cont_pages':
infiniband/hw/mlx5/mem.c:84:143: warning: comparison of distinct pointer types lacks a cast
m = min(m, find_first_bit(&tmp, sizeof(tmp)));
This patch changes the driver to use min_t to make it behave
the same way on all architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlx5 driver passes a string pointer in through a 'u64' variable,
which on 32-bit machines causes a build warning:
drivers/net/ethernet/mellanox/mlx5/core/debugfs.c: In function 'qp_read_field':
drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:303:11: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
The code is in fact safe, so we can shut up the warning by adding
extra type casts.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rocker driver tries to assign a pointer to a 64-bit integer
and then back to a pointer. This is safe on all architectures,
but causes a compiler warning when pointers are shorter than
64-bit:
rocker/rocker.c: In function 'rocker_desc_cookie_ptr_get':
rocker/rocker.c:809:9: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
return (void *) desc_info->desc->cookie;
^
This adds another cast to uintptr_t to tell the compiler
that it's safe.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>