Commit Graph

444426 Commits

Author SHA1 Message Date
David S. Miller fba0e1a3cf Merge branch 'fec'
Fugang Duan says:

====================
net: fec: Enable Software TSO to improve the tx performance

Add SG and software TSO support for FEC.
This feature allows to improve outbound throughput performance.
Tested on imx6dl sabresd board, running iperf tcp tests shows:
        * 82% improvement comparing with NO SG & TSO patch

$ ethtool -K eth0 sg on
$ ethtool -K eth0 tso on
[  3] local 10.192.242.108 port 35388 connected with 10.192.242.167 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec   181 MBytes   506 Mbits/sec
* cpu loading is 30%

$ ethtool -K eth0 sg off
$ ethtool -K eth0 tso off
[  3] local 10.192.242.108 port 52618 connected with 10.192.242.167 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec  99.5 MBytes   278 Mbits/sec

FEC HW support IP header and TCP/UDP hw checksum, support multi buffer descriptor transfer
one frame, but don't support HW TSO. And imx6q/dl SOC FEC Gbps speed has HW bus Bandwidth
limitation (400Mbps ~ 700Mbps), imx6sx SOC FEC Gbps speed has no HW bandwidth limitation.

The patch set just enable TSO feature, which is done following the mv643xx_eth driver.

Test result analyze:
imx6dl sabresd board: there have 82% improvement, since imx6dl FEC HW has bandwidth limitation,
                      the performance with SW TSO is a milestone.

Addition test:
imx6sx sdb board:
upstream still don't support imx6sx due to some patches being upstream... they use same FEC IP.
Use the SW TSO patches test imx6sx sdb board in internal kernel tree:
No SW TSO patch: tx bandwidth 840Mbps, cpu loading is 100%.
SW TSO patch:    tx bandwidth 942Mbps, cpu loading is 65%.
It means the patch set have great improvement for imx6sx FEC performance.

V2:
* From Frank Li's suggestion:
	Change the API "fec_enet_txdesc_entry_free" name to "fec_enet_get_free_txdesc_num".
* Summary David Laight and Eric Dumazet's thoughts:
	RX BD entry number change to 256.
* From ezequiel's suggestion:
	Follow the latest TSO fixes from his solution to rework the queue stop/wake-up.
	Avoid unmapping the TSO header buffers.
* From Eric Dumazet's suggestion:
	Avoid more bytes copy, just copying the unaligned part of the payload into first
	descriptor. The suggestion will bring more complex for the driver, and imx6dl FEC
	DMA need 16 bytes alignment, but cpu loading is not problem that cpu loading is
	30%, the current performance is so better. Later chip like imx6sx Gigbit FEC DMA
	support byte alignment, so there don't exist memory copy. So, the V2 version drop
	the suggestion.
	Anyway, thanks for Eric's response and suggestion.

V3:
* From David Laight's feedback:
	Decide to drop RX BD entry number change for the SW TSO patch set.
	I will generate one separate patch to increase RX BDs entry for interrupt coalescing feature which
	will be supported in my later patch set.

V4:
* From David Laight's feedback:
	Remove the conditional in .fec_enet_get_bd_index().

V5:
* Patch #4 update:
  From David Laight's feedback:
	"expect fec_enet_get_free_txdesc_num() to return one less than it does currently."
	Change the function:
	Return space available, 0..size-1.  it always leave one free entry. Which is same as linux circ_buf.

Thanks for Eric and ezequiel's help and idea.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:02:08 -07:00
Nimrod Andy 79f339125e net: fec: Add software TSO support
Add software TSO support for FEC.
This feature allows to improve outbound throughput performance.

Tested on imx6dl sabresd board, running iperf tcp tests shows:
- 16.2% improvement comparing with FEC SG patch
- 82% improvement comparing with NO SG & TSO patch

$ ethtool -K eth0 tso on
$ iperf -c 10.192.242.167 -t 3 &
[  3] local 10.192.242.108 port 35388 connected with 10.192.242.167 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec   181 MBytes   506 Mbits/sec

During the testing, CPU loading is 30%.
Since imx6dl FEC Bandwidth is limited to SOC system bus bandwidth, the
performance with SW TSO is a milestone.

CC: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
CC: Li Frank <B20596@freescale.com>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:01:57 -07:00
Nimrod Andy 6e909283cb net: fec: Add Scatter/gather support
Add Scatter/gather support for FEC.
This feature allows to improve outbound throughput performance.

Tested on imx6dl sabresd board:
Running iperf tests shows a 55.4% improvement.

$ ethtool -K eth0 sg off
$ iperf -c 10.192.242.167 -t 3 &
[  3] local 10.192.242.108 port 52618 connected with 10.192.242.167 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec  99.5 MBytes   278 Mbits/sec

$ ethtool -K eth0 sg on
$ iperf -c 10.192.242.167 -t 3 &
[  3] local 10.192.242.108 port 52617 connected with 10.192.242.167 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec   154 MBytes   432 Mbits/sec

CC: Li Frank <B20596@freescale.com>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:01:57 -07:00
Nimrod Andy 55d0218ae2 net: fec: Increase buffer descriptor entry number
In order to support SG, software TSO, let's increase BD entry number.

CC: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:01:57 -07:00
Nimrod Andy 09d1e541fd net: fec: Factorize feature setting
In order to enhance the code readable, let's factorize the
feature list.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:01:57 -07:00
Nimrod Andy 96c50caa51 net: fec: Enable IP header hardware checksum
IP header checksum is calcalated by network layer in default.
To support software TSO, it is better to use HW calculate the
IP header checksum.

FEC hw checksum feature request the checksum field in frame
is zero, otherwise the calculative CRC is not correct.

For segmentated TCP packet, HW calculate the IP header checksum again,
it doesn't bring any impact. For SW TSO, HW calculated checksum bring
better performance.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:01:57 -07:00
Nimrod Andy 61a4427b95 net: fec: Factorize the .xmit transmit function
Make the code more readable and easy to support other features like
SG, TSO, moving the common transmit function to one api.

And the patch also factorize the getting BD index to it own function.

CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:01:57 -07:00
Linus Lüssing 3993c4e159 bridge: fix compile error when compiling without IPv6 support
Some fields in "struct net_bridge" aren't available when compiling the
kernel without IPv6 support. Therefore adding a check/macro to skip the
complaining code sections in that case.

Introduced by 2cd4143192
("bridge: memorize and export selected IGMP/MLD querier port")

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:00:24 -07:00
Linus Lüssing 6c03ee8bda bridge: fix smatch warning / potential null pointer dereference
"New smatch warnings:
  net/bridge/br_multicast.c:1368 br_ip6_multicast_query() error:
    we previously assumed 'group' could be null (see line 1349)"

In the rare (sort of broken) case of a query having a Maximum
Response Delay of zero, we could create a potential null pointer
dereference.

Fixing this by skipping the multicast specific MLD Query parsing again
if no multicast group address is available.

Introduced by dc4eb53a99
("bridge: adhere to querier election mechanism specified by RFCs")

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 11:00:24 -07:00
François Cachereul 179584388d via-rhine: fix full-duplex with autoneg disable
With some specific configuration (VT6105M on Soekris 5510 and depending
on the device at the other end), fragmented packets were not transmitted
when forcing 100 full-duplex with autoneg disable.

This fix now write full-duplex chips register when forcing full or
half-duplex not only when autoneg is enable.

Signed-off-by: François Cachereul <f.cachereul@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:31:10 -07:00
David S. Miller a4d3de0d5f Merge branch 'bnx2x'
Yuval Mintz says:

====================
bnx2x: Bug fixes patch series

This patch series contains various bug fixes - 2 link related fixes,
one sriov-related issue and an additional fix for a theoretical bug
on new boards.

Please consider applying these patches to `net'.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:28:49 -07:00
Ariel Elior f2cfa997ef bnx2x: Enlarge the dorq threshold for VFs
A malicious VF might try to starve the other VFs & PF by creating
contineous doorbell floods. In order to negate this, HW has a threshold of
doorbells per client, which will stop the client doorbells from arriving
if crossed.

The threshold currently configured for VFs is too low - under extreme traffic
scenarios, it's possible for a VF to reach the threshold and thus for its
fastpath to stop working.

Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:28:18 -07:00
Yuval Mintz b17b0ca164 bnx2x: Check for UNDI in uncommon branch
If L2FW utilized by the UNDI driver has the same version number as that
of the regular FW, a driver loading after UNDI and receiving an uncommon
answer from management will mistakenly assume the loaded FW matches its
own requirement and try to exist the flow via FLR.

Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:28:18 -07:00
Yaniv Rosner a2755be5b5 bnx2x: Fix 1G-baseT link
Set the phy access mode even in case of link-flap avoidance.

Signed-off-by: Yaniv Rosner <yaniv.rosner@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:28:18 -07:00
Yaniv Rosner dad91ee478 bnx2x: Fix link for KR with swapped polarity lane
This avoids clearing the RX polarity setting in KR mode when polarity lane
is swapped, as otherwise this will result in failed link.

Signed-off-by: Yaniv Rosner <yaniv.rosner@qlogic.com>
Signed-off-by: Yuval Mintz <yuval.mintz@qlogic.com>
Signed-off-by: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:28:18 -07:00
Xufeng Zhang d3217b15a1 sctp: Fix sk_ack_backlog wrap-around problem
Consider the scenario:
For a TCP-style socket, while processing the COOKIE_ECHO chunk in
sctp_sf_do_5_1D_ce(), after it has passed a series of sanity check,
a new association would be created in sctp_unpack_cookie(), but afterwards,
some processing maybe failed, and sctp_association_free() will be called to
free the previously allocated association, in sctp_association_free(),
sk_ack_backlog value is decremented for this socket, since the initial
value for sk_ack_backlog is 0, after the decrement, it will be 65535,
a wrap-around problem happens, and if we want to establish new associations
afterward in the same socket, ABORT would be triggered since sctp deem the
accept queue as full.
Fix this issue by only decrementing sk_ack_backlog for associations in
the endpoint's list.

Fix-suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-12 10:27:14 -07:00
David S. Miller 902455e007 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/core/rtnetlink.c
	net/core/skbuff.c

Both conflicts were very simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 16:02:55 -07:00
Doug Ledford c5b4616087 net/core: Add VF link state control policy
Commit 1d8faf48c7 (net/core: Add VF link state control) added VF link state
control to the netlink VF nested structure, but failed to add a proper entry
for the new structure into the VF policy table.  Add the missing entry so
the table and the actual data copied into the netlink nested struct are in
sync.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:51:37 -07:00
Andy Fleming 39f33367e4 net/fsl: xgmac_mdio is dependent on OF_MDIO
Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:50:59 -07:00
Shruti Kanetkar 55fd36419c net/fsl: Make xgmac_mdio read error message useful
Print the device address, the register number and the PHY ID for
which the MDIO read operation failed

Signed-off-by: Shruti Kanetkar <Shruti@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:50:59 -07:00
Florian Westphal 6e765a009a net_sched: drr: warn when qdisc is not work conserving
The DRR scheduler requires that items on the active list are work
conserving, i.e. do not hold on to skbs for throttling purposes, etc.
Attaching e.g. tbf renders DRR useless because all other classes on the
active list are delayed as well.

So, warn users that this configuration won't work as expected; we
already do this in couple of other qdiscs, see e.g.

commit b00355db3f
('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')

The 'const' change is needed to avoid compiler warning ("discards 'const'
qualifier from pointer target type").

tested with:
drr_hier() {
        parent=$1
        classes=$2
        for i in  $(seq 1 $classes); do
                classid=$parent$(printf %x $i)
                tc class add dev eth0 parent $parent classid $classid drr
		tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
        done
}
tc qdisc add dev eth0 root handle 1: drr
drr_hier 1: 32
tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:50:59 -07:00
David S. Miller f3591fd4c9 Merge branch 'inet_csums'
Tom Herbert says:

====================
net: Checksum offload changes - Part IV

I am working on overhauling RX checksum offload. Goals of this effort
are:

- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simply code

What is in this fourth patch set:

- Preserve CHECKSUM_COMPLETE instead of changing it to
  CHECKSUM_UNNECESSARY. This allows correct reuse in validating multiple
  csums in a packet.
- When SW needs to compute the packet checksum, save it as
  CHECKSUM_COMPLETE. Also mark that checksum was compute by SW.
- Add skb_gro_postpull_rcsum to udp and vxlan to make GRO work with
  CHECKSUM_COMPLETE.

v2: Removed patch setting skb_encapsulation when validating checksum
    in tcp_gro_receive

Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:46:17 -07:00
Tom Herbert 6bae1d4cc3 net: Add skb_gro_postpull_rcsum to udp and vxlan
Need to gro_postpull_rcsum for GRO to work with checksum complete.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:46:13 -07:00
Tom Herbert 7e3cead517 net: Save software checksum complete
In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.

Also, added csum_complete_sw flag to distinguish between software and
hardware generated checksum complete, we should always be able to trust
the software computation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:46:13 -07:00
Tom Herbert 5d0c2b95bc net: Preserve CHECKSUM_COMPLETE at validation
Currently when the first checksum in a packet is validated using
CHECKSUM_COMPLETE, ip_summed is overwritten to be CHECKSUM_UNNECESSARY
so that any subsequent checksums in the packet are not correctly
validated.

This patch adds csum_valid flag in sk_buff and uses that to indicate
validated checksum instead of setting CHECKSUM_UNNECESSARY. The bit
is set accordingly in the skb_checksum_validate_* functions. The flag
is checked in skb_checksum_complete, so that validation is communicated
between checksum_init and checksum_complete sequence in TCP and UDP.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-11 15:46:12 -07:00