Commit Graph

686 Commits

Author SHA1 Message Date
Daniel Borkmann
8e2f1a63f2 packet: fix packet_direct_xmit for BQL enabled drivers
Currently, in packet_direct_xmit() we test the assigned netdevice queue
for netif_xmit_frozen_or_stopped() before doing an ndo_start_xmit().

This can have the side-effect that BQL enabled drivers which make use
of netdev_tx_sent_queue() internally, set __QUEUE_STATE_STACK_XOFF from
within the stack and would not fully fill the device's TX ring from
packet sockets with PACKET_QDISC_BYPASS enabled.

Instead, use a test without BQL bit so that bursts can be absorbed
into the NICs TX ring. Fix and code suggested by Eric Dumazet, thanks!

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-03 14:29:12 -04:00
Eric Dumazet
a50e233c50 net-gro: restore frag0 optimization
Main difference between napi_frags_skb() and napi_gro_receive() is that
the later is called while ethernet header was already pulled by the NIC
driver (eth_type_trans() was called before napi_gro_receive())

Jerry Chu in commit 299603e837 ("net-gro: Prepare GRO stack for the
upcoming tunneling support") tried to remove this difference by calling
eth_type_trans() from napi_frags_skb() instead of doing this later from
napi_frags_finish()

Goal was that napi_gro_complete() could call
ptype->callbacks.gro_complete(skb, 0)  (offset of first network header =
0)

Also, xxx_gro_receive() handlers all use off = skb_gro_offset(skb) to
point to their own header, for the current skb and ones held in gro_list

Problem is this cleanup work defeated the frag0 optimization:
It turns out the consecutive pskb_may_pull() calls are too expensive.

This patch brings back the frag0 stuff in napi_frags_skb().

As all skb have their mac header in skb head, we no longer need
skb_gro_mac_header()

Reported-by: Michal Schmidt <mschmidt@redhat.com>
Fixes: 299603e837 ("net-gro: Prepare GRO stack for the upcoming tunneling support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:26:40 -04:00
david decotigny
2d3b479df4 net-sysfs: expose number of carrier on/off changes
This allows to monitor carrier on/off transitions and detect link
flapping issues:
 - new /sys/class/net/X/carrier_changes
 - new rtnetlink IFLA_CARRIER_CHANGES (getlink)

Tested:
  - grep . /sys/class/net/*/carrier_changes
    + ip link set dev X down/up
    + plug/unplug cable
  - updated iproute2: prints IFLA_CARRIER_CHANGES
  - iproute2 20121211-2 (debian): unchanged behavior

Signed-off-by: David Decotigny <decot@googlers.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:24:52 -04:00
Florian Fainelli
339e022396 net: export NET_ADDR_* values to user-space API
NET_ADDR_* values are exported in the
/sys/class/net/<iface>/addr_assign_type sysfs attributes, and as such
constitutes an user-space ABI. Move the NET_ADDR_* definitions from
include/linux/netdevice.h to include/uapi/linux/netdevice.h

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:09:06 -04:00
Vlad Yasevich
1ee481fb4c net: Allow modules to use is_skb_forwardable
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-31 16:04:04 -04:00
David S. Miller
64c27237a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/marvell/mvneta.c

The mvneta.c conflict is a case of overlapping changes,
a conversion to devm_ioremap_resource() vs. a conversion
to netdev_alloc_pcpu_stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 18:48:54 -04:00
Eric W. Biederman
5efeac44cf netpoll: Respect NETIF_F_LLTX
Stop taking the transmit lock when a network device has specified
NETIF_F_LLTX.

If no locks needed to trasnmit a packet this is the ideal scenario for
netpoll as all packets can be trasnmitted immediately.

Even if some locks are needed in ndo_start_xmit skipping any unnecessary
serialization is desirable for netpoll as it makes it more likely a
debugging packet may be trasnmitted immediately instead of being
deferred until later.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Eric W. Biederman
a8779ec1c5 netpoll: Remove gfp parameter from __netpoll_setup
The gfp parameter was added in:
commit 47be03a28c
Author: Amerigo Wang <amwang@redhat.com>
Date:   Fri Aug 10 01:24:37 2012 +0000

    netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

    slave_enable_netpoll() and __netpoll_setup() may be called
    with read_lock() held, so should use GFP_ATOMIC to allocate
    memory. Eric suggested to pass gfp flags to __netpoll_setup().

    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Cong Wang <amwang@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

The reason for the gfp parameter was removed in:
commit c4cdef9b71
Author: dingtianhong <dingtianhong@huawei.com>
Date:   Tue Jul 23 15:25:27 2013 +0800

    bonding: don't call slave_xxx_netpoll under spinlocks

    The slave_xxx_netpoll will call synchronize_rcu_bh(),
    so the function may schedule and sleep, it should't be
    called under spinlocks.

    bond_netpoll_setup() and bond_netpoll_cleanup() are always
    protected by rtnl lock, it is no need to take the read lock,
    as the slave list couldn't be changed outside rtnl lock.

    Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
    Cc: Jay Vosburgh <fubar@us.ibm.com>
    Cc: Andy Gospodarek <andy@greyhouse.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Nothing else that calls __netpoll_setup or ndo_netpoll_setup
requires a gfp paramter, so remove the gfp parameter from both
of these functions making the code clearer.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-29 17:58:37 -04:00
Vlad Yasevich
53d6471cef net: Account for all vlan headers in skb_mac_gso_segment
skb_network_protocol() already accounts for multiple vlan
headers that may be present in the skb.  However, skb_mac_gso_segment()
doesn't know anything about it and assumes that skb->mac_len
is set correctly to skip all mac headers.  That may not
always be the case.  If we are simply forwarding the packet (via
bridge or macvtap), all vlan headers may not be accounted for.

A simple solution is to allow skb_network_protocol to return
the vlan depth it has calculated.  This way skb_mac_gso_segment
will correctly skip all mac headers.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 17:10:36 -04:00
Eric Dumazet
015f0688f5 net: net: add a core netdev->tx_dropped counter
Dropping packets in __dev_queue_xmit() when transmit queue
is stopped (NIC TX ring buffer full or BQL limit reached) currently
outputs a syslog message.

It would be better to get a precise count of such events available in
netdevice stats so that monitoring tools can have a clue.

This extends the work done in caf586e5f2
("net: add a core netdev->rx_dropped counter")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-28 16:49:48 -04:00
Eric W. Biederman
9c62a68d13 netpoll: Remove dead packet receive code (CONFIG_NETPOLL_TRAP)
The netpoll packet receive code only becomes active if the netpoll
rx_skb_hook is implemented, and there is not a single implementation
of the netpoll rx_skb_hook in the kernel.

All of the out of tree implementations I have found all call
netpoll_poll which was removed from the kernel in 2011, so this
change should not add any additional breakage.

There are problems with the netpoll packet receive code.  __netpoll_rx
does not call dev_kfree_skb_irq or dev_kfree_skb_any in hard irq
context.  netpoll_neigh_reply leaks every skb it receives.  Reception
of packets does not work successfully on stacked devices (aka bonding,
team, bridge, and vlans).

Given that the netpoll packet receive code is buggy, there are no
out of tree users that will be merged soon, and the code has
not been used for in tree for a decade let's just remove it.

Reverting this commit can server as a starting point for anyone
who wants to resurrect netpoll packet reception support.

Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-17 15:48:12 -04:00
stephen hemminger
693350c2ff netdev: set __percpu attribute on netdev_alloc_pcpu_stats
This patch fixes sparse warnings in vlan driver.
It propagates the sparse __percpu attribute from alloc_percpu
into netdev_alloc_pcpu_stats. I expect it may trigger additional
sparse warnings from other drivers that are missing the __percpu
attribute.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 16:37:14 -04:00
Luis R. Rodriguez
7aa98047df net: move net_device priv_flags out from UAPI
These are private to userspace, and they're unstable
anyway and can be shuffled at will (see 080e4130b1)
so any userspace application relying on them is on crack.

Test compiled with allyesconfig.

mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ make allyesconfig
mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ time make -j 20
...
  BUILD   arch/x86/boot/bzImage
Setup is 16992 bytes (padded to 17408 bytes).
System is 56153 kB
CRC 721d2751
Kernel: arch/x86/boot/bzImage is ready  (#1)
real    19m35.744s
user    280m37.984s
sys     27m54.104s

Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-27 15:59:09 -05:00
Amir Vadai
3f85944fe2 net: Add sysfs file for port number
Add a sysfs file to enable user space to query the device
port number used by a netdevice instance. This is needed for
devices that have multiple ports on the same PCI function.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-26 15:38:06 -05:00
David S. Miller
1e8d6421cf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_3ad.h
	drivers/net/bonding/bond_main.c

Two minor conflicts in bonding, both of which were overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 01:24:22 -05:00
Veaceslav Falico
f8ff080dac bonding: remove useless updating of slave->dev->last_rx
Now that all the logic is handled via last_arp_rx, we don't need to use
last_rx.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 16:47:15 -05:00
Daniel Borkmann
b9507bdaf4 netdevice: move netdev_cap_txqueue for shared usage to header
In order to allow users to invoke netdev_cap_txqueue, it needs to
be moved into netdevice.h header file. While at it, also add kernel
doc header to document the API.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:36:34 -05:00
Daniel Borkmann
99932d4fc0 netdevice: add queue selection fallback handler for ndo_select_queue
Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:36:34 -05:00
WANG Cong
1c213bd24a net: introduce netdev_alloc_pcpu_stats() for drivers
There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 15:49:55 -05:00
Florian Westphal
d206940319 net: core: introduce netif_skb_dev_features
Will be used by upcoming ipv4 forward path change that needs to
determine feature mask using skb->dst->dev instead of skb->dev.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:17:02 -05:00
Jiri Pirko
a9517d0f43 rtnetlink: remove ndo_get_slave
No longer used API bond-specific can be removed now. This is now handled
in a generic way in rtnl_link_ops.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-22 21:57:17 -08:00
Or Gerlitz
b582ef0990 net: Add GRO support for UDP encapsulating protocols
Add GRO handlers for protocols that do UDP encapsulation, with the intent of
being able to coalesce packets which encapsulate packets belonging to
the same TCP session.

For GRO purposes, the destination UDP port takes the role of the ether type
field in the ethernet header or the next protocol in the IP header.

The UDP GRO handler will only attempt to coalesce packets whose destination
port is registered to have gro handler.

Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive
code twice on a packet. This solves the problem of udp encapsulated packets whose
inner VM packet is udp and happen to carry a port which has registered offloads.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-21 18:05:04 -08:00
sfeldma@cumulusnetworks.com
1d3ee88ae0 bonding: add netlink attributes to slave link dev
If link is IFF_SLAVE, extend link dev netlink attributes to include
slave attributes with new IFLA_SLAVE nest.  Add netlink notification
(RTM_NEWLINK) when slave status changes from backup to active, or
visa-versa.

Adds new ndo_get_slave op to net_device_ops to fill skb with IFLA_SLAVE
attributes.  Currently only used by bonding driver, but could be
used by other aggregating devices with slaves.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-17 18:51:58 -08:00
Michael Dalton
a953be53ce net-sysfs: add support for device-specific rx queue sysfs attributes
Extend existing support for netdevice receive queue sysfs attributes to
permit a device-specific attribute group. Initial use case for this
support will be to allow the virtio-net device to export per-receive
queue mergeable receive buffer size.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 23:46:06 -08:00
Veaceslav Falico
1d486bfb66 net: add NETDEV_PRECHANGEMTU to notify before mtu change happens
Currently, if a device changes its mtu, first the change happens (invloving
all the side effects), and after that the NETDEV_CHANGEMTU is sent so that
other devices can catch up with the new mtu. However, if they return
NOTIFY_BAD, then the change is reverted and error returned.

This is a really long and costy operation (sometimes). To fix this, add
NETDEV_PRECHANGEMTU notification which is called prior to any change
actually happening, and if any callee returns NOTIFY_BAD - the change is
aborted. This way we're skipping all the playing with apply/revert the mtu.

CC: "David S. Miller" <davem@davemloft.net>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Eric Dumazet <edumazet@google.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-16 17:15:41 -08:00